2015-09-24
23:02 <krenair@tin> Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/240880/ (duration: 00m 17s) [production]
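(The "Synchronized ... (duration: ...)" lines in this log are the output of scap's file-sync tooling on the deployment host. A minimal sketch of the command behind an entry like the one above, assuming the 2015-era sync-file CLI:

    # run on tin (the deployment host); pushes one file to all app servers
    # and logs the message to the SAL
    sync-file wmf-config/throttle.php 'https://gerrit.wikimedia.org/r/#/c/240880/'

)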
19:57 <ori@tin> Synchronized php-1.26wmf24/extensions/ContentTranslation: d079d5dd71: Updated mediawiki/core Project: mediawiki/extensions/ContentTranslation 8559ee614975f25b71a732ca0fb1bb6d489c9d33 (duration: 00m 18s) [production]
19:35 <bblack> depooled cp1046 from confd, committed pybal depool for LVS as well [production]
19:34 <chasemp> changing labs route on cr1 and cr2 from 10.68.16.0/22 to 10.68.16.0/21, which matches references, firewall settings, and manifests/network.pp [production]
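(For context on the route change above: widening the prefix from /22 to /21 doubles the covered labs range. Plain prefix arithmetic, not the actual router config:

    # 10.68.16.0/22 covers 10.68.16.0 - 10.68.19.255 (1024 addresses)
    # 10.68.16.0/21 covers 10.68.16.0 - 10.68.23.255 (2048 addresses)
    python -c "import ipaddress; n = ipaddress.ip_network(u'10.68.16.0/21'); print(n[0], n[-1])"

)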
18:54 <catrope@tin> Synchronized php-1.26wmf24/extensions/Flow/: Debugging for FlowFixLinks.php (duration: 00m 20s) [production]
18:21 <twentyafterfour@tin> rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf24 [production]
18:20 <legoktm> moved oauthadmin group from User:Yuvipanda@metawiki to User:YuviPanda@metawiki [production]
18:19 <godog> restart restbase on restbase1005 [production]
18:18 <godog> restart restbase on restbase1004 [production]
18:16 <godog> restart restbase on restbase1003 [production]
18:06 <paravoid> depooling cp1046, stability issues [production]
18:00 <demon@tin> Synchronized multiversion/MWRealm.php: (no message) (duration: 00m 17s) [production]
17:59 <ori> Merged Apache config change Ia095457fb. It will refresh the Apache service as it rolls out, causing elevated 503s for the next 20 minutes. [production]
17:53 <godog> rolling restart of restbase in eqiad [production]
17:35 <chasemp> powercycling cp1046 at mgmt as I can't ssh in and it seems like it should be up [production]
17:26 <godog> bounce restbase on restbase1002 to apply the new datacenter config [production]
17:10 <_joe_> cleaning up /tmp on mw1152 [production]
17:09 <cmjohnson1> powering down es1001-es1010 for the last time [production]
16:17 <thcipriani@tin> Synchronized php-1.26wmf23/extensions/Wikidata: SWAT: Do not filter affected pages by namespace [[gerrit:240727]] (duration: 00m 26s) [production]
16:01 <robh> nothing in the puppet SWAT window, easiest SWAT ever. [production]
15:46 <thcipriani@tin> Synchronized php-1.26wmf24/extensions/Wikidata: SWAT: Do not filter affected pages by namespace [[gerrit:240711]] (duration: 00m 26s) [production]
15:23 <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable suggestions in ca, en, es, fr, it, ja, tr, ru, zh [[gerrit:240638]] (duration: 00m 17s) [production]
14:37 <paravoid> repooling codfw [production]
12:54 <bblack> restarting varnish daemons on the second half of the maps, parsoid, and misc clusters (package upgrade, shm_reclen change) [production]
12:50 <bblack> restarting varnishd instances on the text, mobile, and upload clusters for the package upgrade (slow salt, no parallelism, ~5m spacing - FE cache loss, BE cache stays; should take ~9h) [production]
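(A sketch of how a slow rolling restart like the one above might be driven from the salt master. The one-host-at-a-time pacing and the ~5m spacing are the entry's own parameters; the hosts file and service name are assumptions:

    # restart varnish one cache host at a time, ~5m apart;
    # frontend (malloc) cache is lost per host, persistent backend cache survives
    for host in $(cat text_mobile_upload_hosts.txt); do
        salt "$host" cmd.run 'service varnish restart'
        sleep 300
    done

)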
12:05 <moritzm> installed rpcbind security updates on eeden, baham, radon, maerlant, rhenium [production]
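(The usual shape of a targeted security update like the one above, sketched with standard apt-get flags; the ssh loop is an assumption:

    # upgrade only the rpcbind package, leaving everything else untouched
    for h in eeden baham radon maerlant rhenium; do
        ssh "$h" 'sudo apt-get update && sudo apt-get install --only-upgrade -y rpcbind'
    done

)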
11:56 <bblack> restarting varnish daemons on half of the maps, parsoid, and misc clusters (package upgrade, shm_reclen change) [production]
11:36 <bblack> reinstalling lvs300[12] with jessie - T96375 [production]
11:21 <akosiaris> killed a 'tail -f varnishncsa.log' on cp1065 and ran apt-get clean to reclaim some disk space [production]
11:14 <bblack> stopping pybal on lvs300[12]; lvs300[34] taking over [production]
11:07 <bblack> upgrading varnishes to 3.0.6plus-wm8 (non-restarting, just the package update on disk) [production]
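(Installing a specific package version without bouncing the daemon can be done with apt's version-pinning syntax; a sketch, where the salt grain target is an assumption:

    # put the new binaries on disk; the running varnishd keeps the old code
    # until the later rolling restart (assumes the package's maintainer
    # scripts don't restart the service themselves)
    salt -G 'cluster:cache_text' cmd.run 'apt-get install -y varnish=3.0.6plus-wm8'

)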
09:40 <jynus> performing the last (software) steps to decommission es1001-es1010 (puppet disabling, etc.) [production]
08:39 <jynus> restarted HHVM on mw1056, mw1104, mw1122 [production]
05:33 <yuvipanda> deleted logstash indexes for 08/27 and 08/28 too [production]
05:31 <yuvipanda> deleted indexes for 08/14, 15, 25, 26 on logstash [production]
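(Logstash writes one Elasticsearch index per day (logstash-YYYY.MM.DD), so dropping a day's data is a single DELETE against the cluster's HTTP API; a sketch, assuming the default port on the local node:

    # delete the daily indexes; irreversible, frees disk immediately
    for d in 2015.08.14 2015.08.15 2015.08.25 2015.08.26 2015.08.27 2015.08.28; do
        curl -XDELETE "http://localhost:9200/logstash-$d"
    done

)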
03:59 <yuvipanda> restarting elasticsearch on logstash1001-3 [production]
03:53 <yuvipanda> restarted elasticsearch on logstash1004-6 [production]
03:02 <yuvipanda> dumped the logstash JVM's thread stacks with jstack to /home/yuvipanda/stack on logstash1001, since strace seems useless [production]
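(jstack snapshots the thread stacks of a running JVM without stopping it, which is why it's the fallback here when strace shows nothing useful; a sketch, with the pid lookup an assumption:

    # dump all thread stacks of the logstash JVM to a file for offline reading
    jstack -l $(pgrep -f logstash | head -1) > /home/yuvipanda/stack

)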
02:51 <yuvipanda> restarted logstash on logstash1002 [production]
02:41 <yuvipanda> gmond at 100% again; killing it and stopping puppet again [production]
02:40 <yuvipanda> re-enabling and running puppet on hafnium to see what it brings up [production]
02:38 <l10nupdate@tin> Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 06m 30s) [production]
02:23 <yuvipanda> killed gmond on hafnium and disabled puppet to prevent it from bringing it back up; it was taking 100% CPU [production]
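(Disabling puppet first matters because an ensure => running resource would otherwise just start gmond again on the next agent run; a sketch of the two steps, with the service name assumed to be Debian's ganglia-monitor:

    # keep puppet from re-starting the daemon, then stop it
    puppet agent --disable 'gmond pegging CPU, investigating'
    service ganglia-monitor stop

)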
02:16 <Krinkle> Kibana/Logstash outage. Zero events received after 2015-09-23T23:59:59.999Z. [production]
02:14 <Krinkle> Partial EventLogging outage (client-side events via hafnium abruptly stopped 2015-09-23 11:36 UTC - 15 hours ago) [production]