2019-06-20
|
16:16 |
<Krinkle> |
scb1001 is producing 120,000 errors per minute as of 16:09 UTC (under 500/min before that) |
[production] |
15:40 |
<Krinkle> |
krinkle@deploy1001: pull down 98399b1032a0 to wmf.10 (test-only change) |
[production] |
15:05 |
<jijiki> |
Rolling restart php-fpm on jobrunners to pick up new opcache settings - 518023 |
[production] |
15:03 |
<jijiki> |
Repool mw1311 |
[production] |
15:01 |
<jeh> |
T101631 updating replica views on labsdb1009 |
[production] |
14:58 |
<akosiaris> |
make sure all kubernetes hosts (except kubernetes2001 which is used to investigate some outgoing packet discards) are pooled and with the exact same weight |
[production] |
14:57 |
<jijiki> |
enable puppet on jobrunners |
[production] |
14:57 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=kubernetes1005.* |
[production] |
14:57 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=kubernetes1006.* |
[production] |
14:56 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=kubernetes2006.* |
[production] |
14:56 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=kubernetes2005.* |
[production] |
14:54 |
<jeh> |
T101631 updating replica views on labsdb1010 |
[production] |
14:47 |
<jeh> |
T101631 updating replica views on labsdb1011 |
[production] |
14:41 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet |
[production] |
14:36 |
<jeh> |
T101631 updating replica views on labsdb1012 |
[production] |
14:28 |
<Amir1> |
end of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052) |
[production] |
14:22 |
<ladsgroup@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:517669|Set EntityUsageTable addUsage batch size to 150 (T225500)]] (duration: 00m 56s) |
[production] |
14:18 |
<ema@cumin1001> |
END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) |
[production] |
14:18 |
<Amir1> |
start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052) |
[production] |
14:16 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet |
[production] |
14:16 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=kubernetes2001.* |
[production] |
14:14 |
<ladsgroup@deploy1001> |
Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:518028|Switch property terms migration to WRITE_BOTH on test wikidata (T225051)]] (duration: 00m 56s) |
[production] |
14:14 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet |
[production] |
14:14 |
<ema@cumin1001> |
END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) |
[production] |
14:13 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet |
[production] |
14:12 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet |
[production] |
14:11 |
<ema@cumin1001> |
START - Cookbook sre.hosts.upgrade-and-reboot |
[production] |
14:10 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=kubernetes2001.* |
[production] |
14:09 |
<ladsgroup@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:518028|Switch property terms migration to WRITE_BOTH on test wikidata (T225051)]] (duration: 00m 56s) |
[production] |
14:06 |
<ema@cumin1001> |
START - Cookbook sre.hosts.upgrade-and-reboot |
[production] |
14:04 |
<ema@cumin1001> |
END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99) |
[production] |
13:58 |
<ema@cumin1001> |
START - Cookbook sre.hosts.upgrade-and-reboot |
[production] |
13:56 |
<ema@cumin1001> |
END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) |
[production] |
13:50 |
<ema@cumin1001> |
START - Cookbook sre.hosts.upgrade-and-reboot |
[production] |
13:38 |
<ema@cumin1001> |
END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) |
[production] |
13:35 |
<ema@cumin1001> |
END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) |
[production] |
13:31 |
<ema@cumin1001> |
START - Cookbook sre.hosts.upgrade-and-reboot |
[production] |
13:28 |
<ema@cumin1001> |
START - Cookbook sre.hosts.upgrade-and-reboot |
[production] |
13:23 |
<marostegui> |
Stop replication on labsdb1011 to defragment tables T222978 |
[production] |
13:22 |
<ema@cumin1001> |
END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99) |
[production] |
13:21 |
<jijiki> |
depool mw1311 |
[production] |
13:20 |
<marostegui> |
Reload haproxy on dbproxy1010 and dbproxy1011 to depool labsdb1011 - T222978 |
[production] |
13:16 |
<ema@cumin1001> |
START - Cookbook sre.hosts.upgrade-and-reboot |
[production] |
13:11 |
<ema@cumin1001> |
END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) |
[production] |
13:04 |
<ema@cumin1001> |
START - Cookbook sre.hosts.upgrade-and-reboot |
[production] |
12:59 |
<jijiki> |
Disable puppet on jobrunners to merge 518023 and 518018 |
[production] |
12:56 |
<ema@cumin1001> |
END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) |
[production] |
12:50 |
<ema> |
powercycle cp2017, stuck rebooting |
[production] |
12:44 |
<hashar> |
Upgrading packages on contint1001 |
[production] |
12:44 |
<ema@cumin1001> |
END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) |
[production] |