2018-04-16
|
08:17 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Slowly repool db1114 in API (duration: 00m 58s) |
[production] |
08:04 |
<elukey> |
restart hhvm on mw[1228,1234,1281-1287,1289,1290,1312-1314,1317,1339,1343,1345,1346,1348] - more than 50% cpu usage, prevention scheme for current high load |
[production] |
08:00 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Slowly repool db1114 (duration: 00m 58s) |
[production] |
07:49 |
<marostegui> |
Stop MySQL and reboot db1114 - T191996 |
[production] |
07:46 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1114 (duration: 00m 59s) |
[production] |
07:40 |
<vgutierrez@neodymium> |
conftool action : set/pooled=no; selector: name=achernar.wikimedia.org,service=pdns_recursor |
[production] |
07:39 |
<vgutierrez> |
Depool and reimage achernar.wikimedia.org - T187090 |
[production] |
07:27 |
<moritzm> |
installing perl security updates on Debian systems |
[production] |
06:45 |
<TimStarling> |
depooled mw1230 |
[production] |
06:38 |
<_joe_> |
repooling mw1230 |
[production] |
06:20 |
<marostegui> |
Drop table flow_subscription from x1 - T149936 |
[production] |
05:59 |
<elukey> |
restart hhvm on mw[1221,1233,1280,1347] - high load |
[production] |
05:55 |
<elukey> |
repool mw1341 after investigation |
[production] |
05:48 |
<elukey> |
restart hhvm on mw1225, 1315, 1316, 1340, 1341, 1342, 1347 - high load |
[production] |
05:42 |
<marostegui> |
Reload haproxy on dbproxy1010 |
[production] |
05:36 |
<elukey> |
restart hhvm on mw1226,27,32,88 - high load |
[production] |
05:35 |
<_joe_> |
depooling mw1341 to further debug the API issue |
[production] |
05:33 |
<marostegui> |
Deploy schema change on db1087 with replication (this will generate lag in labs) - T187089 T185128 T153182 |
[production] |
05:31 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1087 (duration: 00m 59s) |
[production] |
03:02 |
<l10nupdate@tin> |
scap sync-l10n completed (1.31.0-wmf.29) (duration: 11m 09s) |
[production] |
2018-04-15
|
22:09 |
<ema> |
cp3037: restart varnish-be |
[production] |
21:45 |
<ema> |
cp3039: restart varnish-be |
[production] |
21:42 |
<elukey> |
restart hhvm on mw1286,1317,1339 - high load |
[production] |
21:31 |
<ema> |
cp3038: restart varnish-be |
[production] |
21:30 |
<ema> |
cp3036: restart varnish-be |
[production] |
20:52 |
<elukey> |
restart hhvm on mw13[43,45,46,48] - high load |
[production] |
20:48 |
<elukey> |
restart hhvm on mw13[12-14] - high load |
[production] |
20:45 |
<elukey> |
restart hhvm on mw[1285,1287,1289-1290] - high load |
[production] |
20:40 |
<_joe_> |
restart mw1344, high load |
[production] |
20:38 |
<elukey> |
restart hhvm on mw12[22,79,82] - high load |
[production] |
20:32 |
<elukey> |
restart hhvm on mw12[32-35] - high load |
[production] |
20:24 |
<elukey> |
restart hhvm on mw1229-31 - high load |
[production] |
20:24 |
<_joe_> |
restarted mw1280-4, high load |
[production] |
20:17 |
<elukey> |
restart hhvm on mw122[6-8] - high load |
[production] |
20:05 |
<elukey> |
restart hhvm on mw122[3,4] - high load |
[production] |
13:42 |
<elukey> |
restart hhvm on mw1227 due to high load (hhvm dump debug in /tmp/hhvm.44071.bt) |
[production] |
10:53 |
<elukey> |
powercycle mw1272 - not responsive to ssh, mgmt com2 console showing "[OK" and no tty |
[production] |
2018-04-13
|
20:44 |
<imarlier@tin> |
Finished deploy [performance/navtiming@8b6ab4e]: initial attempt to deploy navtiming via scap (will not be active) (duration: 00m 02s) |
[production] |
20:44 |
<imarlier@tin> |
Started deploy [performance/navtiming@8b6ab4e]: initial attempt to deploy navtiming via scap (will not be active) |
[production] |
20:00 |
<demon@tin> |
Pruned MediaWiki: 1.31.0-wmf.28 [keeping static files] (duration: 01m 34s) |
[production] |
19:23 |
<demon@tin> |
Pruned MediaWiki: 1.31.0-wmf.25 (duration: 05m 03s) |
[production] |
17:17 |
<andrewbogott> |
upgraded packages on all labvirts and restarted nova-compute |
[production] |
16:55 |
<arturo> |
enable puppet in labstore1005 |
[production] |
16:42 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Give db1104 original main traffic weight (duration: 01m 00s) |
[production] |
16:34 |
<andrewbogott> |
upgrading packages on labvirt1016 and rebooting (1016 is a spare server that won't affect VPS users) |
[production] |
16:26 |
<arturo> |
disable puppet in labstore1005 to hot-test https://gerrit.wikimedia.org/r/#/c/426103/ |
[production] |
16:24 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Give db1104 some main traffic - T191996 (duration: 01m 00s) |
[production] |
16:04 |
<hashar> |
cleaning up lost instances in nodepool (nodepool delete XXXXX) |
[production] |
15:50 |
<andrewbogott> |
upgrading lots of packages and rebooting labservices1002 and 1002 |
[production] |
15:43 |
<andrewbogott> |
restarting nodepool on labnodepool1001 |
[production] |