2020-11-19
| 12:25 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Depool db1106 T267090', diff saved to https://phabricator.wikimedia.org/P13334 and previous config saved to /var/cache/conftool/dbconfig/20201119-122459-marostegui.json | [production] |
| 12:00 | <marostegui@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] |
| 11:53 | <marostegui@cumin1001> | START - Cookbook sre.hosts.decommission | [production] |
| 11:46 | <marostegui@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] |
| 11:44 | <moritzm> | installing Java security updates on Hadoop/Kafka Jumbo hosts | [production] |
| 11:42 | <marostegui@cumin1001> | START - Cookbook sre.hosts.decommission | [production] |
| 11:40 | <elukey@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] |
| 11:33 | <elukey@cumin1001> | START - Cookbook sre.hosts.decommission | [production] |
| 11:00 | <Urbanecm> | Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=ruwiki; T246539) | [production] |
| 10:28 | <marostegui> | Restart mysql on db1115, tendril and dbtree will be down for a few minutes | [production] |
| 09:40 | <marostegui> | Stop mysql on db1124:3311 to clone clouddb1013 and clouddb1017, there will be lag on s1 on wikireplicas - T267090 | [production] |
| 09:29 | <moritzm> | upgrading serpens to Buster | [production] |
| 09:26 | <XioNoX> | eqiad row C: move Ganeti/LVS interfaces to individual terms | [production] |
| 09:07 | <elukey> | restart kafka daemons on kafka-jumbo1001 for openjdk upgrades (canary) | [production] |
| 08:56 | <effie> | disable puppet on mw canaries to merge 641816 | [production] |
| 08:55 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) | [production] |
| 08:49 | <elukey> | restart hadoop daemons on analytics1058 for openjdk upgrades (canary) | [production] |
| 08:25 | <elukey@cumin1001> | START - Cookbook sre.hadoop.roll-restart-masters | [production] |
| 08:19 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) | [production] |
| 08:19 | <XioNoX> | eqiad row C: standardize interfaces config | [production] |
| 07:55 | <XioNoX> | eqiad row D: move Ganeti/LVS interfaces to individual terms | [production] |
| 07:47 | <XioNoX> | eqiad row D: standardize interfaces config | [production] |
| 07:22 | <elukey@cumin1001> | START - Cookbook sre.hadoop.roll-restart-masters | [production] |
| 07:21 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) | [production] |
| 07:05 | <elukey> | roll restart java daemons on Hadoop test for openjdk upgrades | [production] |
| 07:05 | <elukey@cumin1001> | START - Cookbook sre.hadoop.roll-restart-workers | [production] |
| 06:22 | <marostegui@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] |
| 06:21 | <marostegui> | Remove es1014 from tendril and zarcillo T268102 | [production] |
| 06:18 | <marostegui@cumin1001> | START - Cookbook sre.hosts.decommission | [production] |
| 06:08 | <marostegui> | Stop mysql on db1125:3316 to clone clouddb1015 and clouddb1019, there will be lag on s6 on wikireplicas - T267090 | [production] |
| 02:41 | <ryankemper@cumin1001> | END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) | [production] |
| 01:30 | <ryankemper@cumin1001> | START - Cookbook sre.wdqs.data-transfer | [production] |
2020-11-18
| 23:34 | <mutante> | disabling puppet on memcache::mediawiki - deploying gerrit:637742 | [production] |
| 22:56 | <dpifke@deploy1001> | Finished deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after T267269 (duration: 00m 04s) | [production] |
| 22:56 | <dpifke@deploy1001> | Started deploy [performance/arc-lamp@6bbac6d]: Fix for bytes/str issue after T267269 | [production] |
| 22:24 | <robh@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | [production] |
| 22:22 | <robh@cumin1001> | START - Cookbook sre.hosts.downtime | [production] |
| 22:19 | <urbanecm@deploy1001> | Synchronized wmf-config/CommonSettings.php: Deploy GlobalWatchlist to beta (noop; T268181) (duration: 01m 04s) | [production] |
| 22:11 | <urbanecm@deploy1001> | Synchronized wmf-config/InitialiseSettings.php: Deploy GlobalWatchlist extension: Prepare IS.php to know relevant variables (noop; T268181) (duration: 01m 06s) | [production] |
| 22:05 | <urbanecm@deploy1001> | Synchronized wmf-config/extension-list: Deploy GlobalWatchlist extension to beta: add it to extension-list (T268181) (duration: 01m 05s) | [production] |
| 21:53 | <mutante> | mwdebug1003 - restarting ferm because config was generated but service not restarted due to puppet dependency errors, breaking NRPE monitoring T267248 | [production] |
| 21:47 | <mutante> | mwdebug1003 - scap pull - T267248 | [production] |
| 21:40 | <mutante> | mw1317,mw1318 - back in action and all monitoring activated again | [production] |
| 21:17 | <dzahn@cumin1001> | conftool action : set/weight=10; selector: name=mw1318.eqiad.wmnet,cluster=videoscaler | [production] |
| 21:08 | <dzahn@cumin1001> | conftool action : set/pooled=yes; selector: name=mw1317.eqiad.wmnet | [production] |
| 21:08 | <dzahn@cumin1001> | conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet | [production] |
| 21:02 | <mutante> | mw1317,mw1318 - repooled=no after physical move to rack B | [production] |
| 20:56 | <dzahn@cumin1001> | conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet | [production] |
| 20:54 | <dzahn@cumin1001> | conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet | [production] |
| 20:27 | <mutante> | mw1317, mw1318 shutting down for physical move | [production] |