2021-02-05
|
11:34 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet |
[production] |
11:30 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet |
[production] |
11:29 |
<vgutierrez> |
restart acme-chief instances to catch up on kernel upgrades |
[production] |
11:27 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet |
[production] |
11:23 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet |
[production] |
11:22 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3002.esams.wmnet |
[production] |
11:16 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet |
[production] |
11:14 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet |
[production] |
11:08 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet |
[production] |
11:06 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet |
[production] |
10:56 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet |
[production] |
10:53 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14222 and previous config saved to /var/cache/conftool/dbconfig/20210205-105345-root.json |
[production] |
10:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14221 and previous config saved to /var/cache/conftool/dbconfig/20210205-103841-root.json |
[production] |
10:32 |
<godog> |
swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837 |
[production] |
10:27 |
<vgutierrez@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99) |
[production] |
10:27 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-cluster |
[production] |
10:23 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14220 and previous config saved to /var/cache/conftool/dbconfig/20210205-102338-root.json |
[production] |
10:08 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14219 and previous config saved to /var/cache/conftool/dbconfig/20210205-100834-root.json |
[production] |
10:06 |
<gehel> |
repooling wdqs1013 - caught up on lag |
[production] |
09:53 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1075 (re)pooling @ 10%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14218 and previous config saved to /var/cache/conftool/dbconfig/20210205-095331-root.json |
[production] |
09:45 |
<dcausse> |
reloading categories from scratch on wdqs1010 |
[production] |
09:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1075 (re)pooling @ 5%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14217 and previous config saved to /var/cache/conftool/dbconfig/20210205-093827-root.json |
[production] |
08:46 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1094 T273710', diff saved to https://phabricator.wikimedia.org/P14214 and previous config saved to /var/cache/conftool/dbconfig/20210205-084625-marostegui.json |
[production] |
08:29 |
<dcausse> |
reloading categories from scratch on wdqs1009 |
[production] |
07:55 |
<gehel> |
cleanup of leftover ttl dumps on wdqs1009 and wdqs1010 |
[production] |
07:47 |
<gehel> |
depooling wdqs1013 and restarting blazegraph |
[production] |
07:28 |
<oblivian@cumin1001> |
END (PASS) - Cookbook sre.network.cf (exit_code=0) |
[production] |
07:28 |
<oblivian@cumin1001> |
START - Cookbook sre.network.cf |
[production] |
06:36 |
<marostegui> |
Stop MySQL on db1075 to clone db1157 T258361 |
[production] |
06:35 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1075 T258361', diff saved to https://phabricator.wikimedia.org/P14212 and previous config saved to /var/cache/conftool/dbconfig/20210205-063554-marostegui.json |
[production] |
03:42 |
<aaron@deploy1001> |
Synchronized wmf-config/mc.php: af5b0effb5e88ac4ca4a06c2c409d303ec405305 (duration: 01m 06s) |
[production] |
03:34 |
<aaron@deploy1001> |
Synchronized php-1.36.0-wmf.27/includes/libs/rdbms: 4b386661a9820a002b43bfcef3e18241ea883870 (duration: 01m 12s) |
[production] |
02:03 |
<Krinkle> |
krinkle@mwmaint1002 Prune globalimagelinks references on s4 database for the deleted ukwikimedia wiki, ref T218170. |
[production] |
01:01 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec (duration: 01m 12s) |
[production] |
00:59 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec |
[production] |
00:36 |
<legoktm@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1278.eqiad.wmnet |
[production] |
00:35 |
<legoktm> |
enabled remote IPMI access on mw1349.mgmt.eqiad.wmnet and mw1380.mgmt.eqiad.wmnet |
[production] |
00:24 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well (duration: 02m 43s) |
[production] |
00:21 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well |
[production] |
2021-02-04
|
23:59 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3 (duration: 01m 06s) |
[production] |
23:58 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3 |
[production] |
23:26 |
<legoktm@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE |
[production] |
23:24 |
<legoktm@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE |
[production] |
23:05 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet |
[production] |
23:05 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1396.eqiad.wmnet |
[production] |
23:02 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1396.eqiad.wmnet |
[production] |
23:02 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet |
[production] |
23:01 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1311.eqiad.wmnet |
[production] |
22:55 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1263.eqiad.wmnet |
[production] |
22:39 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1263.eqiad.wmnet |
[production] |