2021-02-04
§
|
14:37 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet |
[production] |
14:30 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet |
[production] |
14:28 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet |
[production] |
14:22 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet |
[production] |
14:21 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet |
[production] |
14:21 |
<godog> |
roll-restart rsync/swift-object-replicator in codfw to apply memory limits |
[production] |
14:21 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet |
[production] |
14:18 |
<effie> |
start rolling reboots of mc[2019-2027,2029-2037].codfw.wmnet T273278 |
[production] |
14:16 |
<mbsantos@deploy1001> |
Finished deploy [kartotherian/deploy@47fc426]: (no justification provided) (duration: 00m 12s) |
[production] |
14:16 |
<mbsantos@deploy1001> |
Started deploy [kartotherian/deploy@47fc426]: (no justification provided) |
[production] |
14:15 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet |
[production] |
14:14 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet |
[production] |
14:14 |
<moritzm> |
installing ffmpeg security updates on stretch |
[production] |
14:11 |
<mbsantos@deploy1001> |
Finished deploy [kartotherian/deploy@0a38bc5]: (no justification provided) (duration: 00m 03s) |
[production] |
14:11 |
<mbsantos@deploy1001> |
Started deploy [kartotherian/deploy@0a38bc5]: (no justification provided) |
[production] |
14:10 |
<mbsantos@deploy1001> |
Finished deploy [tilerator/deploy@46a2eaf]: (no justification provided) (duration: 00m 13s) |
[production] |
14:10 |
<mbsantos@deploy1001> |
Started deploy [tilerator/deploy@46a2eaf]: (no justification provided) |
[production] |
14:07 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet |
[production] |
14:05 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet |
[production] |
13:58 |
<urbanecm@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: NO-OP: 7c67b2f03cbc27cf9e5f214a6f0ea0856d8c1ae4: bnwiki: wgGEHelpPanelLinks: Remove text in brackets (T266020) (duration: 01m 12s) |
[production] |
13:51 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet |
[production] |
13:50 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet |
[production] |
13:44 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet |
[production] |
13:44 |
<vgutierrez> |
rolling restart of ncredir instances (kernel upgrade) |
[production] |
13:36 |
<moritzm> |
installing openldap security updates on buster (client-side tools/libs only, slapd instance already updated) |
[production] |
13:31 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE |
[production] |
13:31 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet |
[production] |
13:31 |
<godog> |
reboot logstash2005.codfw.wmnet, no ssh / stuck |
[production] |
13:29 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE |
[production] |
13:29 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet |
[production] |
13:10 |
<jbond42> |
upload cas_6.2.7 to downgrade cas T273867 |
[production] |
13:04 |
<ariel@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1010.eqiad.wmnet with reason: REIMAGE |
[production] |
13:02 |
<ariel@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1010.eqiad.wmnet with reason: REIMAGE |
[production] |
12:27 |
<moritzm> |
installing libdatetime-timezone-perl updates on Buster |
[production] |
12:17 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 17 hosts with reason: reboot |
[production] |
12:17 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on 17 hosts with reason: reboot |
[production] |
12:17 |
<moritzm> |
rebooting mw[1264-1268,1276-1277,1337-1338,1404-1409,1411,1413].eqiad.wmnet for kernel update |
[production] |
12:08 |
<godog> |
bounce rsyslog on centrallog1001 |
[production] |
11:47 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1009.eqiad.wmnet |
[production] |
11:47 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1009.eqiad.wmnet |
[production] |
11:30 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) |
[production] |
11:26 |
<elukey@cumin1001> |
START - Cookbook sre.aqs.roll-restart |
[production] |
11:07 |
<elukey@puppetmaster1001> |
conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal |
[production] |
10:35 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 93 hosts with reason: reboot |
[production] |
10:35 |
<moritzm> |
rebooting mw[2261-2262,2268-2271,2273-2277,2283-2288,2290-2335,2337-2339,2350-2376].codfw.wmnet |
[production] |
10:34 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on 93 hosts with reason: reboot |
[production] |
10:23 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14204 and previous config saved to /var/cache/conftool/dbconfig/20210204-102312-root.json |
[production] |
10:15 |
<elukey> |
restart pybal on lvs1015 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - T269160 |
[production] |
10:13 |
<elukey> |
restart pybal on lvs2009 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - T269160 |
[production] |
10:08 |
<elukey> |
restart pybal on lvs1016 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - T269160 |
[production] |