2021-08-03
§
|
16:27 |
<hashar> |
Going to upgrade Gerrit 3.3 (scheduled maintenance) |
[production] |
16:18 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
16:14 |
<pt1979@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
16:00 |
<dcausse@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' . |
[production] |
15:55 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
15:50 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
15:49 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
15:34 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
15:30 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
15:26 |
<jmm@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
15:25 |
<moritzm> |
prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) T286206 |
[production] |
15:14 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet |
[production] |
15:01 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet |
[production] |
14:56 |
<jmm@cumin2002> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet |
[production] |
14:49 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet |
[production] |
14:32 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
14:27 |
<ottomata> |
chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos |
[production] |
14:23 |
<ottomata> |
chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos |
[production] |
14:13 |
<ottomata> |
chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos |
[production] |
12:47 |
<moritzm> |
restarting Tomcat on idp1001 |
[production] |
12:05 |
<moritzm> |
installing libgcrypt20 security updates |
[production] |
11:48 |
<jmm@cumin2002> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet |
[production] |
11:36 |
<moritzm> |
updated bullseye d-i images to rc3 T275873 |
[production] |
11:28 |
<godog> |
upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - T222113 |
[production] |
11:25 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet |
[production] |
11:19 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
11:18 |
<godog> |
upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - T222113 |
[production] |
11:15 |
<jmm@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
11:13 |
<moritzm> |
rename Ganeti group for test cluster to row_D T286206 |
[production] |
11:01 |
<jmm@cumin2002> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet |
[production] |
10:58 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet |
[production] |
09:18 |
<marostegui> |
Failover m1, m2 and m3-master T287574 |
[production] |
09:12 |
<moritzm> |
installing php 7.0 security updates on stretch |
[production] |
09:11 |
<jayme> |
importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - T286054 |
[production] |
08:57 |
<moritzm> |
installing pillow security updates on stretch |
[production] |
08:53 |
<jynus@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE |
[production] |
08:50 |
<jynus@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE |
[production] |
08:17 |
<legoktm> |
pausing refreshLinks run against wikiversities while other issues are figured out |
[production] |
08:13 |
<jynus@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE |
[production] |
08:10 |
<jynus@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE |
[production] |
08:03 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue |
[production] |
08:03 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue |
[production] |
07:42 |
<moritzm> |
upgrading spicerack on cumin2002 to 0.0.57 |
[production] |
06:31 |
<kart__> |
Updated cxserver to 2021-08-02-164000-production (T286473) |
[production] |
06:26 |
<kartik@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . |
[production] |
06:20 |
<kartik@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . |
[production] |
06:15 |
<kartik@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . |
[production] |
04:37 |
<marostegui> |
Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020 |
[production] |
00:43 |
<reedy@deploy1002> |
Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s) |
[production] |
00:43 |
<reedy@deploy1002> |
Started deploy [integration/docroot@f9d225d]: with less gref |
[production] |