2021-08-03
16:14 <pt1979@cumin2002> START - Cookbook sre.dns.netbox [production]
16:00 <dcausse@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' . [production]
15:55 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
15:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
15:49 <jmm@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
15:34 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
15:30 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
15:26 <jmm@cumin2002> START - Cookbook sre.dns.netbox [production]
15:25 <moritzm> prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) T286206 [production]
15:14 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet [production]
15:01 <jmm@cumin2002> START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet [production]
14:56 <jmm@cumin2002> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet [production]
14:49 <jmm@cumin2002> START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet [production]
14:32 <oblivian@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
14:27 <ottomata> chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos [production]
14:23 <ottomata> chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos [production]
14:13 <ottomata> chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos [production]
12:47 <moritzm> restarting Tomcat on idp1001 [production]
12:05 <moritzm> installing libgcrypt20 security updates [production]
11:48 <jmm@cumin2002> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet [production]
11:36 <moritzm> updated bullseye d-i images to rc3 T275873 [production]
11:28 <godog> upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - T222113 [production]
11:25 <jmm@cumin2002> START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet [production]
11:19 <jmm@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
11:18 <godog> upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - T222113 [production]
11:15 <jmm@cumin2002> START - Cookbook sre.dns.netbox [production]
11:13 <moritzm> rename Ganeti group for test cluster to row_D T286206 [production]
11:01 <jmm@cumin2002> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet [production]
10:58 <jmm@cumin2002> START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet [production]
09:18 <marostegui> Failover m1, m2 and m3-master T287574 [production]
09:12 <moritzm> installing php 7.0 security updates on stretch [production]
09:11 <jayme> importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - T286054 [production]
08:57 <moritzm> installing pillow security updates on stretch [production]
08:53 <jynus@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE [production]
08:50 <jynus@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE [production]
08:17 <legoktm> pausing refreshLinks run against wikiversities while other issues are figured out [production]
08:13 <jynus@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE [production]
08:10 <jynus@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE [production]
08:03 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue [production]
08:03 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue [production]
07:42 <moritzm> upgrading spicerack on cumin2002 to 0.0.57 [production]
06:31 <kart__> Updated cxserver to 2021-08-02-164000-production (T286473) [production]
06:26 <kartik@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
06:20 <kartik@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
06:15 <kartik@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . [production]
04:37 <marostegui> Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020 [production]
00:43 <reedy@deploy1002> Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s) [production]
00:43 <reedy@deploy1002> Started deploy [integration/docroot@f9d225d]: with less gref [production]
00:29 <reedy@deploy1002> Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s) [production]
00:29 <reedy@deploy1002> Started deploy [integration/docroot@f7df1c7]: (no justification provided) [production]