2021-03-04
13:45 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P14632 and previous config saved to /var/cache/conftool/dbconfig/20210304-134521-marostegui.json [production]
13:44 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet [production]
13:44 <volans> uploaded spicerack_0.0.49 to apt.wikimedia.org buster-wikimedia [production]
13:35 <moritzm> restarting mw canaries for libzstd update [production]
13:32 <elukey> drain + reimage analytics10[63,64] to Debian Buster [production]
13:32 <elukey> drain + reimage analytics10[63,64] to Debian Buster [analytics]
13:29 <moritzm> installing libzstd security updates on Buster [production]
13:18 <Majavah> shutdown deployment-fluorine02 for a scream test for T276419, I believe everything has been moved to deployment-mwlog01 [releng]
13:13 <marostegui@cumin1001> dbctl commit (dc=all): 'Add db2146 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P14631 and previous config saved to /var/cache/conftool/dbconfig/20210304-131301-marostegui.json [production]
13:10 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1062.eqiad.wmnet with reason: REIMAGE [production]
13:08 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1061.eqiad.wmnet with reason: REIMAGE [production]
13:07 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1062.eqiad.wmnet with reason: REIMAGE [production]
13:06 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1061.eqiad.wmnet with reason: REIMAGE [production]
12:48 <elukey> drain + reimage analytics10[61,62] to Debian Buster [analytics]
12:48 <elukey> drain + reimage analytics10[61,62] to Debian Buster [production]
12:45 <jakob@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' . [production]
12:40 <mbsantos@deploy1002> Finished deploy [tilerator/deploy@6fcbb9f]: (no justification provided) (duration: 00m 14s) [production]
12:40 <wmde-fisch@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:668108|Remove conflicting gadget configuration for hewiki (T276330)]] (duration: 01m 12s) [production]
12:40 <mbsantos@deploy1002> Started deploy [tilerator/deploy@6fcbb9f]: (no justification provided) [production]
12:38 <Majavah> `git rebase origin/production` on deployment-puppetmaster04 to update few settings for T276419 [releng]
12:34 <jakob@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' . [production]
12:19 <Majavah> Beta cluster is now using deployment-mwlog01 instead of deployment-fluorine02 for MediaWiki logs. fluorine02 is still used for some other misc services, these will be migrated soon [releng]
12:11 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db1115.eqiad.wmnet,dbmonitor1001.wikimedia.org with reason: Restart db1115 to fix memory leak [production]
12:11 <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 0:30:00 on db1115.eqiad.wmnet,dbmonitor1001.wikimedia.org with reason: Restart db1115 to fix memory leak [production]
12:10 <jakob@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' . [production]
12:06 <Majavah> deployment-prep Delete lists.beta.wmflabs.org DNS record, points to an unassigned floating IP and not used according to Amir [releng]
12:00 <marostegui> Stop mysql on db1117:3321 to clone db1159 [production]
11:42 <jakob@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' . [production]
11:40 <marostegui@cumin1001> dbctl commit (dc=all): 'Add db2145 to s1 (and repool db2116) - T275633', diff saved to https://phabricator.wikimedia.org/P14625 and previous config saved to /var/cache/conftool/dbconfig/20210304-114052-marostegui.json [production]
11:29 <arturo> draining cloudvirt1024 for T275753 [admin]
11:28 <marostegui@cumin1001> dbctl commit (dc=all): 'Add db2145 into dbctl depooled - T275633', diff saved to https://phabricator.wikimedia.org/P14624 and previous config saved to /var/cache/conftool/dbconfig/20210304-112848-marostegui.json [production]
11:27 <_joe_> restarted redis on mc2027 to pick up the replication change [production]
11:25 <arturo> rebooted tools-sgewebgrid-generic-0901, repool it again [tools]
11:24 <dcaro> rebooted cloudvirt1022, re-adding to ceph and removing from maintenance host aggregate for T275753 [admin]
11:14 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE [production]
11:11 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE [production]
11:10 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Needs fixing after T274472 [production]
11:10 <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Needs fixing after T274472 [production]
11:08 <dcaro@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1022.eqiad.wmnet [production]
11:04 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE [production]
11:02 <Majavah> live hacking https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/668338/ on deployment-deploy01 to test new deployment-mwlog01 ref T276419 [releng]
11:02 <dcaro@cumin1001> START - Cookbook sre.hosts.reboot-single for host cloudvirt1022.eqiad.wmnet [production]
11:02 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE [production]
11:01 <dcaro> rebooting cloudvirt1022 for T275753 [admin]
10:51 <Majavah> stop bogus service udp2log on deployment-mwlog01, no idea what it is but it was using the same port as udp2log-mw.service is [releng]
10:40 <elukey> drain + reimage analytics1059/1060 to Debian Buster [analytics]
10:40 <elukey> drain + reimage analytics1059/1060 to Debian Buster [production]
10:32 <moritzm> uploaded screen 4.2.1-3+deb8u1+wmf1 to jessie-wikimedia [production]
09:57 <arturo> depool tools-sgewebgrid-generic-0901 to reboot VM. It was stuck in MIGRATING state when draining cloudvirt1022 [tools]
09:32 <elukey> reboot an-worker[1097-1101] (GPU workers) to pick up the new kernel (5.10) [analytics]