2021-03-04
11:25 <arturo> rebooted tools-sgewebgrid-generic-0901, repool it again [tools]
11:24 <dcaro> rebooted cloudvirt1022, re-adding to ceph and removing from maintenance host aggregate for T275753 [admin]
11:14 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE [production]
11:11 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE [production]
11:10 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Needs fixing after T274472 [production]
11:10 <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Needs fixing after T274472 [production]
11:08 <dcaro@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1022.eqiad.wmnet [production]
11:04 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE [production]
11:02 <Majavah> live hacking https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/668338/ on deployment-deploy01 to test new deployment-mwlog01 ref T276419 [releng]
11:02 <dcaro@cumin1001> START - Cookbook sre.hosts.reboot-single for host cloudvirt1022.eqiad.wmnet [production]
11:02 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE [production]
11:01 <dcaro> rebooting cloudvirt1022 for T275753 [admin]
10:51 <Majavah> stop bogus service udp2log on deployment-mwlog01; unclear what it is, but it was using the same port as udp2log-mw.service [releng]
10:40 <elukey> drain + reimage analytics1059/1060 to Debian Buster [analytics]
10:40 <elukey> drain + reimage analytics1059/1060 to Debian Buster [production]
10:32 <moritzm> uploaded screen 4.2.1-3+deb8u1+wmf1 to jessie-wikimedia [production]
09:57 <arturo> depool tools-sgewebgrid-generic-0901 to reboot VM. It was stuck in MIGRATING state when draining cloudvirt1022 [tools]
09:32 <elukey> reboot an-worker[1097-1101] (GPU workers) to pick up the new kernel (5.10) [analytics]
09:32 <elukey> install linux 5.10 on an-worker[1097-1101] (GPU workers) and reboot them [production]
09:30 <kormat> disabling puppet on all db hosts while deploying a puppet monitoring change T275497 [production]
09:20 <hashar> Restored analytics/udp2log because it has to be packaged for Buster # T276422 T180301 [releng]
09:19 <moritzm> uploaded udplog 1.8.5+deb10u1 to buster-wikimedia [production]
09:12 <dcaro> draining cloudvirt1022 for T275753 [admin]
09:02 <elukey> kill/start mediawiki-geoeditors-monthly to apply backtick change (hive script) [analytics]
08:48 <elukey> deploy refinery to hdfs [analytics]
08:45 <elukey@deploy1002> Finished deploy [analytics/refinery@605f8b8]: Fix for geoeditors monthly job (duration: 11m 03s) [production]
08:34 <elukey> deploy refinery to fix https://gerrit.wikimedia.org/r/c/analytics/refinery/+/668111 [analytics]
08:33 <elukey@deploy1002> Started deploy [analytics/refinery@605f8b8]: Fix for geoeditors monthly job [production]
07:47 <legoktm> rebuilding php*-compile images https://gerrit.wikimedia.org/r/668259 [releng]
07:38 <elukey> reboot an-worker1096 to pick up 5.10 kernel [analytics]
07:38 <elukey> reboot an-worker1096 to pick up 5.10 kernel [production]
06:33 <Majavah> create Buster VM deployment-mwlog01 to eventually replace deployment-fluorine02 which is still on Stretch [releng]
06:25 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1088 T276025', diff saved to https://phabricator.wikimedia.org/P14622 and previous config saved to /var/cache/conftool/dbconfig/20210304-062503-marostegui.json [production]
06:11 <marostegui> Stop MySQL on db2116 to clone db2145 T275633 [production]
06:11 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2116 T275633', diff saved to https://phabricator.wikimedia.org/P14621 and previous config saved to /var/cache/conftool/dbconfig/20210304-061134-marostegui.json [production]
05:20 <kart_> Updated apertium to 2021-03-03-170806-production (T274262) [production]
05:15 <kartik@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' . [production]
05:11 <kartik@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' . [production]
05:10 <kartik@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' . [production]
01:24 <twentyafterfour> phabricator upgrade complete [production]
01:22 <twentyafterfour> restarting php7.3-fpm on phab1001 to complete phabricator upgrade [production]
00:02 <ebernhardson@deploy1002> Finished deploy [wikimedia/discovery/analytics@e47f735]: search_satisfaction_daily: make files readable by druid ingestion (duration: 25m 35s) [production]
2021-03-03
23:36 <ebernhardson@deploy1002> Started deploy [wikimedia/discovery/analytics@e47f735]: search_satisfaction_daily: make files readable by druid ingestion [production]
23:08 <legoktm@deploy1002> conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet [production]
22:56 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwmaint2001.codfw.wmnet [production]
22:51 <legoktm@deploy1002> conftool action : set/weight=10; selector: name=registry2003.codfw.wmnet [production]
22:50 <legoktm@deploy1002> conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet [production]
22:47 <dzahn@cumin1001> START - Cookbook sre.hosts.decommission for hosts mwmaint2001.codfw.wmnet [production]
22:05 <legoktm@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2003.codfw.wmnet [production]
21:58 <mutante> puppetmaster1001 - signing puppet cert for gitlab1001.wikimedia.org (T274459) [production]