2026-02-09 §
09:41 <jayme> kubectl delete node wikikube-worker2019.codfw.wmnet - T409102 [production]
09:40 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet [production]
09:38 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet [production]
09:32 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet [production]
09:32 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet [production]
09:29 <ayounsi@cumin1003> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm [production]
09:26 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet [production]
09:25 <phuedx@deploy2002> phuedx: Backport for [[gerrit:1237851|metrics(ReviseTone): Use Experiment::send to send metrics (T416612)]], [[gerrit:1237852|metrics(ReviseTone): send consistent experiment exposure event (T416199)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [production]
09:21 <phuedx@deploy2002> Started scap sync-world: Backport for [[gerrit:1237851|metrics(ReviseTone): Use Experiment::send to send metrics (T416612)]], [[gerrit:1237852|metrics(ReviseTone): send consistent experiment exposure event (T416199)]] [production]
09:13 <ayounsi@cumin1003> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage [production]
09:10 <ayounsi@cumin1003> START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage [production]
09:00 <ayounsi@cumin1003> END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aux-k8s-worker1006 [production]
09:00 <ayounsi@cumin1003> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1006 [production]
08:59 <ayounsi@cumin1003> START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker1006 [production]
08:59 <ayounsi@cumin1003> END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1006.eqiad.wmnet 132.48.64.10.in-addr.arpa 2.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [production]
08:59 <ayounsi@cumin1003> START - Cookbook sre.dns.wipe-cache aux-k8s-worker1006.eqiad.wmnet 132.48.64.10.in-addr.arpa 2.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors [production]
08:59 <ayounsi@cumin1003> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
08:56 <ayounsi@cumin1003> START - Cookbook sre.dns.netbox [production]
08:55 <ayounsi@cumin1003> START - Cookbook sre.hosts.move-vlan for host aux-k8s-worker1006 [production]
08:55 <ayounsi@cumin1003> START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm [production]
08:44 <jforrester@deploy2002> Finished scap sync-world: Backport for [[gerrit:1227748|[wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934)]] (duration: 37m 15s) [production]
08:39 <marostegui@cumin1003> END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2203: After schema change [production]
08:31 <jforrester@deploy2002> daphnesmit, jforrester: Continuing with sync [production]
08:30 <jforrester@deploy2002> daphnesmit, jforrester: Backport for [[gerrit:1227748|[wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. [production]
08:25 <brouberol@cumin1003> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-launcher1003.eqiad.wmnet [production]
08:21 <brouberol@cumin1003> START - Cookbook sre.ganeti.reboot-vm for VM an-launcher1003.eqiad.wmnet [production]
08:19 <brouberol@cumin1003> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1003.eqiad.wmnet [production]
08:15 <brouberol@cumin1003> START - Cookbook sre.hosts.reboot-single for host an-launcher1003.eqiad.wmnet [production]
08:06 <jforrester@deploy2002> Started scap sync-world: Backport for [[gerrit:1227748|[wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934)]] [production]
07:54 <marostegui@cumin1003> START - Cookbook sre.mysql.newpool pool db2203: After schema change [production]
07:40 <jmm@cumin2002> DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nettrom out of all services on: 2497 hosts [production]
07:26 <ayounsi@cumin1003> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm [production]
07:26 <ayounsi@cumin1003> END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host aux-k8s-worker1006 [production]
07:25 <ayounsi@cumin1003> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
06:23 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2203.codfw.wmnet with reason: Maintenance [production]
06:22 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2203.codfw.wmnet with reason: Schema change [production]
06:19 <marostegui@dns1006> END - running authdns-update [production]
06:19 <marostegui@cumin1003> dbctl commit (dc=all): 'Depool db2203 T416554', diff saved to https://phabricator.wikimedia.org/P88725 and previous config saved to /var/cache/conftool/dbconfig/20260209-061904-marostegui.json [production]
06:18 <marostegui@dns1006> START - running authdns-update [production]
06:17 <marostegui@cumin1003> dbctl commit (dc=all): 'Promote db2212 to s1 primary and set section read-write T416554', diff saved to https://phabricator.wikimedia.org/P88724 and previous config saved to /var/cache/conftool/dbconfig/20260209-061756-marostegui.json [production]
06:17 <marostegui@cumin1003> dbctl commit (dc=all): 'Set s1 codfw as read-only for maintenance - T416554', diff saved to https://phabricator.wikimedia.org/P88723 and previous config saved to /var/cache/conftool/dbconfig/20260209-061732-marostegui.json [production]
06:13 <marostegui> Starting s1 codfw failover from db2203 to db2212 - T416554 [production]
06:12 <marostegui@cumin1003> dbctl commit (dc=all): 'Set db2212 with weight 0 T416554', diff saved to https://phabricator.wikimedia.org/P88722 and previous config saved to /var/cache/conftool/dbconfig/20260209-061218-marostegui.json [production]
06:11 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T416554 [production]
03:00 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) on deployment eqiad1 for all services [admin]
02:47 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.restart_openstack on deployment eqiad1 for all services [admin]
02:47 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.rabbitmq.rebuild_rabbit_cluster (exit_code=0) on deployment eqiad1 [admin]
02:44 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.rabbitmq.rebuild_rabbit_cluster on deployment eqiad1 [admin]
02:13 <mwpresync@deploy2002> Finished scap build-images: Publishing wmf/next image (duration: 12m 52s) [production]
02:00 <mwpresync@deploy2002> Started scap build-images: Publishing wmf/next image [production]