2025-11-19
07:06 <marostegui@cumin1003> dbctl commit (dc=all): 'Depool pc4', diff saved to https://phabricator.wikimedia.org/P85380 and previous config saved to /var/cache/conftool/dbconfig/20251119-070656-marostegui.json [production]
07:05 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet,pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: network maintenance [production]
06:52 <marostegui@cumin1003> START - Cookbook sre.mysql.pool db1189 gradually with 4 steps - Repooling after switchover [production]
06:52 <marostegui@cumin1003> END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1189 gradually with 4 steps - Repooling after switchover [production]
06:48 <marostegui@cumin1003> START - Cookbook sre.mysql.pool db1189 gradually with 4 steps - Repooling after switchover [production]
06:48 <marostegui@cumin1003> dbctl commit (dc=all): 'Depool db1189 T410283', diff saved to https://phabricator.wikimedia.org/P85378 and previous config saved to /var/cache/conftool/dbconfig/20251119-064838-marostegui.json [production]
06:47 <marostegui@cumin1003> dbctl commit (dc=all): 'Promote db1223 to s3 primary T410283', diff saved to https://phabricator.wikimedia.org/P85377 and previous config saved to /var/cache/conftool/dbconfig/20251119-064755-marostegui.json [production]
06:47 <marostegui> Starting s3 eqiad failover from db1189 to db1223 - T410283 [production]
06:41 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T410283 [production]
06:40 <marostegui@cumin1003> dbctl commit (dc=all): 'Set db1223 with weight 0 T410283', diff saved to https://phabricator.wikimedia.org/P85376 and previous config saved to /var/cache/conftool/dbconfig/20251119-064055-marostegui.json [production]
06:35 <marostegui@cumin1003> dbctl commit (dc=all): 'Repool pc1 after network maint', diff saved to https://phabricator.wikimedia.org/P85375 and previous config saved to /var/cache/conftool/dbconfig/20251119-063522-marostegui.json [production]
06:28 <marostegui@cumin1003> DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: db2144 went down [production]
06:27 <marostegui@cumin1003> dbctl commit (dc=all): 'Depool ms2', diff saved to https://phabricator.wikimedia.org/P85374 and previous config saved to /var/cache/conftool/dbconfig/20251119-062728-marostegui.json [production]
06:26 <marostegui@cumin1003> dbctl commit (dc=all): 'Repool ms3 T405942', diff saved to https://phabricator.wikimedia.org/P85373 and previous config saved to /var/cache/conftool/dbconfig/20251119-062634-marostegui.json [production]
06:25 <marostegui@cumin1003> dbctl commit (dc=all): 'Repool ms3 T405942', diff saved to https://phabricator.wikimedia.org/P85372 and previous config saved to /var/cache/conftool/dbconfig/20251119-062509-marostegui.json [production]
03:09 <eileen> civicrm upgraded from bc100d63 to f471a3ec [production]
02:59 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1074.eqiad.wmnet with OS trixie [production]
02:46 <eileen> config revision changed from c3e95b76 to 8b1a290c [production]
01:53 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1074.eqiad.wmnet with reason: host reimage [production]
01:50 <andrew@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1074.eqiad.wmnet with reason: host reimage [production]
01:35 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudvirt1074.eqiad.wmnet with OS trixie [production]
01:23 <andrew@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1074.eqiad.wmnet'] [production]
01:23 <andrew@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1074.eqiad.wmnet'] [production]
01:18 <andrew@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1074.eqiad.wmnet'] [production]
01:18 <andrew@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1074.eqiad.wmnet'] [production]
01:14 <mwpresync@deploy2002> Finished scap build-images: Publishing wmf/next image (duration: 13m 18s) [production]
01:00 <mwpresync@deploy2002> Started scap build-images: Publishing wmf/next image [production]
00:48 <andrew@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1074.eqiad.wmnet'] [production]
00:48 <andrew@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1074.eqiad.wmnet'] [production]
2025-11-18
23:52 <andrew@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1074.eqiad.wmnet with OS trixie [production]
23:42 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1076.eqiad.wmnet with OS trixie [production]
22:57 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1075.eqiad.wmnet with OS trixie [production]
22:56 <sfaci@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply [production]
22:55 <sfaci@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply [production]
22:29 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1076.eqiad.wmnet with reason: host reimage [production]
22:22 <andrew@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1076.eqiad.wmnet with reason: host reimage [production]
22:21 <bking@deploy2002> helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply [production]
22:21 <bking@deploy2002> helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply [production]
22:09 <bking@deploy2002> helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. [production]
22:08 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1075.eqiad.wmnet with reason: host reimage [production]
22:08 <bking@deploy2002> helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. [production]
22:07 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudvirt1074.eqiad.wmnet with OS trixie [production]
22:07 <andrew@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1074.eqiad.wmnet with OS trixie [production]
22:07 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudvirt1076.eqiad.wmnet with OS trixie [production]
22:05 <andrew@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1075.eqiad.wmnet with reason: host reimage [production]
21:58 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1073.eqiad.wmnet with OS trixie [production]
21:50 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudvirt1075.eqiad.wmnet with OS trixie [production]
21:44 <andrew@cumin2002> START - Cookbook sre.hosts.reimage for host cloudvirt1074.eqiad.wmnet with OS trixie [production]
21:43 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1072.eqiad.wmnet with OS trixie [production]
21:34 <andrew@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1073.eqiad.wmnet with reason: host reimage [production]