2023-01-26
17:05 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
17:04 <brett@cumin1001> conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet [production]
17:03 <brett@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS bullseye [production]
17:02 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet [production]
16:59 <cgoubert@deploy1002> Synchronized tox.ini: Rebuilding mediawiki-webserver (duration: 07m 19s) [production]
16:57 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43425 and previous config saved to /var/cache/conftool/dbconfig/20230126-165757-root.json [production]
16:56 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet [production]
16:53 <claime> Running scap sync-file -D php_fpm_restart_script:/bin/true tox.ini "Rebuilding mediawiki-webserver image" - T326794 [production]
16:51 <sukhe@cumin2002> START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye [production]
16:49 <sukhe@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2027'] [production]
16:48 <sukhe> correcting earlier log: pooling lvs2007 after T326564 [production]
16:48 <sukhe> pooling lvs2009 after T326564 [production]
16:42 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43424 and previous config saved to /var/cache/conftool/dbconfig/20230126-164252-root.json [production]
16:41 <brett@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage [production]
16:41 <sukhe@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2027'] [production]
16:38 <sukhe@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye [production]
16:38 <brett@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage [production]
16:33 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet [production]
16:31 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet [production]
16:28 <sukhe@cumin2002> START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye [production]
16:27 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet [production]
16:27 <sukhe@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye [production]
16:27 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43423 and previous config saved to /var/cache/conftool/dbconfig/20230126-162747-root.json [production]
16:27 <sukhe@cumin2002> START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye [production]
16:26 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet [production]
16:24 <aborrero@cumin2002> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001-dev [production]
16:23 <aborrero@cumin2002> START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001-dev [production]
16:23 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet [production]
16:21 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet [production]
16:21 <cgoubert@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply [production]
16:20 <cgoubert@deploy1002> helmfile [eqiad] START helmfile.d/services/mw-debug: apply [production]
16:20 <aborrero@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:20 <cgoubert@deploy1002> helmfile [codfw] DONE helmfile.d/services/mw-debug: apply [production]
16:19 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet [production]
16:19 <cgoubert@deploy1002> helmfile [codfw] START helmfile.d/services/mw-debug: apply [production]
16:19 <aborrero@cumin2002> START - Cookbook sre.dns.netbox [production]
16:18 <brett@cumin1001> START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS bullseye [production]
16:14 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet [production]
16:13 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717 [production]
16:13 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717 [production]
16:12 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43422 and previous config saved to /var/cache/conftool/dbconfig/20230126-161242-root.json [production]
16:11 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2161 T328024', diff saved to https://phabricator.wikimedia.org/P43421 and previous config saved to /var/cache/conftool/dbconfig/20230126-161137-root.json [production]
16:10 <marostegui@cumin1001> dbctl commit (dc=all): 'Promote db2165 to s8 primary T328024', diff saved to https://phabricator.wikimedia.org/P43420 and previous config saved to /var/cache/conftool/dbconfig/20230126-161058-marostegui.json [production]
16:10 <marostegui> Starting s8 codfw failover from db2161 to db2165 - T328024 [production]
16:09 <moritzm> installing distro-info-data updates from Bullseye point release [production]
16:08 <aborrero@cumin2002> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw2001-dev.codfw.wmnet [production]
16:08 <aborrero@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:08 <aborrero@cumin2002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002" [production]
16:06 <aborrero@cumin2002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002" [production]
16:05 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1009.eqiad.wmnet [production]