2023-01-26
17:19 <dancy@deploy1002> Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 [production]
17:16 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet [production]
17:13 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43426 and previous config saved to /var/cache/conftool/dbconfig/20230126-171302-root.json [production]
17:12 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet [production]
17:10 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage [production]
17:07 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage [production]
17:06 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet [production]
17:06 <brett@cumin1001> START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS bullseye [production]
17:05 <brett@cumin1001> conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet [production]
17:05 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
17:05 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
17:04 <brett@cumin1001> conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet [production]
17:03 <brett@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS bullseye [production]
17:02 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet [production]
16:59 <cgoubert@deploy1002> Synchronized tox.ini: Rebuilding mediawiki-webserver (duration: 07m 19s) [production]
16:57 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43425 and previous config saved to /var/cache/conftool/dbconfig/20230126-165757-root.json [production]
16:56 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet [production]
16:53 <claime> Running scap sync-file -D php_fpm_restart_script:/bin/true tox.ini "Rebuilding mediawiki-webserver image" - T326794 [production]
16:51 <sukhe@cumin2002> START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye [production]
16:49 <sukhe@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2027'] [production]
16:48 <sukhe> correcting earlier log: pooling lvs2007 after T326564 [production]
16:48 <sukhe> pooling lvs2009 after T326564 [production]
16:42 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43424 and previous config saved to /var/cache/conftool/dbconfig/20230126-164252-root.json [production]
16:41 <brett@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage [production]
16:41 <sukhe@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2027'] [production]
16:38 <sukhe@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye [production]
16:38 <brett@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage [production]
16:33 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet [production]
16:31 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet [production]
16:28 <sukhe@cumin2002> START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye [production]
16:27 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet [production]
16:27 <sukhe@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye [production]
16:27 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43423 and previous config saved to /var/cache/conftool/dbconfig/20230126-162747-root.json [production]
16:27 <sukhe@cumin2002> START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye [production]
16:26 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet [production]
16:24 <aborrero@cumin2002> END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001-dev [production]
16:23 <aborrero@cumin2002> START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001-dev [production]
16:23 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet [production]
16:21 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet [production]
16:21 <cgoubert@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply [production]
16:20 <cgoubert@deploy1002> helmfile [eqiad] START helmfile.d/services/mw-debug: apply [production]
16:20 <aborrero@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:20 <cgoubert@deploy1002> helmfile [codfw] DONE helmfile.d/services/mw-debug: apply [production]
16:19 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet [production]
16:19 <cgoubert@deploy1002> helmfile [codfw] START helmfile.d/services/mw-debug: apply [production]
16:19 <aborrero@cumin2002> START - Cookbook sre.dns.netbox [production]
16:18 <brett@cumin1001> START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS bullseye [production]
16:14 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet [production]
16:13 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717 [production]
16:13 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717 [production]