2023-01-26
18:15 <cgoubert@deploy1002> helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync [production]
18:14 <cgoubert@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply [production]
18:14 <cgoubert@deploy1002> helmfile [eqiad] START helmfile.d/services/mw-debug: apply [production]
18:14 <cgoubert@deploy1002> helmfile [codfw] DONE helmfile.d/services/mw-debug: apply [production]
18:13 <cgoubert@deploy1002> helmfile [codfw] START helmfile.d/services/mw-debug: apply [production]
18:13 <cgoubert@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply [production]
18:12 <cgoubert@deploy1002> helmfile [eqiad] START helmfile.d/services/mw-api-int: apply [production]
18:12 <cgoubert@deploy1002> helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply [production]
18:11 <cgoubert@deploy1002> helmfile [codfw] START helmfile.d/services/mw-api-int: apply [production]
18:11 <cgoubert@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply [production]
18:10 <cgoubert@deploy1002> helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply [production]
18:10 <cgoubert@deploy1002> helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply [production]
18:09 <cgoubert@deploy1002> helmfile [codfw] START helmfile.d/services/mw-api-ext: apply [production]
17:59 <brett@cumin1001> START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS bullseye [production]
17:55 <brett@cumin1001> conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet [production]
17:49 <brett@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS bullseye [production]
17:30 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet [production]
17:28 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43427 and previous config saved to /var/cache/conftool/dbconfig/20230126-172806-root.json [production]
17:27 <brett@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage [production]
17:24 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet [production]
17:24 <brett@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage [production]
17:22 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet [production]
17:19 <dancy@deploy1002> Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s) [production]
17:19 <dancy@deploy1002> Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 [production]
17:16 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet [production]
17:13 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43426 and previous config saved to /var/cache/conftool/dbconfig/20230126-171302-root.json [production]
17:12 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet [production]
17:10 <sukhe@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage [production]
17:07 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage [production]
17:06 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet [production]
17:06 <brett@cumin1001> START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS bullseye [production]
17:05 <brett@cumin1001> conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet [production]
17:05 <elukey@deploy1002> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'. [production]
17:05 <elukey@deploy1002> helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'. [production]
17:04 <brett@cumin1001> conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet [production]
17:03 <brett@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS bullseye [production]
17:02 <ariel@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet [production]
16:59 <cgoubert@deploy1002> Synchronized tox.ini: Rebuilding mediawiki-webserver (duration: 07m 19s) [production]
16:57 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43425 and previous config saved to /var/cache/conftool/dbconfig/20230126-165757-root.json [production]
16:56 <ariel@cumin1001> START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet [production]
16:53 <claime> Running scap sync-file -D php_fpm_restart_script:/bin/true tox.ini "Rebuilding mediawiki-webserver image" - T326794 [production]
16:51 <sukhe@cumin2002> START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye [production]
16:49 <sukhe@cumin2002> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2027'] [production]
16:48 <sukhe> correcting earlier log: pooling lvs2007 after T326564 [production]
16:48 <sukhe> pooling lvs2009 after T326564 [production]
16:42 <marostegui@cumin1001> dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43424 and previous config saved to /var/cache/conftool/dbconfig/20230126-164252-root.json [production]
16:41 <brett@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage [production]
16:41 <sukhe@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2027'] [production]
16:38 <sukhe@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye [production]
16:38 <brett@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage [production]