2022-05-12
19:35 <wm-bot2> Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster [admin]
18:57 <jelto> restart gitlab2001 [production]
18:30 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
18:26 <krinkle@deploy1002> Synchronized w/static.php: Ic0a5eae4f721a16403071d1b2136cf23d78e4fa9 (duration: 00m 49s) [production]
18:26 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4001.ulsfo.wmnet with OS bullseye [production]
18:26 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
18:26 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
18:25 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
18:11 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4001.ulsfo.wmnet with reason: host reimage [production]
18:08 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4001.ulsfo.wmnet with reason: host reimage [production]
17:52 <cmooney@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:51 <robh@cumin1001> START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye [production]
17:50 <jgiannelos@deploy1002> helmfile [codfw] DONE helmfile.d/services/mobileapps: apply [production]
17:50 <razzi@deploy1002> Finished deploy [analytics/turnilo/deploy@5047d7d]: (no justification provided) (duration: 00m 08s) [production]
17:50 <razzi@deploy1002> Started deploy [analytics/turnilo/deploy@5047d7d]: (no justification provided) [production]
17:50 <razzi@deploy1002> Finished deploy [analytics/turnilo/deploy@9cfdfaf]: (no justification provided) (duration: 29m 32s) [production]
17:50 <jgiannelos@deploy1002> helmfile [codfw] START helmfile.d/services/mobileapps: apply [production]
17:47 <jgiannelos@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply [production]
17:46 <jgiannelos@deploy1002> helmfile [eqiad] START helmfile.d/services/mobileapps: apply [production]
17:45 <jgiannelos@deploy1002> helmfile [staging] DONE helmfile.d/services/mobileapps: apply [production]
17:44 <jgiannelos@deploy1002> helmfile [staging] START helmfile.d/services/mobileapps: apply [production]
17:43 <cmooney@cumin1001> START - Cookbook sre.dns.netbox [production]
17:31 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1006.eqiad.wmnet with OS buster [production]
17:26 <jmm@cumin1001> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti4001.ulsfo.wmnet with OS bullseye [production]
17:21 <razzi@deploy1002> Started deploy [analytics/turnilo/deploy@9cfdfaf]: (no justification provided) [production]
17:08 <jmm@cumin1001> START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye [production]
17:00 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1006.eqiad.wmnet with reason: host reimage [production]
16:57 <klausman@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ores1006.eqiad.wmnet with reason: host reimage [production]
16:53 <razzi@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Attempting OS upgrade [production]
16:53 <razzi@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Attempting OS upgrade [production]
16:35 <klausman@cumin1001> START - Cookbook sre.hosts.reimage for host ores1006.eqiad.wmnet with OS buster [production]
16:22 <TheresNoTime> Deployed b30b346 & restarted SULWatcher [tools.stewardbots]
16:21 <mutante> gitlab2001 - trying to stop 'puma' for debugging T308089 [production]
16:14 <cmooney@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:07 <cmooney@cumin1001> START - Cookbook sre.dns.netbox [production]
16:06 <cmooney@cumin1001> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
16:05 <andrew@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labstore1006.wikimedia.org [production]
16:00 <hashar> contint2001 and contint1001 now automatically run `docker system prune --force` every day and `docker system prune --force` on Sunday | https://gerrit.wikimedia.org/r/c/operations/puppet/+/773784/ [releng]
15:57 <andrew@cumin1001> START - Cookbook sre.hosts.reboot-single for host labstore1006.wikimedia.org [production]
15:57 <cmooney@cumin1001> START - Cookbook sre.dns.netbox [production]
15:56 <andrew@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labstore1007.wikimedia.org [production]
15:53 <andrew@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host labstore1005.eqiad.wmnet [production]
15:52 <TheresNoTime> Deployed ef01194 (within the last hour) [tools.stewardbots]
15:06 <razzi@cumin1001> conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet [production]
15:06 <andrewbogott> stopping nfs-server on labstore1004 in preparation for reboot [admin]
15:05 <brennen> gitlab-prod-1001.devtools: soft reboot [releng]
15:05 <klausman@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ores1008.eqiad.wmnet with reason: host reimage [production]
14:55 <ladsgroup@cumin1001> dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P27819 and previous config saved to /var/cache/conftool/dbconfig/20220512-145554-root.json [production]
14:49 <razzi> undo the 2 previous confctl changes to repool dbproxy1019 to wikireplicas-b only [analytics]
14:48 <razzi@cumin1001> conftool action : set/pooled=inactive; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet [production]