2022-05-12
19:57 <hashar> Restarting Gerrit [production]
19:53 <mutante> gitlab2001 - systemctl start backup-restore - systemd[1]: Started GitLab Backup Restore. after gerrit:791410 for T308089 [production]
19:53 <hashar> gerrit: triggering full replication to gerrit2001 to test T307137 [releng]
19:36 <wm-bot2> Set cloudvirt 'cloudvirt1022.eqiad.wmnet' maintenance. - cookbook ran by andrew@buster [admin]
19:35 <wm-bot2> Draining 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster [admin]
19:35 <wm-bot2> Safe rebooting 'cloudvirt1022.eqiad.wmnet'. - cookbook ran by andrew@buster [admin]
18:57 <jelto> restart gitlab2001 [production]
18:30 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
18:26 <krinkle@deploy1002> Synchronized w/static.php: Ic0a5eae4f721a16403071d1b2136cf23d78e4fa9 (duration: 00m 49s) [production]
18:26 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4001.ulsfo.wmnet with OS bullseye [production]
18:26 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
18:26 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
18:25 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
18:11 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4001.ulsfo.wmnet with reason: host reimage [production]
18:08 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4001.ulsfo.wmnet with reason: host reimage [production]
17:52 <cmooney@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:51 <robh@cumin1001> START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye [production]
17:50 <jgiannelos@deploy1002> helmfile [codfw] DONE helmfile.d/services/mobileapps: apply [production]
17:50 <razzi@deploy1002> Finished deploy [analytics/turnilo/deploy@5047d7d]: (no justification provided) (duration: 00m 08s) [production]
17:50 <razzi@deploy1002> Started deploy [analytics/turnilo/deploy@5047d7d]: (no justification provided) [production]
17:50 <razzi@deploy1002> Finished deploy [analytics/turnilo/deploy@9cfdfaf]: (no justification provided) (duration: 29m 32s) [production]
17:50 <jgiannelos@deploy1002> helmfile [codfw] START helmfile.d/services/mobileapps: apply [production]
17:47 <jgiannelos@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply [production]
17:46 <jgiannelos@deploy1002> helmfile [eqiad] START helmfile.d/services/mobileapps: apply [production]
17:45 <jgiannelos@deploy1002> helmfile [staging] DONE helmfile.d/services/mobileapps: apply [production]
17:44 <jgiannelos@deploy1002> helmfile [staging] START helmfile.d/services/mobileapps: apply [production]
17:43 <cmooney@cumin1001> START - Cookbook sre.dns.netbox [production]
17:31 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ores1006.eqiad.wmnet with OS buster [production]
17:26 <jmm@cumin1001> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti4001.ulsfo.wmnet with OS bullseye [production]
17:21 <razzi@deploy1002> Started deploy [analytics/turnilo/deploy@9cfdfaf]: (no justification provided) [production]
17:08 <jmm@cumin1001> START - Cookbook sre.hosts.reimage for host ganeti4001.ulsfo.wmnet with OS bullseye [production]
17:00 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ores1006.eqiad.wmnet with reason: host reimage [production]
16:57 <klausman@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ores1006.eqiad.wmnet with reason: host reimage [production]
16:53 <razzi@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Attempting OS upgrade [production]
16:53 <razzi@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Attempting OS upgrade [production]
16:35 <klausman@cumin1001> START - Cookbook sre.hosts.reimage for host ores1006.eqiad.wmnet with OS buster [production]
16:22 <TheresNoTime> Deployed b30b346 & restarted SULWatcher [tools.stewardbots]
16:21 <mutante> gitlab2001 - trying to stop 'puma' for debugging T308089 [production]
16:14 <cmooney@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:07 <cmooney@cumin1001> START - Cookbook sre.dns.netbox [production]
16:06 <cmooney@cumin1001> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
16:05 <andrew@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labstore1006.wikimedia.org [production]
16:00 <hashar> contint2001 and contint1001 now automatically run `docker system prune --force` every day and `docker system prune --force` on Sunday | https://gerrit.wikimedia.org/r/c/operations/puppet/+/773784/ [releng]
15:57 <andrew@cumin1001> START - Cookbook sre.hosts.reboot-single for host labstore1006.wikimedia.org [production]
15:57 <cmooney@cumin1001> START - Cookbook sre.dns.netbox [production]
15:56 <andrew@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labstore1007.wikimedia.org [production]
15:53 <andrew@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host labstore1005.eqiad.wmnet [production]
15:52 <TheresNoTime> Deployed ef01194 (within the last hour) [tools.stewardbots]
15:06 <razzi@cumin1001> conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet [production]
15:06 <andrewbogott> stopping nfs-server on labstore1004 in preparation for reboot [admin]