2022-05-06
09:45 <jmm@cumin2002> START - Cookbook sre.ganeti.reboot-vm for VM netflow4002.ulsfo.wmnet [production]
09:40 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet [production]
09:38 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install4001.wikimedia.org [production]
09:34 <jmm@cumin2002> START - Cookbook sre.ganeti.reboot-vm for VM install4001.wikimedia.org [production]
09:33 <klausman@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet [production]
09:33 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4003.wikimedia.org [production]
09:31 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1005.eqiad.wmnet [production]
09:29 <jmm@cumin2002> START - Cookbook sre.ganeti.reboot-vm for VM bast4003.wikimedia.org [production]
09:27 <mvernon@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2057.codfw.wmnet with OS bullseye [production]
09:25 <klausman@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1005.eqiad.wmnet [production]
09:23 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet [production]
09:17 <klausman@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet [production]
09:08 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet [production]
09:03 <mvernon@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2057.codfw.wmnet with reason: host reimage [production]
09:02 <klausman@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet [production]
09:00 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet [production]
09:00 <mvernon@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2057.codfw.wmnet with reason: host reimage [production]
08:54 <klausman@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet [production]
08:52 <klausman@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet [production]
08:45 <klausman@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet [production]
08:16 <mvernon@cumin1001> START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:49 <mvernon@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:42 <mvernon@cumin1001> START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:41 <mvernon@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:31 <mvernon@cumin1001> START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:20 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet [production]
07:19 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet [production]
07:14 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet [production]
07:13 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet [production]
07:11 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet [production]
07:06 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet [production]
01:51 <dzahn@cumin2002> conftool action : set/pooled=no; selector: dc=eqiad,name=mw1415.eqiad.wmnet [production]
01:50 <dzahn@cumin2002> conftool action : set/pooled=no; selector: dc=codfw,name=mw1415.eqiad.wmnet [production]
00:46 <rook@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudvirt1016.eqiad.wmnet [production]
00:46 <rook@cumin1001> START - Cookbook sre.hosts.reboot-single for host cloudvirt1016.eqiad.wmnet [production]
2022-05-05
22:06 <razzi@cumin1001> END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes [production]
22:01 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
22:00 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
22:00 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
21:58 <hoo@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734722|Add missing termbox codes from Wikibase (T277836)]] (duration: 00m 48s) [production]
21:56 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
21:35 <brennen@deploy1002> Synchronized php-1.39.0-wmf.10/includes/user: Backport: [[gerrit:789332|Suppress "named" group when TempUser system is disabled (T307675)]] (duration: 00m 48s) [production]
21:33 <brennen@deploy1002> scap failed: average error rate on 7/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details) [production]
21:26 <brennen@deploy1002> Finished scap: Resuming previously interrupted sync-world (duration: 03m 47s) [production]
21:25 <jhathaway@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel [production]
21:24 <jhathaway@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel [production]
21:22 <brennen@deploy1002> Started scap: Resuming previously interrupted sync-world [production]
21:21 <jhathaway@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: new kernel [production]
21:21 <jhathaway@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: new kernel [production]
21:21 <jhathaway> reboot mx1001 [production]