2022-05-06
08:29 <joal> kill cassandra-monthly-wf-local_group_default_T_mediarequest_top_files-2022-4 as it was probably saturating network [analytics]
08:16 <mvernon@cumin1001> START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:49 <mvernon@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:42 <mvernon@cumin1001> START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:41 <mvernon@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:31 <mvernon@cumin1001> START - Cookbook sre.hosts.reimage for host ms-be2057.codfw.wmnet with OS bullseye [production]
07:20 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet [production]
07:19 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet [production]
07:14 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet [production]
07:13 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet [production]
07:11 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet [production]
07:06 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet [production]
07:06 <wm-bot> <samwilson> Updating to version 0.1.0 [tools.docs]
01:51 <dzahn@cumin2002> conftool action : set/pooled=no; selector: dc=eqiad,name=mw1415.eqiad.wmnet [production]
01:50 <dzahn@cumin2002> conftool action : set/pooled=no; selector: dc=codfw,name=mw1415.eqiad.wmnet [production]
00:46 <rook@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudvirt1016.eqiad.wmnet [production]
00:46 <rook@cumin1001> START - Cookbook sre.hosts.reboot-single for host cloudvirt1016.eqiad.wmnet [production]
2022-05-05
22:57 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789723 [releng]
22:31 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789721 [releng]
22:28 <dduvall> created 2 new jobs to deploy https://gerrit.wikimedia.org/r/789720 [releng]
22:24 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789718 [releng]
22:21 <dduvall> created 4 new jobs to deploy https://gerrit.wikimedia.org/r/789717 [releng]
22:15 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789714 [releng]
22:13 <dduvall> created 2 new jobs to deploy https://gerrit.wikimedia.org/r/789713 [releng]
22:09 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789711 [releng]
22:07 <dduvall> created 2 new jobs to deploy https://gerrit.wikimedia.org/r/789710 [releng]
22:06 <razzi@cumin1001> END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka main-eqiad cluster: Reboot kafka nodes [production]
22:01 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
22:00 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
22:00 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
21:58 <hoo@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734722|Add missing termbox codes from Wikibase (T277836)]] (duration: 00m 48s) [production]
21:57 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789707/1 [releng]
21:56 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
21:51 <dduvall> created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789706 [releng]
21:48 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789704 [releng]
21:44 <dduvall> created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789703 [releng]
21:38 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789698 [releng]
21:35 <dduvall> created 4 jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789697 [releng]
21:35 <brennen@deploy1002> Synchronized php-1.39.0-wmf.10/includes/user: Backport: [[gerrit:789332|Suppress "named" group when TempUser system is disabled (T307675)]] (duration: 00m 48s) [production]
21:33 <brennen@deploy1002> scap failed: average error rate on 7/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details) [production]
21:26 <brennen@deploy1002> Finished scap: Resuming previously interrupted sync-world (duration: 03m 47s) [production]
21:26 <dduvall> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789694 [releng]
21:25 <jhathaway@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel [production]
21:24 <jhathaway@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel [production]
21:22 <dduvall> creating 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789693 [releng]
21:22 <brennen@deploy1002> Started scap: Resuming previously interrupted sync-world [production]
21:21 <jhathaway@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: new kernel [production]
21:21 <jhathaway@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: new kernel [production]
21:21 <jhathaway> reboot mx1001 [production]
21:18 <dduvall@deploy1002> helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply [production]