production SAL

2023-01-03
12:27	<taavi@deploy1002>	Started deploy [horizon/deploy@9d02cd6] (dev): pushing wmf-puppet-dashboard updates for enc git handling	[production]
11:40	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P42744 and previous config saved to /var/cache/conftool/dbconfig/20230103-114030-marostegui.json	[production]
11:35	<cgoubert@cumin1001>	START - Cookbook sre.hosts.reboot-cluster	[production]
11:34	<cgoubert@cumin1001>	END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)	[production]
11:34	<cgoubert@cumin1001>	START - Cookbook sre.hosts.reboot-cluster	[production]
11:33	<cgoubert@cumin1001>	END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)	[production]
11:30	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org	[production]
11:26	<cgoubert@cumin1001>	START - Cookbook sre.hosts.reboot-cluster	[production]
11:25	<claime>	Starting rolling reboot of parse* hosts in codfw	[production]
11:06	<hashar>	contint2001: starting Jenkins manually	[production]
11:04	<marostegui>	Change x1 binlog format to STATEMENT T255174	[production]
11:00	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement	[production]
10:59	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement	[production]
10:59	<jelto@cumin1001>	START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org	[production]
10:58	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org	[production]
10:53	<marostegui>	Restart eqiad sanitarium T326105	[production]
10:53	<jelto@cumin1001>	START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org	[production]
10:50	<marostegui>	Restart codfw sanitarium masters T326105	[production]
10:49	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1002.wikimedia.org	[production]
10:43	<jelto@cumin1001>	START - Cookbook sre.hosts.reboot-single for host contint1002.wikimedia.org	[production]
10:37	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error	[production]
10:36	<cgoubert@cumin1001>	START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error	[production]
10:36	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1001.wikimedia.org	[production]
10:31	<jelto@cumin1001>	START - Cookbook sre.hosts.reboot-single for host gerrit1001.wikimedia.org	[production]
10:25	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org	[production]
10:18	<jelto@cumin1001>	START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org	[production]
09:27	<vgutierrez>	restarting varnish on cp5032 to clear VarnishChildRestarted alert - T325797	[production]
08:19	<kartik@deploy1002>	Finished scap: Backport for [[gerrit:869347|Content Translation: Move ttwiki out of Beta (T319177)]] (duration: 16m 09s)	[production]
08:16	<jmm@puppetmaster1001>	conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet	[production]
08:12	<moritzm>	installing Linux 4.19.269 on Buster hosts	[production]
08:12	<phedenskog@deploy1002>	Finished deploy [performance/navtiming@4f8c010]: (no justification provided) (duration: 00m 08s)	[production]
08:12	<phedenskog@deploy1002>	Started deploy [performance/navtiming@4f8c010]: (no justification provided)	[production]
08:05	<kartik@deploy1002>	kartik and kartik: Backport for [[gerrit:869347|Content Translation: Move ttwiki out of Beta (T319177)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet	[production]
08:03	<kartik@deploy1002>	Started scap: Backport for [[gerrit:869347|Content Translation: Move ttwiki out of Beta (T319177)]]	[production]
04:58	<mwpresync@deploy1002>	Finished scap: testwikis wikis to 1.40.0-wmf.17 refs T325580 (duration: 55m 31s)	[production]
04:02	<mwpresync@deploy1002>	Started scap: testwikis wikis to 1.40.0-wmf.17 refs T325580	[production]
2023-01-02
10:04	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host otrs1001.eqiad.wmnet	[production]
10:00	<jelto@cumin1001>	START - Cookbook sre.hosts.reboot-single for host otrs1001.eqiad.wmnet	[production]
2022-12-31
19:11	<AndyRussG>	payments-wiki upgraded c212825e -> f02e3585, config c1c4a9f6 -> 8103bce6	[production]
2022-12-30
21:36	<dcausse>	restarting blazegraph on wdqs1006 and wdqs1013 (BlazegraphFreeAllocatorsDecreasingRapidly)	[production]
2022-12-29
23:26	<ryankemper@cumin2002>	END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)	[production]
23:25	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)	[production]
23:24	<ryankemper@cumin2002>	START - Cookbook sre.wdqs.data-reload	[production]
23:22	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-reload	[production]
09:19	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on an-worker1084.eqiad.wmnet with reason: Avoid IRC spam	[production]
09:19	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on an-worker1084.eqiad.wmnet with reason: Avoid IRC spam	[production]
2022-12-22
18:27	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1015.eqiad.wmnet with OS bullseye	[production]
18:27	<btullis@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"	[production]
18:16	<btullis@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"	[production]
18:03	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1015.eqiad.wmnet with reason: host reimage	[production]