production SAL

2022-07-07
06:07	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depool db1160 T311611', diff saved to https://phabricator.wikimedia.org/P30937 and previous config saved to /var/cache/conftool/dbconfig/20220707-060743-ladsgroup.json	[production]
06:01	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Promote db1138 to s4 primary and set section read-write T311611', diff saved to https://phabricator.wikimedia.org/P30936 and previous config saved to /var/cache/conftool/dbconfig/20220707-060112-ladsgroup.json	[production]
06:00	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T311611', diff saved to https://phabricator.wikimedia.org/P30935 and previous config saved to /var/cache/conftool/dbconfig/20220707-060037-ladsgroup.json	[production]
06:00	<Amir1>	Starting s4 eqiad failover from db1160 to db1138 - T311611	[production]
05:35	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye	[production]
05:14	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Set db1138 with weight 0 T311611', diff saved to https://phabricator.wikimedia.org/P30933 and previous config saved to /var/cache/conftool/dbconfig/20220707-051406-ladsgroup.json	[production]
05:13	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s4 T311611	[production]
05:12	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s4 T311611	[production]
01:09	<mutante>	gitlab1004 - systemctl reset-failed, clear icinga alerts about rsync to decom'ed machine	[production]
00:58	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
00:57	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
00:57	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
00:56	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
00:25	<dzahn@cumin2002>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab1001.wikimedia.org	[production]
00:25	<dzahn@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
2022-07-06
23:50	<ebernhardson@deploy1002>	Finished deploy [wikimedia/discovery/analytics@5082f17]: increase subgraph_mapping_weekly executor memory (duration: 02m 05s)	[production]
23:48	<ebernhardson@deploy1002>	Started deploy [wikimedia/discovery/analytics@5082f17]: increase subgraph_mapping_weekly executor memory	[production]
23:30	<dzahn@cumin2002>	START - Cookbook sre.dns.netbox	[production]
23:25	<dzahn@cumin2002>	START - Cookbook sre.hosts.decommission for hosts gitlab1001.wikimedia.org	[production]
23:00	<mutante>	gitlab1004 - rm /lib/systemd/system/rsync-config-backup-gitlab1001* T307142	[production]
22:52	<mutante>	etherpad - deleted 2 pads that had leaked information	[production]
22:52	<ebernhardson>	restart airflow-webserver and airflow-scheduler for plugins update on an-airflow1001	[production]
22:37	<ebernhardson@deploy1002>	Finished deploy [wikimedia/discovery/analytics@debd402]: airflow dags to generate subgraph and query mapping along with their metrics (duration: 02m 01s)	[production]
22:35	<ebernhardson@deploy1002>	Started deploy [wikimedia/discovery/analytics@debd402]: airflow dags to generate subgraph and query mapping along with their metrics	[production]
21:40	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1005.wikimedia.org with OS bullseye	[production]
21:40	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye	[production]
21:40	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye	[production]
21:39	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1003.wikimedia.org with OS bullseye	[production]
21:39	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudrabbit1002.wikimedia.org with OS bullseye	[production]
20:59	<cmjohnson@cumin1001>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1001.wikimedia.org with OS bullseye	[production]
20:44	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudservices1005.wikimedia.org with OS bullseye	[production]
20:44	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye	[production]
20:43	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye	[production]
20:43	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudrabbit1003.wikimedia.org with OS bullseye	[production]
20:43	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudrabbit1002.wikimedia.org with OS bullseye	[production]
20:38	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye	[production]
20:36	<cmjohnson@cumin1001>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit1001.wikimedia.org with OS bullseye	[production]
20:35	<cjming>	end of UTC late backport window	[production]
20:23	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudrabbit1001.wikimedia.org with OS bullseye	[production]
20:12	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
20:11	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
20:11	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
20:11	<cjming@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811762|Enable sticky header edit A/B test for pilot wikis excluding idwiki/viwiki (T311144)]] (duration: 03m 25s)	[production]
20:10	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
19:54	<bd808@mwmaint1002>	Testing statshbot following deploy of [[gerrit:809732]]. This should be logged in SAL, but stashbot should not say that was done on irc.	[production]
19:13	<bking@cumin1001>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1003.wikimedia.org with OS bullseye	[production]
19:00	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye	[production]
18:48	<bking@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye	[production]
18:48	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudelastic1003.wikimedia.org with OS bullseye	[production]
18:47	<bking@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1003.wikimedia.org with OS bullseye	[production]