production SAL

2020-10-29
01:17	<ryankemper>	T266492 Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad`	[production]
01:16	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-restart	[production]
00:51	<ryankemper>	Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy	[production]
00:14	<Amir1>	rolling restart of ores	[production]
00:12	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
00:10	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
00:04	<ryankemper>	Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`	[production]
00:03	<ryankemper>	Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`	[production]
00:03	<ryankemper>	Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`	[production]
00:02	<ryankemper>	Following wdqs deploy, https://query.wikidata.org successfully responds to an example query	[production]
00:01	<ryankemper@deploy1001>	Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s)	[production]
2020-10-28
23:54	<ryankemper>	Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet	[production]
23:52	<ryankemper@deploy1001>	Started deploy [wdqs/wdqs@8c97b17]: 0.3.53	[production]
23:52	<ryankemper@deploy1001>	deploy aborted: 0.3.53 (duration: 00m 00s)	[production]
23:52	<ryankemper@deploy1001>	Started deploy [wdqs/wdqs@8c97b17]: 0.3.53	[production]
22:54	<mutante>	scandium - scap pull after reinstalling OS	[production]
22:14	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
22:12	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
21:41	<ryankemper>	Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002`	[production]
21:22	<herron@cumin1001>	END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)	[production]
21:05	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
21:05	<hnowlan@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
20:50	<herron@cumin1001>	START - Cookbook sre.ganeti.makevm	[production]
20:22	<ladsgroup@deploy1001>	Synchronized static/images/project-logos: Changing logo of Wikidata for the birthday (duration: 00m 58s)	[production]
19:56	<jgleeson>	updated Smashpig from 2246685626 to 09f29c1da5	[production]
19:53	<herron@cumin1001>	END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)	[production]
19:53	<herron@cumin1001>	START - Cookbook sre.ganeti.makevm	[production]
19:50	<herron@cumin1001>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)	[production]
19:36	<herron@cumin1001>	START - Cookbook sre.ganeti.makevm	[production]
19:36	<herron@cumin1001>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)	[production]
19:36	<herron@cumin1001>	START - Cookbook sre.ganeti.makevm	[production]
19:30	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
19:30	<hnowlan@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
19:22	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
19:20	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
18:56	<tgr_>	Morning deploys done	[production]
18:55	<tgr@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636983|Temporary enable 'editpage' warn logging (T251023)]] (duration: 00m 57s)	[production]
18:51	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)	[production]
18:51	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
18:47	<volans@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
18:46	<tgr@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636791|Revert "cirrus: Hardcode more_like to codfw cirrus cluster"]] (duration: 00m 56s)	[production]
18:45	<tgr@deploy1001>	Synchronized wmf-config/PoolCounterSettings.php: Config: [[gerrit:636956|Revert "Revert "Increase cirrus morelike pool counter by 20%"" ()]] (duration: 00m 57s)	[production]
18:43	<volans@cumin1001>	START - Cookbook sre.dns.netbox	[production]
18:40	<tgr@deploy1001>	Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:636787|Suggested edits: Include page ID with task preview data (T266600)]] (duration: 00m 59s)	[production]
18:19	<tgr@deploy1001>	Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:619880|Removing obsolete license definition]] (duration: 01m 00s)	[production]
18:11	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
18:07	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
18:06	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)	[production]
18:02	<elukey@cumin1001>	END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)	[production]
17:46	<elukey@cumin1001>	START - Cookbook sre.dns.netbox	[production]