2021-10-08
23:10 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
21:38 <mutante> mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress [production]
21:34 <mutante> disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key) [production]
21:30 <legoktm> running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent' [production]
20:12 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
20:10 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE [production]
20:08 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
20:08 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE [production]
20:06 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE [production]
20:05 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE [production]
19:46 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE [production]
19:45 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE [production]
19:43 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE [production]
19:42 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE [production]
19:42 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
19:39 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
18:15 <cstone> civicrm revision changed from 5cb7d487cb to 598b59b0ee [production]
16:19 <urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki [production]
15:48 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
15:48 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
15:29 <jelto> enable puppet on gitlab1001 again for T283076 [production]
14:05 <jiji@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
14:01 <jiji@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:49 <Amir1> wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki'); [production]
09:39 <Emperor> installing stress on ms-be2045 given recent h/w issues T290881 [production]
08:20 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
08:12 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
08:04 <urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force [production]
07:43 <Emperor> reboot ms-be2045 T290881 [production]
07:41 <gehel> manually resuming the data reloads on wdqs1009 and wdqs2008 [production]
06:42 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.cf (exit_code=0) [production]
06:42 <ayounsi@cumin1001> START - Cookbook sre.network.cf [production]
06:28 <ayounsi@cumin2002> END (PASS) - Cookbook sre.network.cf (exit_code=0) [production]
06:28 <ayounsi@cumin2002> START - Cookbook sre.network.cf [production]
05:35 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
04:56 <ryankemper> [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good [production]
04:32 <ryankemper> T292814 Beginning rolling restart of `cloudelastic`: `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic restart" --nodes-per-run 1 --start-datetime 2021-10-08T03:53:49 --task-id T292814` on `ryankemper@cumin1001` tmux `elastic` [production]
04:31 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
04:29 <ryankemper> [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
04:28 <ryankemper> [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
04:28 <ryankemper> [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
04:23 <ryankemper@deploy1002> Finished deploy [wdqs/wdqs@8f57a56]: 0.3.89 (duration: 08m 22s) [production]
04:20 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
04:20 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
04:18 <gehel@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) [production]
04:17 <gehel@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) [production]
04:15 <ryankemper> [WDQS Deploy] Tests passing following deploy of `0.3.89` on canary `wdqs1003`; proceeding to rest of fleet [production]
04:14 <ryankemper@deploy1002> Started deploy [wdqs/wdqs@8f57a56]: 0.3.89 [production]
04:14 <ryankemper> [WDQS Deploy] Gearing up for deploy of wdqs `0.3.89`. Pre-deploy tests passing on canary `wdqs1003` [production]
03:58 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]