SAL

2021-10-11
09:05	<volans@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet	[production]
09:01	<godog>	bounce swift-object-replicator on ms-be2036	[production]
08:52	<godog>	bounce statsite on graphite1004 to apply unit config changes	[production]
08:48	<volans@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet	[production]
08:41	<volans@cumin2002>	START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet	[production]
08:38	<moritzm>	updated buster d-i image for Bullseye 11.1 point release T292844	[production]
08:38	<moritzm>	updated buster d-i image for Buster 10.11 point release T292838	[production]
08:26	<godog>	swift eqiad-prod: final weight to ms-be10[64-67] - T290546	[production]
08:25	<moritzm>	updated buster d-i image for Buster 10.11 point release T292838	[production]
08:24	<volans@cumin1001>	START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet	[production]
08:06	<godog>	bounce uwsgi on graphite hosts to bump request size limit - T292877	[production]
07:58	<volans>	migrating physical hosts DHCP to the new reimage process - T269855	[production]
07:57	<elukey>	start kafka topics rebalancing for main-codfw (long running maintenance) - T288825	[production]
07:37	<joal>	rerun refine_event for `event`.`mediawiki_content_translation_event` year=2021/month=10/day=10/hour=16	[analytics]
02:08	<Krinkle>	Browsers fail to connect with https://upload.wikimedia.beta.wmflabs.org/ (Certificate expired two days ago)	[releng]
2021-10-10
18:07	<joal>	Rerun webrequest-load-wf-text-2021-10-10-10 - failed due to network issue	[analytics]
11:20	<wm-bot>	<lucaswerkmeister> deployed bf2834c472 (improve error handling)	[tools.lexeme-forms]
2021-10-09
05:01	<jiji@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
04:28	<jiji@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
01:32	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814	[production]
00:46	<mutante>	ms-be2045 - started systemd-timedated which had been killed by something	[production]
00:28	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814	[production]
00:24	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)	[production]
00:23	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.force-unfreeze	[production]
00:18	<DeusExMachina>	Deleted mars-01 per T292884	[wikisp]
00:13	<ryankemper>	T292814 Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time	[production]
00:12	<ryankemper@cumin1001>	END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814	[production]
2021-10-08
23:16	<legoktm>	sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'	[production]
23:10	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814	[production]
23:00	<wm-bot>	<root> root forced the deletion of job 913897	[tools.gerrit-reviewer-bot]
22:55	<wm-bot>	<root> Restarting in the hope of fixing LE cert issues	[tools.gerrit-reviewer-bot]
21:38	<mutante>	mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress	[production]
21:34	<mutante>	disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)	[production]
21:30	<legoktm>	running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'	[production]
21:17	<hashar>	Purging Docker images on all CI agents	[releng]
20:12	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE	[production]
20:10	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE	[production]
20:08	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE	[production]
20:08	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE	[production]
20:06	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE	[production]
20:05	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE	[production]
20:00	<hashar>	Updating Jenkins jobs for Quibble 1.1.1 # https://gerrit.wikimedia.org/r/c/integration/config/+/728620	[releng]
19:46	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE	[production]
19:45	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE	[production]
19:43	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE	[production]
19:42	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE	[production]
19:42	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE	[production]
19:39	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE	[production]
19:23	<hashar>	Building CI Docker images for Quibble 1.1.1	[releng]
19:08	<hashar>	Tag quibble 1.1.1 @ b54af2aa60	[releng]