production SAL

2020-10-13
21:16	<mutante>	icinga had a gerrit health alert, but I did not notice an issue myself and it was gone on the next check	[production]
21:12	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
21:12	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
21:09	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
21:07	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
20:44	<mutante>	bast1002 - apt-get autoremove - cleans up golang and ruby packages	[production]
20:44	<mutante>	bast1002 - apt-get remove nmap (it can be used on netmon hosts and was not consistent with other bast hosts)	[production]
20:15	<ebernhardson>	unban elastic2029 from production-search-psi-codfw	[production]
20:14	<ebernhardson>	restart production-search-psi-codfw on elastic2029 to reset any wonkiness from gc hell	[production]
20:06	<marxarelli>	1.36.0-wmf.13 promoted to group0. no new or concerning errors or changes in error rates (T263179)	[production]
20:03	<ebernhardson>	add elastic2029-production-search-psi-codfw to cluster.routing.allocation.exclude._name to drain active shards, instance currently in gc hell	[production]
19:54	<dduvall@deploy1001>	rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.13	[production]
19:52	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
19:49	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
19:40	<dduvall@deploy1001>	Finished scap: testwikis wikis to 1.36.0-wmf.13 (duration: 40m 51s)	[production]
19:00	<dduvall@deploy1001>	Started scap: testwikis wikis to 1.36.0-wmf.13	[production]
18:58	<dduvall@deploy1001>	Pruned MediaWiki: 1.36.0-wmf.9 (duration: 01m 56s)	[production]
18:56	<dduvall@deploy1001>	Pruned MediaWiki: 1.36.0-wmf.8 (duration: 02m 10s)	[production]
18:53	<dduvall@deploy1001>	Pruned MediaWiki: 1.36.0-wmf.6 (duration: 13m 00s)	[production]
18:23	<dduvall@deploy1001>	rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.11	[production]
18:21	<marxarelli>	1.36.0-wmf.11 promoted to group1. no new errors (T263177). promoting to all wikis	[production]
18:10	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
18:09	<robh>	scs-c1-codfw mgmt firmware updated, updating scs-a1-codfw T238036	[production]
18:08	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
18:01	<robh>	scs-c1-codfw firmware update via T238036	[production]
17:47	<marxarelli>	1.36.0-wmf.13 branched at a6be801fc6331a6a6b96f02f368750200d50ab09 for T263179	[production]
17:35	<dduvall@deploy1001>	Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 07s)	[production]
17:34	<dduvall@deploy1001>	rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11	[production]
17:32	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
17:32	<jbond@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
17:30	<marxarelli>	1.36.0-wmf.11 promoted to group0. no new errors (T263177). preparing to promote to group1	[production]
17:18	<ppchelko@deploy1001>	helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .	[production]
17:18	<ppchelko@deploy1001>	helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .	[production]
17:17	<ppchelko@deploy1001>	helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .	[production]
17:16	<ppchelko@deploy1001>	helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .	[production]
17:15	<ppchelko@deploy1001>	helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .	[production]
17:15	<ppchelko@deploy1001>	helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .	[production]
16:39	<dduvall@deploy1001>	rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11	[production]
16:31	<ebernhardson@deploy1001>	Finished deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc (duration: 05m 29s)	[production]
16:26	<ebernhardson@deploy1001>	Started deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc	[production]
15:56	<papaul>	power down ms-be2036 for maintenance	[production]
15:02	<godog>	bounce logstash on logstash1007, GC death	[production]
14:41	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
14:39	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
14:18	<urbanecm@deploy1001>	Synchronized wmf-config/CommonSettings.php: 5b28fd685b9cb8d8e93650b5d02bc41b81d0883c: Add setmentor to wgAvailableRights (duration: 00m 59s)	[production]
13:42	<jayme@deploy1001>	helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .	[production]
13:40	<jayme@deploy1001>	helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .	[production]
13:15	<Urbanecm>	[urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=BROKEN --fix # T265336	[production]
13:08	<moritzm>	imported php-mailparse, php-mongodb, php-msgpack to component/icu63 T264991	[production]
12:50	<Urbanecm>	urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=FIXME --fix # T265336	[production]