production SAL

2020-12-16
12:20	<jayme>	imported kubernetes 1.16.15-2 into component/kubernetes-future stretch-wikimedia	[production]
11:52	<marostegui>	Stop s1, s3, s5 and s8 on db1124 to copy it to db1154 (this will generate lag on wikireplicas) T268742	[production]
11:19	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE	[production]
11:19	<jiji@deploy1001>	Synchronized wmf-config/ProductionServices.php: Swap mc1019 with mc1031 for Redis lock manager - T265643 (duration: 01m 17s)	[production]
11:17	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2022.codfw.wmnet with reason: REIMAGE	[production]
11:15	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1022.eqiad.wmnet with reason: REIMAGE	[production]
11:15	<jiji@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mc2022.codfw.wmnet with reason: REIMAGE	[production]
11:14	<jbond@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE	[production]
11:13	<jiji@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mc1022.eqiad.wmnet with reason: REIMAGE	[production]
11:10	<jynus>	stopping and restarting dbstore1004 to mitigate (short term) T270112	[production]
10:37	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)	[production]
10:37	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE	[production]
10:35	<jbond42>	reboot rpki2001	[production]
10:35	<jbond@cumin1001>	START - Cookbook sre.hosts.reboot-single	[production]
10:35	<jbond@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE	[production]
10:34	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)	[production]
10:30	<jbond42>	reboot rpki1001	[production]
10:30	<jbond@cumin1001>	START - Cookbook sre.hosts.reboot-single	[production]
10:05	<gehel@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
10:02	<jbond@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE	[production]
10:00	<jbond@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE	[production]
09:49	<godog>	swift eqiad-prod: add weight to ms-be106[0-3] - T268435	[production]
09:32	<_joe_>	reset-failed for docker report jobs on deneb, failed because of a registry gateway timeout	[production]
09:29	<elukey>	force execution of cumin-check-aliases.service on cumin[12]001 hosts to clear alarms	[production]
08:35	<gehel@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
08:23	<vgutierrez>	acme-chief and acme-chief-api restarts for openssl upgrades (CVE-2020-1971)	[production]
07:55	<gehel>	depool wdqs1005 (catching up on lag)	[production]
07:20	<marostegui>	Stop mysql on db2142 to clone db1151 - T269324	[production]
2020-12-15
23:47	<dduvall@deploy1001>	helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .	[production]
23:45	<dduvall@deploy1001>	helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .	[production]
23:34	<dduvall@deploy1001>	helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .	[production]
22:10	<mholloway-shell@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Promote SessionTick to group1 T248987 (duration: 01m 04s)	[production]
20:29	<marxarelli>	group0 to 1.36.0-wmf.22 complete. no new errors or concerning rates (refs T267415)	[production]
20:26	<tzatziki>	reset email for User:Cnk1220	[production]
20:06	<dduvall@deploy1001>	rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.22	[production]
19:32	<joal@deploy1001>	Finished deploy [analytics/refinery@2202db5] (thin): Regular analytics weekly train - THIN [analytics/refinery@2202db5] (duration: 00m 08s)	[production]
19:32	<joal@deploy1001>	Started deploy [analytics/refinery@2202db5] (thin): Regular analytics weekly train - THIN [analytics/refinery@2202db5]	[production]
19:31	<joal@deploy1001>	Finished deploy [analytics/refinery@2202db5]: Regular analytics weekly train [analytics/refinery@2202db5] (duration: 16m 36s)	[production]
19:14	<joal@deploy1001>	Started deploy [analytics/refinery@2202db5]: Regular analytics weekly train [analytics/refinery@2202db5]	[production]
18:48	<dduvall@deploy1001>	Pruned MediaWiki: 1.36.0-wmf.20 (duration: 04m 19s)	[production]
18:41	<dduvall@deploy1001>	Finished scap: testwikis wikis to 1.36.0-wmf.22 (duration: 46m 41s)	[production]
17:55	<dduvall@deploy1001>	Started scap: testwikis wikis to 1.36.0-wmf.22	[production]
16:47	<ottomata>	bumped eventgate-main memory limits from 300M to 600M - T249745	[production]
16:47	<otto@deploy1001>	helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .	[production]
16:47	<otto@deploy1001>	helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .	[production]
16:45	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1265.eqiad.wmnet with reason: REIMAGE	[production]
16:44	<otto@deploy1001>	helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .	[production]
16:44	<otto@deploy1001>	helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .	[production]
16:43	<hnowlan@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw1265.eqiad.wmnet with reason: REIMAGE	[production]
16:41	<otto@deploy1001>	helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .	[production]