production SAL

2022-07-20 §
08:14	<elukey>	apt-get clean on archiva1002 to free some space	[production]
2022-07-19 §
10:05	<elukey>	reboot an-worker1127 - hdfs datanode caused CPU stalls	[production]
2022-07-18 §
08:58	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .	[production]
2022-07-14 §
11:22	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .	[production]
2022-07-13 §
15:58	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .	[production]
15:58	<elukey@deploy1002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.	[production]
15:58	<elukey@deploy1002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.	[production]
2022-07-11 §
13:50	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .	[production]
13:49	<elukey@deploy1002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.	[production]
13:48	<elukey@deploy1002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.	[production]
2022-07-07 §
13:21	<elukey@deploy1002>	helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync	[production]
13:20	<elukey@deploy1002>	helmfile [codfw] START helmfile.d/services/eventgate-main: sync	[production]
13:20	<elukey@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync	[production]
13:19	<elukey@deploy1002>	helmfile [eqiad] START helmfile.d/services/eventgate-main: sync	[production]
13:19	<elukey>	roll restart eventgate-main pods to add a new stream - T301878	[production]
2022-07-06 §
08:26	<elukey@cumin1001>	END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1033.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:25	<elukey@cumin1001>	START - Cookbook sre.puppet.renew-cert for ms-be1033.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:25	<elukey@cumin1001>	END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1032.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:23	<elukey@cumin1001>	START - Cookbook sre.puppet.renew-cert for ms-be1032.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:21	<elukey@cumin1001>	END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1031.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:20	<elukey@cumin1001>	START - Cookbook sre.puppet.renew-cert for ms-be1031.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:16	<elukey@cumin1001>	END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1030.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:14	<elukey@cumin1001>	START - Cookbook sre.puppet.renew-cert for ms-be1030.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:09	<elukey@cumin1001>	END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1029.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:07	<elukey@cumin1001>	START - Cookbook sre.puppet.renew-cert for ms-be1029.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:02	<elukey@cumin1001>	END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-be1028.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
08:01	<elukey@cumin1001>	START - Cookbook sre.puppet.renew-cert for ms-be1028.eqiad.wmnet: Renew puppet certificate - elukey@cumin1001	[production]
2022-07-05 §
08:43	<elukey@deploy1002>	helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .	[production]
2022-07-04 §
14:19	<elukey>	roll restart of thanos-fe's proxy to pick up a new account - T311628	[production]
08:04	<elukey>	kill leftover processes of user `mewoph` on stat100x to allow puppet runs	[production]
2022-06-30 §
09:47	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2003.codfw.wmnet with OS buster	[production]
09:36	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2002.codfw.wmnet with OS buster	[production]
08:57	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2001.codfw.wmnet with OS buster	[production]
08:56	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage	[production]
08:56	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage	[production]
08:44	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage	[production]
08:42	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host ml-cache2003.codfw.wmnet with OS buster	[production]
08:42	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage	[production]
08:33	<elukey@deploy1002>	Finished deploy [ores/deploy@dfaec93]: Update ores submodule to its latest commit and scap canary settings (duration: 14m 48s)	[production]
08:28	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage	[production]
08:28	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host ml-cache2002.codfw.wmnet with OS buster	[production]
08:26	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage	[production]
08:19	<elukey@deploy1002>	Started deploy [ores/deploy@dfaec93]: Update ores submodule to its latest commit and scap canary settings	[production]
08:12	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host ml-cache2001.codfw.wmnet with OS buster	[production]
2022-06-29 §
07:54	<elukey@deploy1002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.	[production]
07:54	<elukey@deploy1002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.	[production]
2022-06-27 §
16:25	<elukey>	upload cassandra-tools-wmf 1.1.0-2 (py3 version) to bullseye-wikimedia - T310980	[production]
09:57	<elukey>	copy cassandra and cassandra-tools packages in component/cassandra{311,dev} from wikimedia buster to bullseye - T310980	[production]
2022-06-25 §
13:16	<elukey>	restart rsyslog on ml-staging-ctrl200[1,2] - broken connections to centrallog	[production]
09:54	<elukey>	restart rsyslog on ml-serve-ctrl200[1,2] - broken connections to centrallog	[production]