production SAL

2021-03-22 §
10:47	<elukey@deploy1002>	helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.	[production]
10:34	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
10:33	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
10:32	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
10:32	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
10:27	<elukey@deploy1002>	helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.	[production]
10:26	<elukey@deploy1002>	helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.	[production]
10:26	<elukey@deploy1002>	helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.	[production]
10:25	<elukey@deploy1002>	helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.	[production]
10:17	<elukey@deploy1002>	helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.	[production]
10:17	<elukey@deploy1002>	helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.	[production]
10:15	<elukey@deploy1002>	helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.	[production]
10:15	<elukey@deploy1002>	helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.	[production]
10:12	<elukey>	run homer for cr1/cr2 eqiad and codfw to add new iBGP session for the k8s ML clusters - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/661055	[production]
07:51	<elukey>	stop/start mariadb instances on dbstore1004 to reduce buffer pool memory settings - T273865	[production]
2021-03-21 §
09:22	<elukey>	install apache2-bin-dbgsym on gerrit1001 - T277127	[production]
2021-03-19 §
11:20	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE	[production]
11:18	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE	[production]
08:22	<elukey>	upload alluxio 2.4.1 to thirdparty/bigtop15 on stretch/buster-wikimedia	[production]
2021-03-18 §
13:49	<elukey>	reboot analytics1066	[production]
2021-03-15 §
07:22	<elukey>	powercycle ms-be1038 - no ssh, no tty available in mgmt serial console, irrecoverable error saved in ilo's system logs	[production]
2021-03-10 §
07:07	<elukey>	sudo apt-get remove linux-image-4.9.0-9-amd64 on sodium to free space for /boot	[production]
07:01	<elukey>	remove the oldest kernel on ganeti nodes to free space for /boot	[production]
06:45	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1111.eqiad.wmnet with reason: REIMAGE	[production]
06:43	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1111.eqiad.wmnet with reason: REIMAGE	[production]
06:17	<elukey>	reimage an-worker1111 to buster	[production]
2021-03-09 §
20:42	<elukey>	reimaged an-worker1091 to buster	[production]
20:25	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE	[production]
20:23	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE	[production]
18:52	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE	[production]
18:49	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE	[production]
18:26	<elukey>	reimage an-worker1087 to buster	[production]
18:02	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE	[production]
18:00	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE	[production]
17:01	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE	[production]
16:59	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE	[production]
16:40	<elukey>	reimage analytics1077 to buster	[production]
15:45	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1072.eqiad.wmnet with reason: REIMAGE	[production]
15:43	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1072.eqiad.wmnet with reason: REIMAGE	[production]
15:18	<elukey>	reimage analytics1072 (hadoop hdfs journal node) to buster	[production]
14:56	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1089.eqiad.wmnet with reason: REIMAGE	[production]
14:54	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1090.eqiad.wmnet with reason: REIMAGE	[production]
14:53	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1089.eqiad.wmnet with reason: REIMAGE	[production]
14:52	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1090.eqiad.wmnet with reason: REIMAGE	[production]
14:29	<elukey>	drain + reimage an-worker1090/89 to Buster	[production]
13:52	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1102.eqiad.wmnet with reason: REIMAGE	[production]
13:50	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1080.eqiad.wmnet with reason: REIMAGE	[production]
13:49	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1102.eqiad.wmnet with reason: REIMAGE	[production]
13:49	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1080.eqiad.wmnet with reason: REIMAGE	[production]
13:27	<elukey>	reimage an-worker1102 and an-worker1080 (hdfs journal node) to Buster	[production]