production SAL

2022-03-01
13:53	<mmandere>	restart purged on cp60[15-16]	[production]
13:49	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage	[production]
13:48	<klausman@cumin2002>	START - Cookbook sre.dns.netbox	[production]
13:48	<klausman@cumin2002>	START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet	[production]
13:48	<klausman@cumin2002>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet	[production]
13:48	<klausman@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
13:47	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage	[production]
13:44	<klausman@cumin2002>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet	[production]
13:43	<klausman@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
13:43	<klausman@cumin2002>	START - Cookbook sre.dns.netbox	[production]
13:43	<klausman@cumin2002>	END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)	[production]
13:40	<kormat>	Deploying wmfmariadbpy 0.9 T302796	[production]
13:40	<kormat>	uploaded wmfmariadbpy 0.9 to apt.wm.o T302796	[production]
13:39	<klausman@cumin2002>	START - Cookbook sre.dns.netbox	[production]
13:39	<klausman@cumin2002>	END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)	[production]
13:39	<klausman@cumin2002>	START - Cookbook sre.dns.netbox	[production]
13:39	<klausman@cumin2002>	START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet	[production]
13:39	<klausman@cumin2002>	START - Cookbook sre.dns.netbox	[production]
13:39	<klausman@cumin2002>	START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet	[production]
13:32	<moritzm>	restarting nginx on registry* nodes to pick up expat update	[production]
13:31	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster	[production]
13:15	<XioNoX>	restart cr1-drmrs for software upgrade	[production]
13:03	<moritzm>	restarting FPM/Apache on parsoid hosts to pick up expat update	[production]
12:49	<vgutierrez>	pool cp3062 running HAProxy as TLS termination layer - T290005 T271421	[production]
12:47	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster	[production]
12:39	<moritzm>	installing expat security updates	[production]
12:34	<mmandere>	restart purged on cp60[12-14]	[production]
12:32	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s)	[production]
12:31	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker	[production]
12:30	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)	[production]
12:28	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker	[production]
12:15	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)	[production]
12:13	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration	[production]
12:11	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)	[production]
12:09	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration	[production]
11:43	<klausman@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
11:36	<kharlan@deploy1002>	helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply	[production]
11:35	<klausman@cumin2002>	START - Cookbook sre.dns.netbox	[production]
11:35	<klausman@cumin2002>	START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet	[production]
11:33	<kharlan@deploy1002>	helmfile [codfw] START helmfile.d/services/linkrecommendation: apply	[production]
11:32	<kharlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply	[production]
11:30	<kharlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply	[production]
11:28	<kharlan@deploy1002>	helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply	[production]
11:27	<kharlan@deploy1002>	helmfile [staging] START helmfile.d/services/linkrecommendation: apply	[production]
11:27	<cmooney@cumin1001>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
11:21	<_joe_>	restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843	[production]
11:18	<_joe_>	also removed the ipvsadm entry for apaches:80 T244843	[production]
11:17	<jayme>	rolled back linkrecommendation staging helm release to revision 12 - T302744	[production]
11:17	<_joe_>	restarting pybal on lvs1020 T244843	[production]
11:11	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage	[production]