production SAL

2022-03-01
12:31	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker	[production]
12:30	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)	[production]
12:28	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker	[production]
12:15	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)	[production]
12:13	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration	[production]
12:11	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)	[production]
12:09	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration	[production]
11:43	<klausman@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
11:36	<kharlan@deploy1002>	helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply	[production]
11:35	<klausman@cumin2002>	START - Cookbook sre.dns.netbox	[production]
11:35	<klausman@cumin2002>	START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet	[production]
11:33	<kharlan@deploy1002>	helmfile [codfw] START helmfile.d/services/linkrecommendation: apply	[production]
11:32	<kharlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply	[production]
11:30	<kharlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply	[production]
11:28	<kharlan@deploy1002>	helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply	[production]
11:27	<kharlan@deploy1002>	helmfile [staging] START helmfile.d/services/linkrecommendation: apply	[production]
11:27	<cmooney@cumin1001>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
11:21	<_joe_>	restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843	[production]
11:18	<_joe_>	also removed the ipvsadm entry for apaches:80 T244843	[production]
11:17	<jayme>	rolled back linkrecommendation staging helm release to revision 12 - T302744	[production]
11:17	<_joe_>	restarting pybal on lvs1020 T244843	[production]
11:11	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage	[production]
11:11	<_joe_>	restarted pybal on lvs2009, T244843	[production]
11:09	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage	[production]
11:07	<_joe_>	restarted pybal on lvs2010, T244843	[production]
11:02	<mmandere>	restart purged on cp60[09,10,11]	[production]
11:00	<cmooney@cumin1001>	START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
10:47	<cmooney@cumin1001>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
10:40	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster	[production]
10:40	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts	[production]
10:40	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts	[production]
10:40	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts	[production]
10:39	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts	[production]
10:31	<mmandere>	restart purged on cp600[6-8]	[production]
10:28	<cmooney@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
10:24	<cmooney@cumin1001>	START - Cookbook sre.dns.netbox	[production]
10:05	<vgutierrez>	pool cp2039 running HAProxy as TLS termination layer - T290005 T271421	[production]
09:48	<elukey>	elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)	[production]
09:45	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster	[production]
09:33	<_joe_>	restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted	[production]
09:31	<_joe_>	restart pybal on lvs1020	[production]
09:25	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts	[production]
09:25	<elukey>	restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka)	[production]
09:25	<_joe_>	manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore	[production]
09:24	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts	[production]
09:24	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts	[production]
09:22	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts	[production]
09:22	<_joe_>	restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820	[production]
09:20	<_joe_>	restarted pybal on lvs2010	[production]
09:14	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage	[production]