production SAL

2022-03-01
11:21	<_joe_>	restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843	[production]
11:18	<_joe_>	also removed the ipvsadm entry for apaches:80 T244843	[production]
11:17	<jayme>	rolled back linkrecommendation staging helm release to revision 12 - T302744	[production]
11:17	<_joe_>	restarting pybal on lvs1020 T244843	[production]
11:11	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage	[production]
11:11	<_joe_>	restarted pybal on lvs2009, T244843	[production]
11:09	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage	[production]
11:07	<_joe_>	restarted pybal on lvs2010, T244843	[production]
11:02	<mmandere>	restart purged on cp60[09,10,11]	[production]
11:00	<cmooney@cumin1001>	START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
10:47	<cmooney@cumin1001>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
10:40	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster	[production]
10:40	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts	[production]
10:40	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts	[production]
10:40	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts	[production]
10:39	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts	[production]
10:31	<mmandere>	restart purged on cp600[6-8]	[production]
10:28	<cmooney@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
10:24	<cmooney@cumin1001>	START - Cookbook sre.dns.netbox	[production]
10:05	<vgutierrez>	pool cp2039 running HAProxy as TLS termination layer - T290005 T271421	[production]
09:48	<elukey>	elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)	[production]
09:45	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster	[production]
09:33	<_joe_>	restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted	[production]
09:31	<_joe_>	restart pybal on lvs1020	[production]
09:25	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts	[production]
09:25	<elukey>	restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka)	[production]
09:25	<_joe_>	manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore	[production]
09:24	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts	[production]
09:24	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts	[production]
09:22	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts	[production]
09:22	<_joe_>	restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820	[production]
09:20	<_joe_>	restarted pybal on lvs2010	[production]
09:14	<vgutierrez@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage	[production]
09:12	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage	[production]
09:06	<elukey>	restart purged on cp6005	[production]
08:57	<elukey>	restart purged on cp6004	[production]
08:54	<vgutierrez@cumin1001>	START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster	[production]
08:27	<urbanecm>	UTC morning B&C window done	[production]
08:25	<elukey>	restart purged on cp6003	[production]
08:16	<moritzm>	drain instances off ganeti2008 for eventual decom	[production]
08:08	<urbanecm@deploy1002>	Synchronized wmf-config/ProductionServices.php: d149208dfd7e5fbf51f44dd0bf7dae3b2e2f5159: Use service-proxy to connect to linkrecommendation (T302719) (duration: 00m 49s)	[production]
07:59	<elukey>	restart purged on cp6002	[production]
06:58	<oblivian@deploy1002>	Finished deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test (duration: 00m 17s)	[production]
06:57	<oblivian@deploy1002>	Started deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test	[production]
06:56	<elukey>	restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to)	[production]
06:46	<_joe_>	uploaded scap 4.4.1 to {stretch,buster,bullseye} T302464	[production]
06:46	<_joe_>	uploaded scap 4.4.1 to {stretch,buster,bullseye}	[production]
02:59	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json	[production]
02:44	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json	[production]
02:29	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json	[production]