production SAL

2023-06-27
08:58	<hnowlan@puppetmaster1001>	conftool action : set/weight=10; selector: service=thumbor,name=kubernetes100[0-9].eqiad.wmnet	[production]
08:58	<hnowlan@puppetmaster1001>	conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes100[0-9].eqiad.wmnet	[production]
08:58	<hnowlan@puppetmaster1001>	conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes200[0-9].codfw.wmnet	[production]
08:53	<akosiaris@deploy1002>	Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 07m 21s)	[production]
08:52	<kartik@deploy1002>	helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply	[production]
08:47	<kartik@deploy1002>	helmfile [codfw] START helmfile.d/services/machinetranslation: apply	[production]
08:45	<kartik@deploy1002>	helmfile [staging] DONE helmfile.d/services/machinetranslation: apply	[production]
08:42	<kartik@deploy1002>	helmfile [staging] START helmfile.d/services/machinetranslation: apply	[production]
08:42	<fabfur@cumin1001>	START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo	[production]
08:41	<fabfur@cumin1001>	START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo	[production]
08:41	<kart_>	Updated cxserver to 2023-06-27-053435-production (T339105)	[production]
08:38	<elukey>	revoked puppet cert for 'varnishkafka' and cleaned up its cergen's files in puppet private - T337825	[production]
08:33	<root@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 19 hosts	[production]
08:33	<root@cumin2002>	START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 19 hosts	[production]
08:32	<kartik@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/cxserver: apply	[production]
08:32	<root@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 767 hosts	[production]
08:32	<kartik@deploy1002>	helmfile [eqiad] START helmfile.d/services/cxserver: apply	[production]
08:32	<root@cumin2002>	START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 767 hosts	[production]
08:31	<root@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 1265 hosts	[production]
08:30	<root@cumin2002>	START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 1265 hosts	[production]
08:29	<marostegui>	Failover m2-master to dbproxy1022 T337812	[production]
08:28	<kartik@deploy1002>	helmfile [codfw] DONE helmfile.d/services/cxserver: apply	[production]
08:28	<kartik@deploy1002>	helmfile [codfw] START helmfile.d/services/cxserver: apply	[production]
08:25	<kartik@deploy1002>	helmfile [staging] DONE helmfile.d/services/cxserver: apply	[production]
08:24	<kartik@deploy1002>	helmfile [staging] START helmfile.d/services/cxserver: apply	[production]
08:14	<kartik@deploy1002>	Finished scap: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] (duration: 16m 17s)	[production]
08:03	<moritzm>	installing openjdk-8 security updates for bullseye	[production]
08:02	<kartik@deploy1002>	kartik: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
07:58	<kartik@deploy1002>	Started scap: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]]	[production]
07:54	<moritzm>	uploaded openjdk-8 8u372-ga-1~deb11u1 to component/jdk8 for bullseye (forward port of Java 8 for Buster)	[production]
07:48	<hashar>	Restart Zuul due to stuck connection | T340518 | T309376	[production]
07:15	<elukey>	`sudo kill `pgrep -u paramd`` on stat1005 to unblock puppet	[production]
06:22	<marostegui>	Failover m1-master to dbproxy1022 T337812	[production]

2023-06-26
23:21	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery	[production]
23:21	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery	[production]
23:07	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
23:02	<sbassett>	Deployed updated mitigation for T336027	[production]
23:01	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)	[production]
22:55	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:51	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
22:46	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
22:33	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:31	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:24	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
22:18	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
22:17	<ryankemper@cumin1001>	END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)	[production]
22:17	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.restart	[production]
22:17	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
22:16	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)	[production]
22:05	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]