production SAL

2023-02-07
11:37	<jiji@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mc2044.codfw.wmnet with reason: host reimage	[production]
11:33	<moritzm>	installing imagemagick security updates on buster	[production]
11:29	<jiji@cumin1001>	START - Cookbook sre.hosts.reimage for host mc1041.eqiad.wmnet with OS bullseye	[production]
11:21	<jiji@cumin1001>	START - Cookbook sre.hosts.reimage for host mc2044.codfw.wmnet with OS bullseye	[production]
10:51	<elukey@cumin1001>	START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.	[production]
10:49	<elukey@cumin1001>	END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.	[production]
10:19	<oblivian@cumin2002>	END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) pool all active/active services in eqiad: Pooling eqiad for codfw depool today	[production]
10:19	<oblivian@cumin2002>	START - Cookbook sre.discovery.datacenter-route pool all active/active services in eqiad: Pooling eqiad for codfw depool today	[production]
10:17	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1003.wikimedia.org with OS bullseye	[production]
10:13	<oblivian@cumin2002>	END (FAIL) - Cookbook sre.discovery.datacenter-route (exit_code=93) pool all active/active services in eqiad: Pooling eqiad for codfw depool today	[production]
10:12	<oblivian@cumin2002>	START - Cookbook sre.discovery.datacenter-route pool all active/active services in eqiad: Pooling eqiad for codfw depool today	[production]
10:01	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: host reimage	[production]
09:56	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: host reimage	[production]
09:44	<jmm@cumin2002>	START - Cookbook sre.hosts.reimage for host bast1003.wikimedia.org with OS bullseye	[production]
09:42	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2002.wikimedia.org with OS bullseye	[production]
09:24	<akosiaris@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/changeprop: sync	[production]
09:23	<akosiaris@deploy1002>	helmfile [eqiad] START helmfile.d/services/changeprop: sync	[production]
09:22	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2002.wikimedia.org with reason: host reimage	[production]
09:20	<akosiaris@deploy1002>	helmfile [codfw] DONE helmfile.d/services/changeprop: sync	[production]
09:20	<akosiaris@deploy1002>	helmfile [codfw] START helmfile.d/services/changeprop: sync	[production]
09:20	<akosiaris>	add wiktionary to mobile-sections rerenders. T226931	[production]
09:19	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on bast2002.wikimedia.org with reason: host reimage	[production]
09:19	<akosiaris@deploy1002>	helmfile [staging] DONE helmfile.d/services/changeprop: sync	[production]
09:19	<akosiaris@deploy1002>	helmfile [staging] START helmfile.d/services/changeprop: sync	[production]
09:08	<elukey@cumin1001>	START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.	[production]
09:02	<jmm@cumin2002>	START - Cookbook sre.hosts.reimage for host bast2002.wikimedia.org with OS bullseye	[production]
08:50	<vgutierrez>	rolling upgrade to HAProxy 2.4.21 in cp nodes	[production]
08:48	<kostajh>	UTC morning deploys done	[production]
08:48	<kharlan@deploy1002>	Finished scap: Backport for [[gerrit:883236|[Growth] Remove mentor list variables (T321501)]], [[gerrit:883153|Remove GEMentorProvider (T321501)]] (duration: 12m 48s)	[production]
08:37	<kharlan@deploy1002>	urbanecm and kharlan: Backport for [[gerrit:883236|[Growth] Remove mentor list variables (T321501)]], [[gerrit:883153|Remove GEMentorProvider (T321501)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet	[production]
08:35	<kharlan@deploy1002>	Started scap: Backport for [[gerrit:883236|[Growth] Remove mentor list variables (T321501)]], [[gerrit:883153|Remove GEMentorProvider (T321501)]]	[production]
08:30	<moritzm>	installing imagemagick security updates on Thumbor T328901	[production]
08:28	<kharlan@deploy1002>	Finished scap: Backport for [[gerrit:886343|GrowthExperiments: Disable leveling up features in production (T328757)]] (duration: 12m 11s)	[production]
08:18	<kharlan@deploy1002>	kharlan: Backport for [[gerrit:886343|GrowthExperiments: Disable leveling up features in production (T328757)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet	[production]
08:16	<kharlan@deploy1002>	Started scap: Backport for [[gerrit:886343|GrowthExperiments: Disable leveling up features in production (T328757)]]	[production]
08:14	<kharlan@deploy1002>	backport aborted: (duration: 00m 07s)	[production]
07:00	<marostegui>	Failover m3 from db1159 to db1164 - T328404	[production]
06:31	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repool db2110 in API', diff saved to https://phabricator.wikimedia.org/P43758 and previous config saved to /var/cache/conftool/dbconfig/20230207-063147-root.json	[production]
06:28	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1187', diff saved to https://phabricator.wikimedia.org/P43757 and previous config saved to /var/cache/conftool/dbconfig/20230207-062826-root.json	[production]
04:58	<mwpresync@deploy1002>	Pruned MediaWiki: 1.40.0-wmf.20 (duration: 02m 20s)	[production]
04:55	<mwpresync@deploy1002>	Finished scap: testwikis wikis to 1.40.0-wmf.22 refs T325585 (duration: 53m 11s)	[production]
04:02	<mwpresync@deploy1002>	Started scap: testwikis wikis to 1.40.0-wmf.22 refs T325585	[production]
2023-02-06
23:17	<pt1979@cumin2002>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED	[production]
23:01	<pt1979@cumin2002>	START - Cookbook sre.hosts.provision for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED	[production]
22:55	<ryankemper>	T327925 Depooled codfw wdqs hosts: `ryankemper@cumin2002:~$ sudo -E cumin -b 3 'wdqs[2003-2004,2009]*' 'sudo depool'`	[production]
22:51	<bking@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: switch upgrade	[production]
22:51	<bking@cumin2002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: switch upgrade	[production]
22:48	<ryankemper>	T327925 Banned `elastic[2037-2040,2055-2056,2061-2062,2069,2073-2076]` on codfw elastic	[production]
22:42	<inflatador>	bking@cumin2002 banning Elastic nodes from cluster in preparation for T327925	[production]
22:17	<pt1979@cumin2002>	END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED	[production]