production SAL

2023-04-04
14:36	<jgiannelos@deploy2002>	helmfile [staging] START helmfile.d/services/wikifeeds: apply	[production]
14:28	<vgutierrez>	switch cp6008 (upload) and cp6016 (text) to use a single UDS socket between haproxy and varnish - T333965	[production]
14:21	<jynus>	stop es1022 for debugging T333961	[production]
14:15	<Lucas_WMDE>	UTC afternoon backport+config window done	[production]
14:15	<lucaswerkmeister-wmde@deploy2002>	Finished scap: Backport for [[gerrit:905598|Use HookContainer to register hooks inside hooks (T333926)]] (duration: 10m 50s)	[production]
14:10	<stevemunene@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=aqs1018.eqiad.wmnet	[production]
14:09	<stevemunene@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=aqs1013.eqiad.wmnet	[production]
14:09	<stevemunene@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=aqs1012.eqiad.wmnet	[production]
14:09	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33	[production]
14:09	<ayounsi@cumin1001>	START - Cookbook sre.network.debug for Netbox circuit ID 33	[production]
14:09	<stevemunene@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=datahubsearch1003.eqiad.wmnet	[production]
14:05	<lucaswerkmeister-wmde@deploy2002>	lucaswerkmeister-wmde: Backport for [[gerrit:905598|Use HookContainer to register hooks inside hooks (T333926)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet	[production]
14:04	<lucaswerkmeister-wmde@deploy2002>	Started scap: Backport for [[gerrit:905598|Use HookContainer to register hooks inside hooks (T333926)]]	[production]
13:44	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depool es1022 T333961', diff saved to https://phabricator.wikimedia.org/P46027 and previous config saved to /var/cache/conftool/dbconfig/20230404-134415-ladsgroup.json	[production]
13:42	<Emperor>	repool thanos-fe1003 re T331882	[production]
13:41	<Emperor>	repool ms-fe1011 re T331882	[production]
13:38	<steve_munene>	leave hdfs safemode T331882	[production]
13:38	<inflatador>	reboot elastic2038 to clear soft lock	[production]
13:34	<sukhe>	run authdns-update for CR 905612, reverting depool of eqiad	[production]
13:30	<hnowlan@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=thumbor1006.eqiad.wmnet	[production]
13:25	<cgoubert@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/mw-web: apply	[production]
13:25	<cgoubert@deploy2002>	helmfile [eqiad] START helmfile.d/services/mw-web: apply	[production]
13:13	<hnowlan@puppetmaster1001>	conftool action : set/pooled=no; selector: name=thumbor1006.eqiad.wmnet	[production]
13:11	<hnowlan@puppetmaster1001>	conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet	[production]
13:11	<XioNoX>	asw2-c-eqiad> request system reboot all-members - T331882	[production]
13:10	<urbanecm@deploy2002>	Finished scap: Backport for [[gerrit:905544|ckbwiktionary: Add logo (T331831)]] (duration: 07m 00s)	[production]
13:05	<akosiaris@cumin1001>	END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row C switches upgrade - T331882	[production]
13:03	<urbanecm@deploy2002>	Started scap: Backport for [[gerrit:905544|ckbwiktionary: Add logo (T331831)]]	[production]
13:02	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 227 hosts with reason: eqiad row C upgrade	[production]
12:57	<ayounsi@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on 227 hosts with reason: eqiad row C upgrade	[production]
12:57	<steve_munene>	putting hdfs into safe mode as part of T331882	[production]
12:52	<ayounsi@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on 228 hosts with reason: eqiad row C upgrade	[production]
12:52	<ayounsi@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on 228 hosts with reason: eqiad row C upgrade	[production]
12:44	<akosiaris@cumin1001>	START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row C switches upgrade - T331882	[production]
12:43	<Emperor>	depool thanos-fe1003 re T331882	[production]
12:38	<Emperor>	depool ms-fe1011 re T331882	[production]
12:32	<sukhe>	[finished] run authdns-update for CR: 905603 depool eqiad	[production]
12:31	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 38 hosts with reason: Row c switch maint T331882	[production]
12:31	<sukhe>	run authdns-update for CR: 905603 depool eqiad	[production]
12:31	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on 38 hosts with reason: Row c switch maint T331882	[production]
12:28	<stevemunene@puppetmaster1001>	conftool action : set/pooled=no; selector: name=aqs1018.eqiad.wmnet	[production]
12:28	<volans@cumin1001>	END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox	[production]
12:28	<stevemunene@puppetmaster1001>	conftool action : set/pooled=no; selector: name=aqs1013.eqiad.wmnet	[production]
12:28	<volans@cumin1001>	START - Cookbook sre.netbox.update-extras rolling update on A:netbox	[production]
12:28	<stevemunene@puppetmaster1001>	conftool action : set/pooled=no; selector: name=aqs1012.eqiad.wmnet	[production]
12:28	<volans@cumin1001>	END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling update on A:netbox-canary	[production]
12:27	<volans@cumin1001>	START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary	[production]
12:26	<stevemunene@puppetmaster1001>	conftool action : set/pooled=no; selector: name=datahubsearch1003.eqiad.wmnet	[production]
12:24	<TimStarling>	I noticed that mw2382 was still talking to mwlog1002. It still had old php-fpm7.4 processes despite the scap. So I manually restarted php-fpm on it.	[production]
12:17	<tstarling@deploy2002>	Synchronized src/Profiler.php: T331882 disable profiling for switch maintenance (duration: 05m 58s)	[production]