production SAL

2023-12-11
12:05	<urbanecm@deploy2002>	urbanecm: Continuing with sync	[production]
12:05	<urbanecm@deploy2002>	urbanecm: Backport for [[gerrit:981734\|Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
12:03	<urbanecm@deploy2002>	Started scap: Backport for [[gerrit:981734\|Revert "Growth: Enable Welcome survey user research for ar/en/es" (T351266)]]	[production]
11:20	<vgutierrez>	rolling restart of pybal on lvs3010 and lvs3008 effectively enabling IPIP encapsulation on ncredir@esams - T351069	[production]
11:18	<claime>	sudo confctl --object-type discovery select 'name=eqiad,dnsdisc=k8s-ingress-dse' set/pooled=true - T352639	[production]
11:16	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance	[production]
11:16	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance	[production]
11:12	<brouberol>	Add discovery records for the k8s-ingress-dse LVS service - T352639	[production]
10:55	<dcausse>	(properly) restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)	[production]
10:54	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)	[production]
10:50	<cgoubert@cumin1001>	START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[1019-1020].eqiad.wmnet} and A:lvs (T352639)	[production]
10:46	<claime>	Running puppet on O:lvs::balancer - T352639	[production]
10:45	<claime>	Disabling puppet on O:lvs::balancer - T352639	[production]
10:42	<elukey@deploy2002>	helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync	[production]
10:42	<elukey@deploy2002>	helmfile [codfw] START helmfile.d/services/recommendation-api: sync	[production]
10:42	<elukey@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync	[production]
10:38	<elukey@deploy2002>	helmfile [eqiad] START helmfile.d/services/recommendation-api: sync	[production]
10:38	<elukey@deploy2002>	helmfile [staging] DONE helmfile.d/services/recommendation-api: sync	[production]
10:38	<elukey@deploy2002>	helmfile [staging] START helmfile.d/services/recommendation-api: sync	[production]
10:37	<claime>	Repooling dse-k8s-worker nodes - sudo confctl select "service=kubesvc,cluster=dse-k8s" set/pooled=yes - T352639	[production]
10:03	<jayme>	removed cergen certs of all k8s services from private puppet in commit d36a97aa23e21824f95d22264d06e2c3bf3c6ac3 - T300033	[production]
09:57	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38753	[production]
09:56	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'email' for AS: 38753	[production]
09:55	<elukey@deploy2002>	helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync	[production]
09:55	<elukey@deploy2002>	helmfile [codfw] START helmfile.d/services/recommendation-api: sync	[production]
09:54	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1547	[production]
09:54	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'email' for AS: 1547	[production]
09:50	<elukey@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync	[production]
09:50	<elukey@deploy2002>	helmfile [eqiad] START helmfile.d/services/recommendation-api: sync	[production]
09:44	<elukey@deploy2002>	helmfile [staging] DONE helmfile.d/services/recommendation-api: sync	[production]
09:44	<elukey@deploy2002>	helmfile [staging] START helmfile.d/services/recommendation-api: sync	[production]
08:43	<kostajh>	UTC morning deploys done	[production]
08:43	<kharlan@deploy2002>	Finished scap: Backport for [[gerrit:976252\|ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366)]], [[gerrit:981424\|IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604)]] (duration: 22m 02s)	[production]
08:40	<dcausse>	restarted blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)	[production]
08:36	<kharlan@deploy2002>	kharlan and d3r1ck01: Continuing with sync	[production]
08:22	<kharlan@deploy2002>	kharlan and d3r1ck01: Backport for [[gerrit:976252\|ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366)]], [[gerrit:981424\|IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
08:21	<kharlan@deploy2002>	Started scap: Backport for [[gerrit:976252\|ClusterConfig: Rename `isTest()` to `isDebug()` for consistency (T347366)]], [[gerrit:981424\|IPInfo: Add comment clarifying $wgIPInfoGeoIP2EnterprisePath (T304604)]]	[production]
08:16	<kharlan@deploy2002>	Finished scap: Backport for [[gerrit:979969\|MediaModeration: Set MediaModerationDeveloperMode to false]] (duration: 09m 55s)	[production]
08:15	<arnaudb@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade	[production]
08:15	<arnaudb@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: reboot for upgrade	[production]
08:09	<kharlan@deploy2002>	kharlan: Continuing with sync	[production]
08:07	<kharlan@deploy2002>	kharlan: Backport for [[gerrit:979969\|MediaModeration: Set MediaModerationDeveloperMode to false]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
08:06	<kharlan@deploy2002>	Started scap: Backport for [[gerrit:979969\|MediaModeration: Set MediaModerationDeveloperMode to false]]	[production]
07:53	<arnaudb@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade	[production]
07:53	<arnaudb@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: reboot for upgrade	[production]
07:31	<arnaudb@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade	[production]
07:31	<arnaudb@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade	[production]
07:24	<arnaudb@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade	[production]
07:24	<arnaudb@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2185.codfw.wmnet with reason: reboot for upgrade	[production]
07:12	<marostegui>	Failover m3-master from dbproxy1020 to dbproxy1026 T351864	[production]