production SAL

2023-09-28
16:41	<fabfur@cumin1001>	END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}	[production]
16:41	<fabfur@cumin1001>	START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}	[production]
16:41	<jgiannelos@deploy2002>	helmfile [staging] DONE helmfile.d/services/wikifeeds: apply	[production]
16:41	<jgiannelos@deploy2002>	helmfile [staging] START helmfile.d/services/wikifeeds: apply	[production]
16:39	<jgiannelos@deploy2002>	helmfile [staging] DONE helmfile.d/services/wikifeeds: apply	[production]
16:39	<jgiannelos@deploy2002>	helmfile [staging] START helmfile.d/services/wikifeeds: apply	[production]
16:37	<btullis@cumin1001>	START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet	[production]
16:35	<fabfur@cumin1001>	END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 8 hosts matching query A:cp-upload_codfw	[production]
16:35	<fabfur@cumin1001>	END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 8 hosts matching query A:cp-text_codfw	[production]
16:26	<joal@deploy2002>	helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply	[production]
16:26	<joal@deploy2002>	helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply	[production]
16:23	<fabfur@cumin1001>	START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_codfw	[production]
16:23	<fabfur@cumin1001>	START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_codfw	[production]
16:14	<hnowlan>	enabling puppet on A:cp, routing mediarequests API via rest-gateway	[production]
16:03	<hnowlan>	disabled puppet on A:cp	[production]
15:54	<hnowlan@deploy1002>	helmfile [codfw] DONE helmfile.d/services/media-analytics: apply	[production]
15:53	<hnowlan@deploy1002>	helmfile [codfw] START helmfile.d/services/media-analytics: apply	[production]
15:49	<hnowlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply	[production]
15:49	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/media-analytics: apply	[production]
15:49	<hnowlan@deploy1002>	helmfile [staging] DONE helmfile.d/services/media-analytics: apply	[production]
15:48	<hnowlan@deploy1002>	helmfile [staging] START helmfile.d/services/media-analytics: apply	[production]
15:48	<brennen@deploy2002>	Sync cancelled.	[production]
15:48	<fabfur@cumin1001>	END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_ulsfo	[production]
15:47	<brennen@deploy2002>	brennen: Backport for [[gerrit:961719|Revert "NostalgiaTemplate.php: Fix array illegal offset error"]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
15:47	<hnowlan@deploy1002>	helmfile [staging] DONE helmfile.d/services/rest-gateway: apply	[production]
15:47	<hnowlan@deploy1002>	helmfile [staging] START helmfile.d/services/rest-gateway: apply	[production]
15:46	<brennen@deploy2002>	Started scap: Backport for [[gerrit:961719|Revert "NostalgiaTemplate.php: Fix array illegal offset error"]]	[production]
15:39	<brennen@deploy2002>	Sync cancelled.	[production]
15:38	<brennen@deploy2002>	krinkle and brennen: Backport for [[gerrit:961717|NostalgiaTemplate.php: Fix array illegal offset error]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
15:36	<brennen@deploy2002>	Started scap: Backport for [[gerrit:961717|NostalgiaTemplate.php: Fix array illegal offset error]]	[production]
15:27	<fabfur@cumin1001>	END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 7 hosts matching query A:cp-upload_ulsfo and not P{cp4052*}	[production]
15:06	<tchin@deploy2002>	helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply	[production]
15:05	<tchin@deploy2002>	helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply	[production]
15:03	<jhancock@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye	[production]
14:49	<inflatador>	bking@wdqs1016 shutting down services to compress a 1.2 TB jnl file	[production]
14:43	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P52725 and previous config saved to /var/cache/conftool/dbconfig/20230928-144338-root.json	[production]
14:35	<moritzm>	installing ghostscript security updates	[production]
14:32	<bking@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression	[production]
14:32	<bking@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression	[production]
14:13	<klausman>	restarting pybal on lvs1019 and lvs2013 (LVS low-traffic actives) for T347278 (ORES turndown)	[production]
14:11	<arnaudb@cumin1001>	dbctl commit (dc=all): 'Depooling db2169:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52723 and previous config saved to /var/cache/conftool/dbconfig/20230928-141140-arnaudb.json	[production]
14:11	<arnaudb@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance	[production]
14:11	<arnaudb@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance	[production]
14:11	<arnaudb@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52722 and previous config saved to /var/cache/conftool/dbconfig/20230928-141118-arnaudb.json	[production]
14:08	<cdanis>	repooling cp5030 after haproxy upgrade & config deploy T317799	[production]
14:02	<jhancock@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1228.eqiad.wmnet with OS bullseye	[production]
14:02	<jhancock@cumin2002>	START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye	[production]
14:02	<cdanis>	depooling cp5030 for haproxy upgrade & testing T317799	[production]
14:01	<moritzm>	installing gsl security updates	[production]
14:00	<klausman>	restarted pybal on lvs1020 and lvs2014 (LVS low-traffic backups) for T347278 (ORES turndown)	[production]