production SAL

2022-07-13
14:34	<aqu@deploy1002>	Started deploy [airflow-dags/analytics_test@03c1a05]: Deploy [airflow-dags/analytics_test@03c1a05]	[production]
14:18	<aqu>	Deployed refinery using scap, then deployed onto hdfs	[production]
14:11	<bking@cumin1001>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye	[production]
14:08	<aqu@deploy1002>	Finished deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67] (duration: 07m 42s)	[production]
14:04	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye	[production]
14:01	<aqu@deploy1002>	Started deploy [analytics/refinery@bd39e67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bd39e67]	[production]
14:00	<aqu@deploy1002>	Finished deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67] (duration: 00m 07s)	[production]
14:00	<aqu@deploy1002>	Started deploy [analytics/refinery@bd39e67] (thin): Regular analytics weekly train THIN [analytics/refinery@bd39e67]	[production]
13:47	<bking@cumin1001>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2049.codfw.wmnet with OS bullseye	[production]
13:44	<marostegui@cumin1001>	dbctl commit (dc=all): 'Remove weight from x1 master', diff saved to https://phabricator.wikimedia.org/P31037 and previous config saved to /var/cache/conftool/dbconfig/20220713-134413-marostegui.json	[production]
13:37	<bking@cumin1001>	START - Cookbook sre.hosts.reimage for host elastic2049.codfw.wmnet with OS bullseye	[production]
13:20	<Lucas_WMDE>	UTC afternoon backport window done	[production]
13:20	<bking@cumin1001>	END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host elastic2049.codfw.wmnet	[production]
13:18	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
13:17	<lucaswerkmeister-wmde@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:790399|Configure wgLexemeLexicalCategoryItemIds on Wikidata (T307441)]] (duration: 02m 45s)	[production]
13:17	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
13:17	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
13:16	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
13:10	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
13:10	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
13:10	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
13:09	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
13:08	<lucaswerkmeister-wmde@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:813594|Configure $wgBabelCategoryNames on Test Wikidata (T312920)]] (duration: 02m 51s)	[production]
13:05	<inflatador>	bking@elastic2049 rebooting for read-only fs	[production]
13:04	<bking@cumin1001>	START - Cookbook sre.hosts.reboot-single for host elastic2049.codfw.wmnet	[production]
12:49	<damilare>	payments-wiki upgraded from 2f95d8b4 to 6a8aa302	[production]
12:12	<moritzm>	draining ganeti2028 T311686	[production]
12:08	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2018.codfw.wmnet with reason: Remove node for eventual reimage, T311686	[production]
12:08	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2018.codfw.wmnet with reason: Remove node for eventual reimage, T311686	[production]
11:43	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: codfw s8 sanitarium master switch	[production]
11:43	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: codfw s8 sanitarium master switch	[production]
10:42	<aqu@deploy1002>	Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train (2nd try. --force) [analytics/refinery@bd39e67] (duration: 04m 52s)	[production]
10:38	<aqu@deploy1002>	Started deploy [analytics/refinery@bd39e67]: Regular analytics weekly train (2nd try. --force) [analytics/refinery@bd39e67]	[production]
10:27	<moritzm>	draining ganeti1028 T311686	[production]
10:23	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2012.codfw.wmnet with reason: Remove node for eventual reimage, T311686	[production]
10:23	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2012.codfw.wmnet with reason: Remove node for eventual reimage, T311686	[production]
09:07	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31035 and previous config saved to /var/cache/conftool/dbconfig/20220713-090748-ladsgroup.json	[production]
08:52	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31034 and previous config saved to /var/cache/conftool/dbconfig/20220713-085244-ladsgroup.json	[production]
08:37	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31033 and previous config saved to /var/cache/conftool/dbconfig/20220713-083740-ladsgroup.json	[production]
08:22	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P31032 and previous config saved to /var/cache/conftool/dbconfig/20220713-082236-ladsgroup.json	[production]
08:05	<jayme>	'systemctl restart rsyslog' on kubernetes2007.codfw.wmnet,kubernetes2010.codfw.wmnet,kubernetes2014.codfw.wmnet,kubernetes2020.codfw.wmnet,kubernetes2009.codfw.wmnet	[production]
07:52	<jayme@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply	[production]
07:52	<jayme@deploy1002>	helmfile [eqiad] START helmfile.d/services/mobileapps: apply	[production]
07:51	<jayme@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mobileapps: apply	[production]
07:50	<jayme@deploy1002>	helmfile [codfw] START helmfile.d/services/mobileapps: apply	[production]
07:02	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31031 and previous config saved to /var/cache/conftool/dbconfig/20220713-070229-root.json	[production]
06:47	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31030 and previous config saved to /var/cache/conftool/dbconfig/20220713-064725-root.json	[production]
06:45	<aqu>	analytics/refinery deploy aborted, no more space to deploy in /srv on an-launcher1002 eqiad	[production]
06:44	<aqu@deploy1002>	Finished deploy [analytics/refinery@bd39e67]: Regular analytics weekly train [analytics/refinery@bd39e67] (duration: 27m 02s)	[production]
06:32	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P31029 and previous config saved to /var/cache/conftool/dbconfig/20220713-063221-root.json	[production]