production SAL

2022-08-03
17:56	<bking@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2043.codfw.wmnet	[production]
17:56	<bking@cumin1001>	START - Cookbook sre.hosts.remove-downtime for elastic2043.codfw.wmnet	[production]
17:55	<ottomata>	increasing partitions from 5 to 6 for *.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite topics in Kafka main-eqiad and main-codfw - T314426	[production]
17:55	<mvernon@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2055.codfw.wmnet	[production]
17:55	<mvernon@cumin1001>	START - Cookbook sre.hosts.remove-downtime for ms-be2055.codfw.wmnet	[production]
17:50	<rzl@cumin1001>	conftool action : set/pooled=yes; selector: name=kubestage2002.codfw.wmnet	[production]
17:38	<rzl@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2008-2010].codfw.wmnet	[production]
17:38	<rzl@cumin1001>	START - Cookbook sre.hosts.remove-downtime for parse[2008-2010].codfw.wmnet	[production]
17:23	<hnowlan@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=restbase20[12]4.codfw.wmnet	[production]
17:14	<mvernon@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts	[production]
17:14	<mvernon@cumin1001>	START - Cookbook sre.hosts.remove-downtime for 6 hosts	[production]
17:08	<ryankemper>	T310145 `elastic2031` and `wcqs2002` powered off in preparation for C1 maintenance	[production]
17:06	<jayme@cumin1001>	conftool action : set/pooled=yes; selector: name=(kubernetes2020.codfw.wmnet|kubernetes2009.codfw.wmnet|kubernetes2010.codfw.wmnet)	[production]
17:00	<btullis@cumin1001>	END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.	[production]
16:48	<Emperor>	shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,55,68].codfw.wmnet PDU work T310145	[production]
16:47	<mvernon@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work	[production]
16:47	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping	[production]
16:47	<mvernon@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work	[production]
16:47	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping	[production]
16:46	<mvernon@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet	[production]
16:46	<mvernon@cumin1001>	START - Cookbook sre.hosts.remove-downtime for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet	[production]
16:40	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2046.codfw.wmnet	[production]
16:40	<jayme@cumin1001>	START - Cookbook sre.hosts.remove-downtime for mc2046.codfw.wmnet	[production]
16:39	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 10 hosts	[production]
16:39	<jayme@cumin1001>	START - Cookbook sre.hosts.remove-downtime for 10 hosts	[production]
16:38	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2023.codfw.wmnet	[production]
16:38	<jelto@cumin1001>	START - Cookbook sre.hosts.remove-downtime for mc2023.codfw.wmnet	[production]
16:37	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap	[production]
16:37	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap	[production]
16:35	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap	[production]
16:35	<jayme@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap	[production]
16:32	<jelto>	power off mc2025-2026	[production]
16:31	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for rdb2008.codfw.wmnet	[production]
16:30	<jayme@cumin1001>	START - Cookbook sre.hosts.remove-downtime for rdb2008.codfw.wmnet	[production]
16:28	<btullis@cumin1001>	START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.	[production]
16:28	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2009-2010,2020].codfw.wmnet	[production]
16:27	<jayme@cumin1001>	START - Cookbook sre.hosts.remove-downtime for kubernetes[2009-2010,2020].codfw.wmnet	[production]
16:11	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 12 hosts	[production]
16:11	<jelto@cumin1001>	START - Cookbook sre.hosts.remove-downtime for 12 hosts	[production]
16:08	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts	[production]
16:08	<jayme@cumin1001>	START - Cookbook sre.hosts.remove-downtime for 15 hosts	[production]
16:08	<mvernon@cumin1001>	END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs[2005-2008].codfw.wmnet	[production]
16:08	<mvernon@cumin1001>	START - Cookbook sre.hosts.remove-downtime for aqs[2005-2008].codfw.wmnet	[production]
15:59	<Emperor>	shutdown ms-be20[33,47],thanos-be2002 prior to PDU work T310070	[production]
15:58	<mvernon@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work	[production]
15:58	<mvernon@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work	[production]
15:52	<jelto>	pooling mw2259-2270 again	[production]
15:45	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32242 and previous config saved to /var/cache/conftool/dbconfig/20220803-154515-marostegui.json	[production]
15:38	<vgutierrez>	clearing ats-be cache on cp6008 - T309651	[production]
15:38	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]