production SAL

2023-06-27
15:24	<root@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet	[production]
15:24	<root@cumin2002>	START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet	[production]
15:24	<jbond>	puppet-merge temporarily broken	[production]
15:23	<jbond>	hi all fyi i have temporarily broken puppet-merge, fix is being done	[production]
15:23	<root@cumin2002>	END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet	[production]
15:23	<root@cumin2002>	START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet	[production]
15:21	<root@cumin2002>	END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet	[production]
15:20	<root@cumin2002>	START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet	[production]
15:01	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
14:53	<mforns@deploy1002>	Finished deploy [airflow-dags/analytics@5e77b01]: (no justification provided) (duration: 00m 10s)	[production]
14:52	<mforns@deploy1002>	Started deploy [airflow-dags/analytics@5e77b01]: (no justification provided)	[production]
14:47	<root@cumin2002>	END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet	[production]
14:46	<root@cumin2002>	START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet	[production]
14:41	<elukey@cumin1001>	END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001	[production]
14:27	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
14:24	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet	[production]
14:24	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet	[production]
14:23	<elukey@cumin1001>	START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001	[production]
14:21	<elukey@cumin1001>	END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll restart to pick up new certs and openjdk version - elukey@cumin1001	[production]
14:17	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet	[production]
14:16	<jmm@cumin2002>	START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet	[production]
14:04	<elukey@cumin1001>	START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll restart to pick up new certs and openjdk version - elukey@cumin1001	[production]
13:32	<elukey>	expand ml-staging200[12] kubelet partitions - T339231	[production]
13:27	<stevemunene@cumin1001>	START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye	[production]
13:26	<joal@deploy1002>	Finished deploy [airflow-dags/analytics@9eca77f]: Regular analytics weekly train [airflow-dags/analytics@9eca77f7] (duration: 00m 09s)	[production]
13:26	<stevemunene@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye	[production]
13:26	<joal@deploy1002>	Started deploy [airflow-dags/analytics@9eca77f]: Regular analytics weekly train [airflow-dags/analytics@9eca77f7]	[production]
13:18	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
13:06	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
12:58	<elukey@deploy1002>	helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.	[production]
12:57	<elukey@deploy1002>	helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.	[production]
12:57	<marostegui>	Failover m3-master to dbproxy1026 T337812	[production]
11:55	<daniel@deploy1002>	Finished scap: Backport for [[gerrit:933437|Parsoid: Disable PC writes on enwiki (T339867)]] (duration: 12m 06s)	[production]
11:51	<jgiannelos@deploy1002>	helmfile [staging] DONE helmfile.d/services/wikifeeds: apply	[production]
11:50	<jgiannelos@deploy1002>	helmfile [staging] START helmfile.d/services/wikifeeds: apply	[production]
11:44	<daniel@deploy1002>	daniel: Backport for [[gerrit:933437|Parsoid: Disable PC writes on enwiki (T339867)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet	[production]
11:43	<daniel@deploy1002>	Started scap: Backport for [[gerrit:933437|Parsoid: Disable PC writes on enwiki (T339867)]]	[production]
11:21	<daniel@deploy1002>	Finished scap: Backport for [[gerrit:933184|Parsoid: Disable PC writes on dewiki (T339867)]] (duration: 08m 34s)	[production]
11:20	<hnowlan@puppetmaster1001>	conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet	[production]
11:14	<daniel@deploy1002>	daniel: Backport for [[gerrit:933184|Parsoid: Disable PC writes on dewiki (T339867)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet	[production]
11:12	<daniel@deploy1002>	Started scap: Backport for [[gerrit:933184|Parsoid: Disable PC writes on dewiki (T339867)]]	[production]
11:08	<joal@deploy1002>	Finished deploy [analytics/refinery@259c5e2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@259c5e2] (duration: 01m 43s)	[production]
11:06	<joal@deploy1002>	Started deploy [analytics/refinery@259c5e2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@259c5e2]	[production]
11:06	<joal@deploy1002>	Finished deploy [analytics/refinery@259c5e2] (thin): Regular analytics weekly train THIN [analytics/refinery@259c5e2] (duration: 00m 04s)	[production]
11:06	<joal@deploy1002>	Started deploy [analytics/refinery@259c5e2] (thin): Regular analytics weekly train THIN [analytics/refinery@259c5e2]	[production]
11:04	<joal@deploy1002>	Finished deploy [analytics/refinery@259c5e2]: Regular analytics weekly train [analytics/refinery@259c5e2] (duration: 08m 23s)	[production]
11:02	<stevemunene@cumin1001>	START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye	[production]
10:55	<joal@deploy1002>	Started deploy [analytics/refinery@259c5e2]: Regular analytics weekly train [analytics/refinery@259c5e2]	[production]
10:48	<elukey@cumin1001>	END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001	[production]
10:43	<hnowlan>	disabling puppet on A:cp-text to test rollout of r/929674	[production]