production SAL

2021-01-14
21:50	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2270.codfw.wmnet with reason: REIMAGE	[production]
21:49	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2268.codfw.wmnet with reason: REIMAGE	[production]
21:48	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2269.codfw.wmnet with reason: REIMAGE	[production]
21:47	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2268.codfw.wmnet with reason: REIMAGE	[production]
21:23	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw2258.codfw.wmnet	[production]
21:23	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet	[production]
21:19	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE	[production]
21:18	<razzi@cumin1001>	END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes - razzi@cumin1001	[production]
21:18	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE	[production]
21:16	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw2242.codfw.wmnet	[production]
21:16	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=mw2241.codfw.wmnet	[production]
21:16	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE	[production]
21:15	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE	[production]
21:15	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE	[production]
21:14	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE	[production]
21:12	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw2258.codfw.wmnet	[production]
21:12	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet	[production]
21:10	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw2242.codfw.wmnet	[production]
21:10	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw2241.codfw.wmnet	[production]
20:17	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE	[production]
20:17	<mutante>	ACKing all unhandled crit alerts about systemd on clouddb hosts - notifications are disabled but this cleans up Icinga web UI noise - T267090	[production]
20:15	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE	[production]
20:05	<razzi@cumin1001>	START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes - razzi@cumin1001	[production]
19:31	<urbanecm@deploy1001>	Synchronized dblists/closed.dblist: d3e274e9b953f5edda07fa3a016b7291a451ceb2: Close lrcwiki (T272041) (duration: 00m 58s)	[production]
19:03	<mutante>	mc1024 - attempting to power on via mgmt, went down and power down	[production]
18:45	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2258.codfw.wmnet with reason: REIMAGE	[production]
18:43	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2255.codfw.wmnet with reason: REIMAGE	[production]
18:41	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2242.codfw.wmnet with reason: REIMAGE	[production]
18:41	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2258.codfw.wmnet with reason: REIMAGE	[production]
18:40	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2255.codfw.wmnet with reason: REIMAGE	[production]
18:39	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2241.codfw.wmnet with reason: REIMAGE	[production]
18:38	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2242.codfw.wmnet with reason: REIMAGE	[production]
18:38	<Amir1>	started mass deletion of lrcwiki (T272041) - https://w.wiki/uPV	[production]
18:37	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mw2241.codfw.wmnet with reason: REIMAGE	[production]
18:36	<jynus>	restarting backup1002, backup2002 T271913	[production]
18:05	<jynus>	restarting backup1001, backup2001 T271913	[production]
16:47	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack	[production]
16:47	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack	[production]
16:47	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 93 hosts with reason: upgrading openstack	[production]
16:46	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on 93 hosts with reason: upgrading openstack	[production]
16:32	<moritzm>	installing php-pear updates on stretch	[production]
16:03	<moritzm>	installing tomcat8 security updates	[production]
15:40	<moritzm>	installing sqlite3 security updates on Stretch	[production]
15:30	<papaul>	power down ms-be2022 for maintenance	[production]
15:19	<otto@deploy1001>	Finished deploy [analytics/refinery@1117f45]: Explicitly set timeout in banner_activity-druid-monthly-coord - T264358 (duration: 02m 16s)	[production]
15:16	<otto@deploy1001>	Started deploy [analytics/refinery@1117f45]: Explicitly set timeout in banner_activity-druid-monthly-coord - T264358	[production]
15:11	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)	[production]
15:00	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 93 hosts with reason: upgrading openstack	[production]
14:59	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on 93 hosts with reason: upgrading openstack	[production]
14:59	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack	[production]