__all__ SAL

2023-07-07 §
11:02	<aborrero@cumin1001>	START - Cookbook sre.dns.netbox	[production]
10:28	<taavi>	backfilling {project}.wmcloud.org and other currently-named DNS zones to projects that don't have them	[admin]
10:13	<moritzm>	rebooting puppetdb1003	[production]
10:09	<moritzm>	rebooting puppetserver1001	[production]
10:07	<wm-bot>	<sebastian-berlin-wmse> Deploy code with reverted M2C changes (a7fb483) in order to debug errors on tools.isa. Started from scratch using python3.11 and kubernetes, and a copy of the database on tools.isa.	[tools.isa-dev]
10:06	<jmm@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb2003.codfw.wmnet	[production]
10:05	<moritzm>	rebooting puppetserver2001	[production]
10:05	<jiji@deploy1002>	helmfile [staging] DONE helmfile.d/services/ipoid: apply	[production]
10:03	<jiji@deploy1002>	helmfile [staging] START helmfile.d/services/ipoid: apply	[production]
09:59	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet	[production]
09:56	<btullis>	`sudo systemctl start hadoop-hdfs-namenode.service` on an-master1001	[analytics]
09:55	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet	[production]
09:55	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet	[production]
09:52	<jmm@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host debmonitor2003.codfw.wmnet	[production]
09:52	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet	[production]
09:46	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet	[production]
09:46	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet	[production]
09:45	<stevemunene@cumin1001>	END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.	[production]
09:39	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet	[production]
09:37	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet	[production]
09:35	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet	[production]
09:34	<jmm@cumin2002>	END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lists1003.wikimedia.org	[production]
09:33	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet	[production]
09:29	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet	[production]
09:29	<stevemunene@cumin1001>	START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.	[production]
09:28	<stevemunene>	running sre.hadoop.roll-restart-masters to restart the masters and completely remove any reference to analytics[1058-1069] T317861	[analytics]
09:26	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet	[production]
09:24	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet	[production]
09:24	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host lists1003.wikimedia.org	[production]
09:20	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1004.eqiad.wmnet	[production]
09:19	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host people1004.eqiad.wmnet	[production]
09:19	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet	[production]
09:18	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet	[production]
09:17	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2003.codfw.wmnet	[production]
09:15	<stevemunene>	run puppet on hadoop masters to pick up changes from recently decommissioned hosts	[analytics]
09:13	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host people2003.codfw.wmnet	[production]
09:12	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet	[production]
08:53	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
08:50	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
08:48	<moritzm>	installing bookworm kernel updates	[production]
08:47	<jmm@cumin2002>	END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: xhgui2002.codfw.wmnet	[production]
08:47	<jmm@cumin2002>	START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: xhgui2002.codfw.wmnet	[production]
08:46	<jmm@cumin2002>	END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: xhgui1002.eqiad.wmnet	[production]
08:46	<jmm@cumin2002>	START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: xhgui1002.eqiad.wmnet	[production]
08:12	<elukey>	wipe kafka-test cluster (data + zookeeper config) to start clean after the issue that happened yesterday	[analytics]
08:05	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster	[production]
08:05	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster	[production]
01:55	<bking@cumin1001>	END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)	[production]
00:28	<bking@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
2023-07-06 §
23:14	<mutante>	mx1001 - rm /usr/local/bin/otrs_aliases ; rm /lib/systemd/system/generate_otrs_aliases.* after deploying gerrit:932316 which renamed script and timer without absenting them	[production]