production SAL

2023-01-09
10:54	<mvernon@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet	[production]
10:54	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet	[production]
10:51	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet	[production]
10:49	<jayme@cumin1001>	START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet	[production]
10:49	<jayme@cumin1001>	START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet	[production]
10:48	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet	[production]
10:46	<jiji@cumin1001>	conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad	[production]
10:46	<effie>	switching maps to eqiad	[production]
10:45	<moritzm>	installing avahi security updates	[production]
10:44	<jayme@cumin1001>	START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet	[production]
10:41	<jayme@cumin1001>	conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw	[production]
09:35	<dcausse>	restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)	[production]
09:11	<mvernon@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet	[production]
09:04	<mvernon@cumin2002>	START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet	[production]
08:58	<moritzm>	installing glibc security updates	[production]
08:56	<XioNoX>	depool ulsfo for network maintenance - T316532	[production]
08:26	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 327700	[production]
08:26	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'configure' for AS: 327700	[production]
08:25	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 48237	[production]
08:24	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'configure' for AS: 48237	[production]
08:23	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32035	[production]
08:21	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm-test1001.wikimedia.org	[production]
08:21	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'configure' for AS: 32035	[production]
08:12	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm-test1001.wikimedia.org on all recursors	[production]
08:12	<slyngshede@cumin1001>	START - Cookbook sre.dns.wipe-cache idm-test1001.wikimedia.org on all recursors	[production]
08:12	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
08:12	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"	[production]
08:08	<slyngshede@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"	[production]
08:06	<slyngshede@cumin1001>	START - Cookbook sre.dns.netbox	[production]
08:06	<slyngshede@cumin1001>	START - Cookbook sre.ganeti.makevm for new host idm-test1001.wikimedia.org	[production]

2023-01-06
18:57	<mutante>	systemctl start docker-gc on all gitlab-runners via cumin T310593	[production]
18:56	<mutante>	gitlab-runner1002 - systemctl start docker-gc; run puppet on all gitlab-runners T310593	[production]
18:49	<dzahn@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: debugging	[production]
18:49	<dzahn@cumin2002>	START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: debugging	[production]
18:36	<sukhe>	pool cp5032 [bullseye upgrade completed]: T325797	[production]
18:34	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=ats-be	[production]
18:34	<sukhe@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=cdn	[production]
18:20	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425	[production]
18:20	<sukhe@cumin2002>	START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425	[production]
18:13	<Krinkle>	krinkle@cloudweb1003$ Run `UPDATE actor SET actor_user=31136 WHERE actor_id=14640;` to partially fix T326431	[production]
17:58	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5032.eqsin.wmnet with OS bullseye	[production]
17:29	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage	[production]
17:26	<sukhe@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage	[production]
16:53	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye	[production]
16:53	<sukhe@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye	[production]
16:26	<sukhe@cumin2002>	START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye	[production]
16:18	<sukhe@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye	[production]
16:05	<cgoubert@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error	[production]
16:05	<cgoubert@cumin1001>	START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error	[production]
15:54	<cgoubert@cumin1001>	conftool action : set/pooled=inactive; selector: name=mw1486.eqiad.wmnet	[production]