production SAL

2023-04-24
10:01	<moritzm>	installing git security updates	[production]
09:55	<slyngs>	Update LDAP schema wmf-user: T148048	[production]
09:55	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 28 hosts with reason: Enabling replication T335266	[production]
09:55	<marostegui>	Enable replication eqiad -> codfw on s7 dbmaint eqiad T335266	[production]
09:54	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 0:15:00 on 28 hosts with reason: Enabling replication T335266	[production]
09:25	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-worker1110.eqiad.wmnet	[production]
09:21	<moritzm>	upgrade php-excimer on mw canaries to 1.0.2-1+wmf3+buster1 (which rebases Excimer to 1.1.1) T332964	[production]
08:45	<moritzm>	uploaded php-excimer 1.0.2-1+wmf3+buster1 (which rebases Excimer to 1.1.1) to component/php74 for buster-wikimedia T332964	[production]
08:44	<marostegui>	Enable replication eqiad -> codfw on s8 dbmaint eqiad T335266	[production]
08:44	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 34 hosts with reason: Enabling replication T335266	[production]
08:43	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Enabling replication T335266	[production]
08:33	<marostegui>	Enable replication eqiad -> codfw on s5 dbmaint eqiad T335266	[production]
08:32	<cgoubert@deploy2002>	Finished scap: testing T329857 (duration: 14m 29s)	[production]
08:32	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Enabling replication T335266	[production]
08:32	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Enabling replication T335266	[production]
08:29	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266	[production]
08:28	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266	[production]
08:28	<marostegui>	Enable replication eqiad -> codfw on s6 dbmaint eqiad T335266	[production]
08:27	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266	[production]
08:26	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266	[production]
08:26	<marostegui>	Enable replication eqiad -> codfw on s2 dbmaint eqiad T335266	[production]
08:25	<btullis@cumin1001>	START - Cookbook sre.hosts.dhcp for host an-worker1110.eqiad.wmnet	[production]
08:21	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware	[production]
08:21	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware	[production]
08:20	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Enabling replication T335266	[production]
08:20	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Enabling replication T335266	[production]
08:20	<marostegui>	Enable replication eqiad -> codfw on x1 dbmaint eqiad T335266	[production]
08:18	<cgoubert@deploy2002>	Started scap: testing T329857	[production]
08:17	<marostegui>	Enable replication eqiad -> codfw on es5 dbmaint eqiad T335266	[production]
08:14	<claime>	Deploying 909302 on deploy2002 for T329857	[production]
08:10	<claime>	Disabling puppet on deploy2002 - T329857	[production]
08:09	<claime>	Deploying 909302 on deploy1002 for T329857	[production]
08:08	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Enabling replication T335266	[production]
08:08	<marostegui>	Enable replication eqiad -> codfw on es4 dbmaint eqiad T335266	[production]
08:08	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Enabling replication T335266	[production]
08:07	<marostegui>	Enable replication eqiad -> codfw on pc3 dbmaint eqiad T335266	[production]
08:06	<marostegui>	Enable replication eqiad -> codfw on pc2 dbmaint eqiad T335266	[production]
08:05	<marostegui>	Enable replication eqiad -> codfw on pc1 dbmaint eqiad T335266	[production]
07:53	<mvernon@cumin2002>	END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.41 in codfw	[production]
07:51	<mvernon@cumin2002>	START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.41 in codfw	[production]
07:45	<jelto@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye	[production]
07:44	<mvernon@cumin2002>	END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.59 in codfw	[production]
07:42	<mvernon@cumin2002>	START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.59 in codfw	[production]
07:39	<dcausse>	restarting blazegraph on wdqs1005 (stuck for 3+days)	[production]
07:38	<mvernon@cumin2002>	END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.4a in codfw	[production]
07:36	<mvernon@cumin2002>	START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.4a in codfw	[production]
07:24	<jelto@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage	[production]
07:21	<jelto@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage	[production]
07:06	<jelto@cumin2002>	START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye	[production]

2023-04-22
05:41	<joe>	<thumbor/codfw>$ helmfile --state-values-set roll_restart=1 -e codfw sync	[production]