production SAL

2023-01-19
17:36	<Amir1>	bash Krinkle> Vatican Interim Papacy Runbook, § 5.1: Notify Wikipedia about incoming traffic.	[production]
17:17	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2038.codfw.wmnet with OS bullseye	[production]
17:13	<zabe@deploy1002>	Finished scap: T233004 (duration: 18m 50s)	[production]
17:02	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage	[production]
16:58	<jiji@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage	[production]
16:54	<zabe@deploy1002>	Started scap: T233004	[production]
16:54	<zabe@deploy1002>	backport aborted: (duration: 15m 22s)	[production]
16:48	<godog>	roll-restart opensearch-dashboards in logstash collectors eqiad - T327161	[production]
16:44	<zabe@deploy1002>	Started scap: Backport for [[gerrit:881609|Add ability to start from cuc_id to populateCucComment (T233004)]]	[production]
16:42	<jiji@cumin1001>	START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS bullseye	[production]
16:27	<moritzm>	installing cryptsetup updates for bullseye	[production]
16:18	<jmm@cumin2002>	END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=1) rolling restart_daemons on A:logstash-collector	[production]
16:13	<jclark@cumin1001>	END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']	[production]
16:11	<jclark@cumin1001>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']	[production]
16:09	<jclark@cumin1001>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
16:08	<jmm@cumin2002>	START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector	[production]
16:06	<jclark@cumin1001>	START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED	[production]
15:55	<sukhe>	update pybal to 1.15.10 on lvs4010: T321191	[production]
15:45	<effie>	enable puppet on C:memcached hosts	[production]
15:42	<godog>	bounce opensearch on logstash102[34] - T327161	[production]
15:30	<sukhe>	reprepro -C main include buster-wikimedia pybal_1.15.10_amd64.changes: T321191	[production]
15:19	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43194 and previous config saved to /var/cache/conftool/dbconfig/20230119-151917-ladsgroup.json	[production]
15:17	<effie>	disable puppet on all C:memcached servers to deploy 812173	[production]
15:04	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43193 and previous config saved to /var/cache/conftool/dbconfig/20230119-150412-ladsgroup.json	[production]
14:57	<jgiannelos@deploy1002>	helmfile [staging] DONE helmfile.d/services/mobileapps: apply	[production]
14:49	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43192 and previous config saved to /var/cache/conftool/dbconfig/20230119-144907-ladsgroup.json	[production]
14:47	<jgiannelos@deploy1002>	helmfile [staging] START helmfile.d/services/mobileapps: apply	[production]
14:40	<jgiannelos@deploy1002>	helmfile [staging] DONE helmfile.d/services/mobileapps: apply	[production]
14:34	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2118 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43191 and previous config saved to /var/cache/conftool/dbconfig/20230119-143402-ladsgroup.json	[production]
14:33	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance	[production]
14:33	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance	[production]
14:32	<zabe>	run populateCulComment on group2 wikis # T327290	[production]
14:30	<jgiannelos@deploy1002>	helmfile [staging] START helmfile.d/services/mobileapps: apply	[production]
14:09	<jgiannelos@deploy1002>	helmfile [staging] DONE helmfile.d/services/mobileapps: apply	[production]
13:58	<jgiannelos@deploy1002>	helmfile [staging] START helmfile.d/services/mobileapps: apply	[production]
12:27	<hnowlan@cumin1001>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps2009.codfw.wmnet	[production]
12:19	<hnowlan@cumin1001>	START - Cookbook sre.hosts.reboot-single for host maps2009.codfw.wmnet	[production]
12:06	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)	[production]
12:06	<moritzm>	stopping/masking slapd on ldap-corp1001/ldap-corp2001 T323820	[production]
11:36	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1054.eqiad.wmnet with OS bullseye	[production]
11:30	<hnowlan@cumin1001>	START - Cookbook sre.hosts.reboot-cluster	[production]
11:29	<hnowlan>	rebooting maps-codfw for updates	[production]
11:29	<hnowlan@cumin1001>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps1009.eqiad.wmnet	[production]
11:24	<filippo@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2004.codfw.wmnet	[production]
11:24	<filippo@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
11:24	<filippo@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"	[production]
11:22	<hnowlan@cumin1001>	START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet	[production]
11:20	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)	[production]
11:20	<jiji@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage	[production]
11:18	<filippo@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"	[production]
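
Each entry above has four tab-separated fields: time, `<author>` or `<author@host>`, message, and `[channel]`. A minimal sketch of reading one entry, assuming that layout holds for every line; the names `SalEntry`, `parse_sal_line`, and `cookbook_result` are hypothetical helpers for illustration and are not part of any SAL tooling:

```python
import re
from typing import NamedTuple, Optional, Tuple


class SalEntry(NamedTuple):
    time: str
    author: str
    host: Optional[str]  # None when the entry was logged without "@host"
    message: str
    channel: str


def parse_sal_line(line: str) -> SalEntry:
    """Split one tab-separated log line into its fields (assumed layout)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields: {line!r}")
    time, who, channel = parts[0], parts[1], parts[-1]
    # Re-join the middle in case the message itself contains a tab.
    message = "\t".join(parts[2:-1])
    author, _, host = who.strip("<>").partition("@")
    return SalEntry(time, author, host or None, message, channel.strip("[]"))


# Matches lines such as "END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) ..."
COOKBOOK_END = re.compile(
    r"^END \((PASS|FAIL)\) - Cookbook (\S+) \(exit_code=(\w+)\)"
)


def cookbook_result(entry: SalEntry) -> Optional[Tuple[str, str, str]]:
    """Return (cookbook name, status, exit code) for cookbook END lines, else None."""
    match = COOKBOOK_END.match(entry.message)
    if match is None:
        return None
    status, name, exit_code = match.groups()
    return name, status, exit_code
```

The exit code is kept as a string because the log records non-numeric values as well (for example `exit_code=False` in the 16:13 entry).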