production SAL

2023-01-11
17:55	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement	[production]
17:55	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42995 and previous config saved to /var/cache/conftool/dbconfig/20230111-175536-root.json	[production]
17:50	<hnowlan@deploy1002>	helmfile [codfw] START helmfile.d/services/thumbor: apply	[production]
17:50	<hnowlan@deploy1002>	helmfile [codfw] DONE helmfile.d/services/thumbor: apply	[production]
17:43	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42994 and previous config saved to /var/cache/conftool/dbconfig/20230111-174351-marostegui.json	[production]
17:40	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42993 and previous config saved to /var/cache/conftool/dbconfig/20230111-174031-root.json	[production]
17:40	<hnowlan@deploy1002>	helmfile [codfw] START helmfile.d/services/thumbor: apply	[production]
17:39	<hnowlan@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/thumbor: apply	[production]
17:29	<hnowlan@deploy1002>	helmfile [eqiad] START helmfile.d/services/thumbor: apply	[production]
17:28	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42992 and previous config saved to /var/cache/conftool/dbconfig/20230111-172844-marostegui.json	[production]
17:28	<jayme@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
17:28	<jayme@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
17:25	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42991 and previous config saved to /var/cache/conftool/dbconfig/20230111-172526-root.json	[production]
17:21	<jayme@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
17:21	<jayme@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
17:21	<jayme@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
17:20	<jayme@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
17:18	<jayme@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
17:18	<jayme@deploy1002>	helmfile [staging-eqiad] START helmfile.d/admin 'apply'.	[production]
17:13	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42989 and previous config saved to /var/cache/conftool/dbconfig/20230111-171338-marostegui.json	[production]
17:11	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depooling db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42988 and previous config saved to /var/cache/conftool/dbconfig/20230111-171114-marostegui.json	[production]
17:11	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
17:10	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
17:10	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance	[production]
17:10	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance	[production]
17:10	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1106 (re)pooling @ 1%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42987 and previous config saved to /var/cache/conftool/dbconfig/20230111-171021-root.json	[production]
17:10	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance	[production]
17:09	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance	[production]
17:04	<marostegui>	dbmaint deploy schema change with replication on s7 eqiad T321391	[production]
17:03	<jayme@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
17:03	<jayme@deploy1002>	helmfile [staging-eqiad] START helmfile.d/admin 'apply'.	[production]
16:38	<marostegui>	dbmaint deploy schema change with replication on s5 eqiad T321391	[production]
16:31	<marostegui>	dbmaint deploy schema change with replication on s4 eqiad T321391	[production]
16:25	<marostegui>	dbmaint deploy schema change with replication on s8 eqiad T321391	[production]
16:22	<marostegui>	dbmaint deploy schema change with replication on s6 eqiad T321391	[production]
16:06	<volans@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:06	<volans@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"	[production]
16:05	<volans@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"	[production]
16:03	<volans@cumin1001>	START - Cookbook sre.dns.netbox	[production]
16:01	<jiji@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host mc1038.eqiad.wmnet with OS bullseye	[production]
16:00	<pt1979@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
15:58	<pt1979@cumin2002>	START - Cookbook sre.dns.netbox	[production]
15:53	<zabe@deploy1002>	Finished scap: T233004 (duration: 07m 54s)	[production]
15:45	<zabe@deploy1002>	Started scap: T233004	[production]
15:38	<zabe@deploy1002>	backport aborted: (duration: 04m 25s)	[production]
15:38	<zabe@deploy1002>	sync-world aborted: Backport for [[gerrit:878870|Start reading from cul_actor everywhere (T233004)]] (duration: 04m 00s)	[production]
15:36	<zabe@deploy1002>	zabe and zabe: Backport for [[gerrit:878870|Start reading from cul_actor everywhere (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet	[production]
15:34	<zabe@deploy1002>	Started scap: Backport for [[gerrit:878870|Start reading from cul_actor everywhere (T233004)]]	[production]
15:31	<pt1979@cumin2002>	END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)	[production]
15:21	<marostegui>	Stop mariadb on db1106 to reclone db1206 (there will be lag on s1 on wikireplicas) T326669	[production]