production SAL

2024-04-29
16:45	<robh@cumin2002>	START - Cookbook sre.hosts.provision for host cp7007.mgmt.magru.wmnet with reboot policy FORCED	[production]
16:44	<robh@cumin2002>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7003.mgmt.magru.wmnet with reboot policy FORCED	[production]
16:41	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P61436 and previous config saved to /var/cache/conftool/dbconfig/20240429-164158-marostegui.json	[production]
16:37	<robh@cumin2002>	START - Cookbook sre.hosts.provision for host cp7005.mgmt.magru.wmnet with reboot policy FORCED	[production]
16:32	<robh@cumin2002>	START - Cookbook sre.hosts.provision for host cp7003.mgmt.magru.wmnet with reboot policy FORCED	[production]
16:30	<robh@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:30	<robh@cumin2002>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack b3 cp hosts - robh@cumin2002"	[production]
16:29	<robh@cumin2002>	START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye	[production]
16:29	<robh@cumin2002>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack b3 cp hosts - robh@cumin2002"	[production]
16:27	<robh@cumin2002>	START - Cookbook sre.dns.netbox	[production]
16:26	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P61435 and previous config saved to /var/cache/conftool/dbconfig/20240429-162650-marostegui.json	[production]
16:26	<robh@cumin2002>	END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7002']	[production]
16:23	<jayme@cumin1002>	conftool action : set/pooled=yes; selector: name=kubestagemaster2003.codfw.wmnet	[production]
16:23	<jayme@cumin1002>	conftool action : set/weight=10; selector: name=kubestagemaster2003.codfw.wmnet	[production]
16:20	<robh@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7002']	[production]
16:20	<robh@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp7002']	[production]
16:19	<robh@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7002']	[production]
16:19	<robh@cumin2002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp7002']	[production]
16:19	<robh@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7002']	[production]
16:18	<robh@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7002.magru.wmnet with OS bullseye	[production]
16:11	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1200 (T361627)', diff saved to https://phabricator.wikimedia.org/P61434 and previous config saved to /var/cache/conftool/dbconfig/20240429-161143-marostegui.json	[production]
16:10	<jdrewniak@deploy1002>	Synchronized portals: Wikimedia Portals Update: [[gerrit:1025396| Bumping portals to master (T128546)]] (duration: 14m 10s)	[production]
16:09	<marostegui@cumin1002>	dbctl commit (dc=all): 'Depooling db1200 (T361627)', diff saved to https://phabricator.wikimedia.org/P61433 and previous config saved to /var/cache/conftool/dbconfig/20240429-160859-marostegui.json	[production]
16:08	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance	[production]
16:08	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance	[production]
16:08	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1185 (T361627)', diff saved to https://phabricator.wikimedia.org/P61432 and previous config saved to /var/cache/conftool/dbconfig/20240429-160836-marostegui.json	[production]
16:06	<robh@cumin2002>	START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye	[production]
15:58	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Depool db2114 T363713', diff saved to https://phabricator.wikimedia.org/P61431 and previous config saved to /var/cache/conftool/dbconfig/20240429-155838-arnaudb.json	[production]
15:56	<jdrewniak@deploy1002>	Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:1025396| Bumping portals to master (T128546)]] (duration: 14m 38s)	[production]
15:55	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Promote db2129 to s6 primary T363713', diff saved to https://phabricator.wikimedia.org/P61430 and previous config saved to /var/cache/conftool/dbconfig/20240429-155557-arnaudb.json	[production]
15:55	<arnaudb>	Starting s6 codfw failover from db2114 to db2129 - T363713	[production]
15:53	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P61429 and previous config saved to /var/cache/conftool/dbconfig/20240429-155328-marostegui.json	[production]
15:44	<swfrench@deploy1002>	helmfile [eqiad] DONE helmfile.d/admin 'apply'.	[production]
15:43	<swfrench@deploy1002>	helmfile [eqiad] START helmfile.d/admin 'apply'.	[production]
15:39	<swfrench@deploy1002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
15:38	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P61428 and previous config saved to /var/cache/conftool/dbconfig/20240429-153821-marostegui.json	[production]
15:37	<ladsgroup@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance	[production]
15:37	<swfrench@deploy1002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
15:37	<ladsgroup@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance	[production]
15:36	<swfrench@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
15:35	<swfrench@deploy1002>	helmfile [staging-eqiad] START helmfile.d/admin 'apply'.	[production]
15:35	<root@cumin1002>	START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[29-42]*: Move Cassandra to PKI - root@cumin1002	[production]
15:34	<swfrench@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
15:32	<swfrench@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
15:28	<arnaudb@cumin1002>	dbctl commit (dc=all): 'depool db2151', diff saved to https://phabricator.wikimedia.org/P61427 and previous config saved to /var/cache/conftool/dbconfig/20240429-152809-arnaudb.json	[production]
15:25	<elukey@cumin1002>	END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Move to PKI TLS certs - elukey@cumin1002	[production]
15:23	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1185 (T361627)', diff saved to https://phabricator.wikimedia.org/P61426 and previous config saved to /var/cache/conftool/dbconfig/20240429-152314-marostegui.json	[production]
15:23	<robh@cumin1002>	END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp7002']	[production]
15:20	<marostegui@cumin1002>	dbctl commit (dc=all): 'Depooling db1185 (T361627)', diff saved to https://phabricator.wikimedia.org/P61425 and previous config saved to /var/cache/conftool/dbconfig/20240429-152029-marostegui.json	[production]
15:20	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance	[production]