production SAL

2024-02-26
22:09	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P57961 and previous config saved to /var/cache/conftool/dbconfig/20240226-220928-arnaudb.json	[production]
22:06	<ryankemper@cumin2002>	START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - ryankemper@cumin2002 - T356651	[production]
22:02	<jdrewniak@deploy2002>	Synchronized portals: Wikimedia Portals Update: [[gerrit:1006579| Bumping portals to master (T128546)]] (duration: 08m 37s)	[production]
21:56	<jclark@cumin1002>	START - Cookbook sre.hosts.reimage for host es1035.eqiad.wmnet with OS bookworm	[production]
21:54	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P57960 and previous config saved to /var/cache/conftool/dbconfig/20240226-215422-arnaudb.json	[production]
21:54	<jdrewniak@deploy2002>	Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:1006579| Bumping portals to master (T128546)]] (duration: 08m 26s)	[production]
21:39	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2109 (T357189)', diff saved to https://phabricator.wikimedia.org/P57959 and previous config saved to /var/cache/conftool/dbconfig/20240226-213916-arnaudb.json	[production]
21:38	<cjming@deploy2002>	Finished scap: Backport for [[gerrit:1006312|Fix regression in WebM transcodes breaking audio (T358342)]] (duration: 11m 14s)	[production]
21:30	<cjming@deploy2002>	cjming and bvibber: Continuing with sync	[production]
21:29	<cjming@deploy2002>	cjming and bvibber: Backport for [[gerrit:1006312|Fix regression in WebM transcodes breaking audio (T358342)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
21:27	<cjming@deploy2002>	Started scap: Backport for [[gerrit:1006312|Fix regression in WebM transcodes breaking audio (T358342)]]	[production]
21:22	<dzahn@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint1004.eqiad.wmnet with OS bullseye	[production]
21:16	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Depooling db2109 (T357189)', diff saved to https://phabricator.wikimedia.org/P57958 and previous config saved to /var/cache/conftool/dbconfig/20240226-211619-arnaudb.json	[production]
21:16	<arnaudb@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance	[production]
21:16	<arnaudb@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance	[production]
21:15	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2105 (T357189)', diff saved to https://phabricator.wikimedia.org/P57957 and previous config saved to /var/cache/conftool/dbconfig/20240226-211557-arnaudb.json	[production]
21:10	<dzahn@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on contint1004.eqiad.wmnet with reason: host reimage	[production]
21:07	<dzahn@cumin1002>	START - Cookbook sre.hosts.downtime for 2:00:00 on contint1004.eqiad.wmnet with reason: host reimage	[production]
21:02	<ebernhardson@deploy2002>	helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply	[production]
21:02	<ebernhardson@deploy2002>	helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply	[production]
21:00	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P57956 and previous config saved to /var/cache/conftool/dbconfig/20240226-210050-arnaudb.json	[production]
20:58	<dzahn@cumin1002>	START - Cookbook sre.hosts.reimage for host contint1004.eqiad.wmnet with OS bullseye	[production]
20:58	<dzahn@cumin1002>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host contint1004.eqiad.wmnet	[production]
20:57	<dzahn@cumin1002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host contint1004.eqiad.wmnet with OS bullseye	[production]
20:52	<jclark@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1040.eqiad.wmnet with OS bookworm	[production]
20:52	<jclark@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1039.eqiad.wmnet with OS bookworm	[production]
20:52	<jclark@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1038.eqiad.wmnet with OS bookworm	[production]
20:51	<jclark@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1036.eqiad.wmnet with OS bookworm	[production]
20:46	<jclark@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1037.eqiad.wmnet with OS bookworm	[production]
20:45	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P57955 and previous config saved to /var/cache/conftool/dbconfig/20240226-204544-arnaudb.json	[production]
20:44	<mutante>	T358237 used the next hostname number, 1004, to avoid the duplicate IP issue. makevm cookbook is at attempt 103/240 to detect a reboot of the VM and uptime just keeps going up. used the "gnt-instance console --show-cmd " trick to get a console despite https://phabricator.wikimedia.org/T309724 - was missing partman config	[production]
20:41	<jclark@cumin1002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1035.eqiad.wmnet with OS bookworm	[production]
20:30	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2105 (T357189)', diff saved to https://phabricator.wikimedia.org/P57954 and previous config saved to /var/cache/conftool/dbconfig/20240226-203038-arnaudb.json	[production]
20:19	<dzahn@cumin1002>	START - Cookbook sre.hosts.reimage for host contint1004.eqiad.wmnet with OS bullseye	[production]
20:18	<cmooney@cumin1002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bookworm	[production]
20:18	<dzahn@cumin1002>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM contint1004.eqiad.wmnet - dzahn@cumin1002"	[production]
20:17	<dzahn@cumin1002>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM contint1004.eqiad.wmnet - dzahn@cumin1002"	[production]
20:17	<dzahn@cumin1002>	END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) contint1004.eqiad.wmnet on all recursors	[production]
20:17	<dzahn@cumin1002>	START - Cookbook sre.dns.wipe-cache contint1004.eqiad.wmnet on all recursors	[production]
20:17	<dzahn@cumin1002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
20:17	<dzahn@cumin1002>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM contint1004.eqiad.wmnet - dzahn@cumin1002"	[production]
20:16	<dzahn@cumin1002>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM contint1004.eqiad.wmnet - dzahn@cumin1002"	[production]
20:14	<sukhe>	running dummy authdns-update	[production]
20:12	<dzahn@cumin1002>	START - Cookbook sre.dns.netbox	[production]
20:12	<dzahn@cumin1002>	START - Cookbook sre.ganeti.makevm for new host contint1004.eqiad.wmnet	[production]
20:07	<arnaudb@cumin1002>	dbctl commit (dc=all): 'Depooling db2105 (T357189)', diff saved to https://phabricator.wikimedia.org/P57953 and previous config saved to /var/cache/conftool/dbconfig/20240226-200734-arnaudb.json	[production]
20:07	<arnaudb@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance	[production]
20:07	<arnaudb@cumin1002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance	[production]
20:07	<bblack@cumin1002>	conftool action : set/pooled=no; selector: cluster=dnsbox,service=authdns-update,name=dns3001.wikimedia.org	[production]
20:03	<cmooney@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage	[production]