production SAL

2024-06-05
19:27	<ryankemper@cumin2002>	END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)	[production]
19:09	<swfrench@deploy1002>	helmfile [codfw] DONE helmfile.d/services/data-gateway: apply	[production]
18:58	<swfrench@deploy1002>	helmfile [codfw] START helmfile.d/services/data-gateway: apply	[production]
18:53	<dduvall@deploy1002>	rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.8 refs T361402	[production]
18:53	<ryankemper@cumin2002>	START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)	[production]
18:42	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64132 and previous config saved to /var/cache/conftool/dbconfig/20240605-184250-ladsgroup.json	[production]
18:27	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64131 and previous config saved to /var/cache/conftool/dbconfig/20240605-182742-ladsgroup.json	[production]
18:13	<swfrench@deploy1002>	helmfile [staging] DONE helmfile.d/services/data-gateway: apply	[production]
18:12	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64130 and previous config saved to /var/cache/conftool/dbconfig/20240605-181234-ladsgroup.json	[production]
18:12	<swfrench@deploy1002>	helmfile [staging] START helmfile.d/services/data-gateway: apply	[production]
18:11	<aokoth@cumin1002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet	[production]
18:07	<aokoth@cumin1002>	START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet	[production]
18:06	<ryankemper@cumin2002>	END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)	[production]
17:57	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64129 and previous config saved to /var/cache/conftool/dbconfig/20240605-175725-ladsgroup.json	[production]
17:55	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64128 and previous config saved to /var/cache/conftool/dbconfig/20240605-175503-ladsgroup.json	[production]
17:50	<kamila@cumin1002>	START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet	[production]
17:47	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance	[production]
17:47	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance	[production]
17:47	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64127 and previous config saved to /var/cache/conftool/dbconfig/20240605-174724-marostegui.json	[production]
17:42	<ladsgroup@deploy1002>	Finished scap: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]] (duration: 12m 19s)	[production]
17:39	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64126 and previous config saved to /var/cache/conftool/dbconfig/20240605-173954-ladsgroup.json	[production]
17:33	<ladsgroup@deploy1002>	ladsgroup: Continuing with sync	[production]
17:32	<ladsgroup@deploy1002>	ladsgroup: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)	[production]
17:32	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64125 and previous config saved to /var/cache/conftool/dbconfig/20240605-173216-marostegui.json	[production]
17:31	<ryankemper@cumin2002>	START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)	[production]
17:29	<ladsgroup@deploy1002>	Started scap: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]]	[production]
17:27	<kamila@cumin1002>	END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']	[production]
17:24	<ryankemper@cumin2002>	END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)	[production]
17:24	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64124 and previous config saved to /var/cache/conftool/dbconfig/20240605-172446-ladsgroup.json	[production]
17:17	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64123 and previous config saved to /var/cache/conftool/dbconfig/20240605-171708-marostegui.json	[production]
17:13	<kamila@cumin1002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']	[production]
17:12	<kamila@cumin1002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye	[production]
17:10	<jhathaway>	phabricator email now egressing via mx-out{1001,2001}.wikimedia.org, which should solve the SPF warnings in your inbox	[production]
17:10	<dcaro@cumin1002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1033.eqiad.wmnet	[production]
17:09	<ladsgroup@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64122 and previous config saved to /var/cache/conftool/dbconfig/20240605-170938-ladsgroup.json	[production]
17:06	<dzahn@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785	[production]
17:06	<dcaro@cumin1002>	START - Cookbook sre.hosts.reboot-single for host cloudcephosd1033.eqiad.wmnet	[production]
17:06	<dzahn@cumin1002>	START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785	[production]
17:05	<dzahn@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785	[production]
17:05	<dzahn@cumin1002>	START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785	[production]
17:04	<kamila@cumin1002>	START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye	[production]
17:02	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64121 and previous config saved to /var/cache/conftool/dbconfig/20240605-170200-marostegui.json	[production]
16:56	<kamila@cumin1002>	END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']	[production]
16:56	<dzahn@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785	[production]
16:56	<dzahn@cumin1002>	START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785	[production]
16:54	<mutante>	downtimed stat1004 for 10 days to avoid alerting spam during decom process - T353785	[production]
16:53	<dzahn@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785	[production]
16:53	<dzahn@cumin1002>	START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785	[production]
16:52	<ladsgroup@deploy1002>	Finished scap: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] (duration: 18m 23s)	[production]
16:48	<ryankemper@cumin2002>	START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)	[production]