2024-08-14
12:52 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67293 and previous config saved to /var/cache/conftool/dbconfig/20240814-125245-arnaudb.json [production]
12:49 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on 9 hosts with reason: replication table exclusion deployment [production]
12:49 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:20:00 on 9 hosts with reason: replication table exclusion deployment [production]
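The START/END pair above is the sre.hosts.downtime Spicerack cookbook, which schedules monitoring downtime on the matched hosts before maintenance. A minimal sketch of such an invocation from a cumin host, assuming the usual duration and reason flags (flag names may differ between cookbook versions, and the host query shown is hypothetical):

  # Downtime a hypothetical set of nine replicas for 20 minutes during the deployment.
  sudo cookbook sre.hosts.downtime --minutes 20 \
      -r "replication table exclusion deployment" \
      'db21[89-97].codfw.wmnet'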
12:37 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 25%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67292 and previous config saved to /var/cache/conftool/dbconfig/20240814-123739-arnaudb.json [production]
12:22 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 16%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67291 and previous config saved to /var/cache/conftool/dbconfig/20240814-122234-arnaudb.json [production]
12:07 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 8%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67290 and previous config saved to /var/cache/conftool/dbconfig/20240814-120729-arnaudb.json [production]
11:52 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 4%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67289 and previous config saved to /var/cache/conftool/dbconfig/20240814-115223-arnaudb.json [production]
11:37 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 2%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67288 and previous config saved to /var/cache/conftool/dbconfig/20240814-113718-arnaudb.json [production]
11:23 <mvolz@deploy1003> helmfile [eqiad] DONE helmfile.d/services/citoid: apply [production]
11:23 <mvolz@deploy1003> helmfile [eqiad] START helmfile.d/services/citoid: apply [production]
11:22 <arnaudb@cumin1002> dbctl commit (dc=all): 'db2189 (re)pooling @ 1%: corrupted index fixed', diff saved to https://phabricator.wikimedia.org/P67287 and previous config saved to /var/cache/conftool/dbconfig/20240814-112212-arnaudb.json [production]
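The dbctl commits above ramp db2189's pooled weight back up in stages (1%, 2%, 4%, 8%, 16%, 25%, and finally 50% at 12:52) at roughly 15-minute intervals after the corrupted index was repaired. In practice a repool helper generates these commits; a hand-rolled sketch of the same ramp, assuming dbctl's per-instance percentage option, would look roughly like:

  # Step db2189 back into the pool, committing the config change at each stage.
  for pct in 1 2 4 8 16 25 50; do
      sudo dbctl instance db2189 pool -p "$pct"
      sudo dbctl config commit -m "db2189 (re)pooling @ ${pct}%: corrupted index fixed"
      sleep 900   # ~15 minutes between steps, matching the log cadence
  done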
11:20 <mvolz@deploy1003> helmfile [codfw] DONE helmfile.d/services/citoid: apply [production]
11:19 <mvolz@deploy1003> helmfile [codfw] START helmfile.d/services/citoid: apply [production]
11:19 <mvolz@deploy1003> helmfile [staging] DONE helmfile.d/services/citoid: apply [production]
11:18 <mvolz@deploy1003> helmfile [staging] START helmfile.d/services/citoid: apply [production]
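The citoid START/DONE pairs above follow the standard Kubernetes service deployment order: staging first, then codfw, then eqiad. Each step is a helmfile apply run from the deployment server against the service's directory in the deployment-charts checkout; a sketch, assuming the usual /srv/deployment-charts path on deploy1003:

  cd /srv/deployment-charts/helmfile.d/services/citoid
  helmfile -e staging apply   # 11:18-11:19
  helmfile -e codfw apply     # 11:19-11:20
  helmfile -e eqiad apply     # 11:23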
09:56 <fnegri@cumin1002> conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1 [production]
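The conftool action above pools the s1 service on clouddb1017, a Wiki Replicas host. The selector/action pair in the log maps directly onto a confctl invocation; the equivalent command would be approximately:

  sudo confctl select 'name=clouddb1017.eqiad.wmnet,service=s1' set/pooled=yes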
09:26 <klausman@deploy1003> helmfile [codfw] DONE helmfile.d/services/api-gateway: apply [production]
09:26 <klausman@deploy1003> helmfile [codfw] START helmfile.d/services/api-gateway: apply [production]
09:23 <klausman@deploy1003> helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply [production]
09:23 <klausman@deploy1003> helmfile [eqiad] START helmfile.d/services/api-gateway: apply [production]
09:17 <klausman@deploy1003> helmfile [staging] DONE helmfile.d/services/api-gateway: apply [production]
09:16 <klausman@deploy1003> helmfile [staging] START helmfile.d/services/api-gateway: apply [production]
09:11 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: replication still catching up [production]
09:11 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: replication still catching up [production]
08:53 <jayme@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-main2010.codfw.wmnet with OS bullseye [production]
08:46 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2010.codfw.wmnet with OS bullseye [production]
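The kafka-main2010 reimage above ended in error (exit_code=93) after about seven minutes. For reference, the cookbook is normally launched from a cumin host along these lines (a sketch from memory; the cookbook may expect the short hostname rather than the FQDN, and a task flag is often added):

  sudo cookbook sre.hosts.reimage --os bullseye kafka-main2010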
07:45 <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: index corruption [production]
07:45 <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: index corruption [production]
00:54 <eileen> config revision changed from d6f17100 to f569b590 [production]
00:41 <eileen> civicrm upgraded from dd54b9ae to eecbba5d [production]
00:11 <eileen> civicrm upgraded from 686c7c5f to dd54b9ae [production]
00:04 <eileen> config revision changed from e8cc0ed6 to d6f17100 [production]
2024-08-13
23:08 <ejegg> payments-wiki upgraded from 2d48f432 to 3eb3be67 [production]
21:56 <inflatador> bking@cumin2002 reboot wdqs101[3-5],1018,1020 from DRAC due to unresponsiveness T372442 [production]
21:16 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:16 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
21:15 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:15 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
21:09 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:09 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
21:07 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:07 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
20:51 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards [production]
20:22 <brett> Update ncmonitor to 1.2.0 via apt1002 [production]
19:57 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards [production]
19:44 <ebernhardson@deploy1003> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
19:43 <ebernhardson@deploy1003> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
19:32 <bking@cumin2002> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (2 nodes at a time) for ElasticSearch cluster search_eqiad: security update - bking@cumin2002 - T371874 [production]
19:29 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards [production]
19:27 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards [production]