2024-06-06
§
|
08:57 |
<sfaci@deploy1002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply |
[production] |
08:56 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet |
[production] |
08:56 |
<sfaci@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply |
[production] |
08:52 |
<filippo@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet |
[production] |
08:52 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64172 and previous config saved to /var/cache/conftool/dbconfig/20240606-085216-arnaudb.json |
[production] |
08:52 |
<filippo@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet |
[production] |
08:50 |
<dcaro@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1031.eqiad.wmnet |
[production] |
08:47 |
<mvernon@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1003.eqiad.wmnet |
[production] |
08:44 |
<dcaro@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet |
[production] |
08:44 |
<pfischer@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
08:43 |
<mvernon@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet |
[production] |
08:40 |
<mvernon@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2001.codfw.wmnet |
[production] |
08:39 |
<sfaci@deploy1002> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply |
[production] |
08:39 |
<sfaci@deploy1002> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply |
[production] |
08:38 |
<filippo@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet |
[production] |
08:37 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64171 and previous config saved to /var/cache/conftool/dbconfig/20240606-083710-arnaudb.json |
[production] |
08:36 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet |
[production] |
08:35 |
<pfischer@deploy1002> |
helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
08:35 |
<pfischer@deploy1002> |
helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
08:19 |
<filippo@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet |
[production] |
08:17 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64167 and previous config saved to /var/cache/conftool/dbconfig/20240606-081753-marostegui.json |
[production] |
08:14 |
<stevemunene@deploy1002> |
helmfile [eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
08:14 |
<stevemunene@deploy1002> |
helmfile [eqiad] START helmfile.d/admin 'apply'. |
[production] |
08:14 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64166 and previous config saved to /var/cache/conftool/dbconfig/20240606-081412-ladsgroup.json |
[production] |
08:02 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64165 and previous config saved to /var/cache/conftool/dbconfig/20240606-080245-marostegui.json |
[production] |
08:02 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet |
[production] |
08:01 |
<mvernon@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1001.eqiad.wmnet |
[production] |
08:00 |
<urbanecm@deploy1002> |
Started scap: Backport for [[gerrit:1039287|Add throttle exception for an upcoming workshop (T366748)]] |
[production] |
07:59 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64164 and previous config saved to /var/cache/conftool/dbconfig/20240606-075904-ladsgroup.json |
[production] |
07:50 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet |
[production] |
07:47 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64163 and previous config saved to /var/cache/conftool/dbconfig/20240606-074737-marostegui.json |
[production] |
07:43 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64162 and previous config saved to /var/cache/conftool/dbconfig/20240606-074356-ladsgroup.json |
[production] |
07:32 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64161 and previous config saved to /var/cache/conftool/dbconfig/20240606-073229-marostegui.json |
[production] |
07:30 |
<ryankemper@cumin2002> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555 |
[production] |
07:06 |
<hashar> |
Restarting Gerrit |
[production] |
07:05 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Depooling db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64160 and previous config saved to /var/cache/conftool/dbconfig/20240606-070558-ladsgroup.json |
[production] |
07:05 |
<ladsgroup@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance |
[production] |
07:05 |
<ladsgroup@cumin1002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance |
[production] |
06:56 |
<dcaro@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1034.eqiad.wmnet |
[production] |
06:49 |
<dcaro@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host cloudcephosd1034.eqiad.wmnet |
[production] |
05:40 |
<ryankemper@cumin2002> |
END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) |
[production] |
05:21 |
<ryankemper@cumin2002> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555 |
[production] |
05:19 |
<ryankemper@cumin2002> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555 |
[production] |
05:04 |
<ryankemper@cumin2002> |
START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) |
[production] |
05:02 |
<ryankemper@cumin2002> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555 |
[production] |
04:17 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Depooling db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64159 and previous config saved to /var/cache/conftool/dbconfig/20240606-041714-marostegui.json |
[production] |
04:17 |
<marostegui@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance |
[production] |
04:16 |
<marostegui@cumin1002> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance |
[production] |
04:16 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64158 and previous config saved to /var/cache/conftool/dbconfig/20240606-041650-marostegui.json |
[production] |
04:01 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64157 and previous config saved to /var/cache/conftool/dbconfig/20240606-040142-marostegui.json |
[production] |