2024-06-05 |
17:04 |
<kamila@cumin1002> |
START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye |
[production] |
17:02 |
<marostegui@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64121 and previous config saved to /var/cache/conftool/dbconfig/20240605-170200-marostegui.json |
[production] |
16:56 |
<kamila@cumin1002> |
END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001'] |
[production] |
16:56 |
<dzahn@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785 |
[production] |
16:56 |
<dzahn@cumin1002> |
START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785 |
[production] |
16:54 |
<mutante> |
downtimed stat1004 for 10 days to avoid alerting spam during decom process - T353785 |
[production] |
16:53 |
<dzahn@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785 |
[production] |
16:53 |
<dzahn@cumin1002> |
START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785 |
[production] |
16:52 |
<ladsgroup@deploy1002> |
Finished scap: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] (duration: 18m 23s) |
[production] |
16:48 |
<ryankemper@cumin2002> |
START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) |
[production] |
16:46 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64120 and previous config saved to /var/cache/conftool/dbconfig/20240605-164635-ladsgroup.json |
[production] |
16:46 |
<kamila@cumin1002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001'] |
[production] |
16:45 |
<kamila@cumin1002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye |
[production] |
16:43 |
<ladsgroup@deploy1002> |
ladsgroup and dr0ptp4kt: Continuing with sync |
[production] |
16:40 |
<jayme@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet |
[production] |
16:38 |
<kamila@cumin1002> |
START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye |
[production] |
16:36 |
<ladsgroup@deploy1002> |
ladsgroup and dr0ptp4kt: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
16:34 |
<kamila@cumin1002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye |
[production] |
16:34 |
<ladsgroup@deploy1002> |
Started scap: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] |
[production] |
16:32 |
<jayme@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet |
[production] |
16:31 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64119 and previous config saved to /var/cache/conftool/dbconfig/20240605-163129-ladsgroup.json |
[production] |
16:20 |
<jforrester@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply |
[production] |
16:18 |
<jforrester@deploy1002> |
helmfile [eqiad] START helmfile.d/services/wikifunctions: apply |
[production] |
16:18 |
<dcaro@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1032.eqiad.wmnet |
[production] |
16:18 |
<jforrester@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply |
[production] |
16:16 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64118 and previous config saved to /var/cache/conftool/dbconfig/20240605-161622-ladsgroup.json |
[production] |
16:16 |
<jforrester@deploy1002> |
helmfile [codfw] START helmfile.d/services/wikifunctions: apply |
[production] |
16:15 |
<jforrester@deploy1002> |
helmfile [staging] DONE helmfile.d/services/wikifunctions: apply |
[production] |
16:14 |
<jforrester@deploy1002> |
helmfile [staging] START helmfile.d/services/wikifunctions: apply |
[production] |
16:12 |
<dcaro@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host cloudcephosd1032.eqiad.wmnet |
[production] |
16:11 |
<jforrester@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply |
[production] |
16:10 |
<kamila@cumin1002> |
START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye |
[production] |
16:10 |
<jforrester@deploy1002> |
helmfile [eqiad] START helmfile.d/services/wikifunctions: apply |
[production] |
16:10 |
<jforrester@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply |
[production] |
16:08 |
<jforrester@deploy1002> |
helmfile [codfw] START helmfile.d/services/wikifunctions: apply |
[production] |
16:05 |
<jayme@deploy1002> |
helmfile [staging] DONE helmfile.d/services/wikifunctions: apply |
[production] |
16:05 |
<jayme@deploy1002> |
helmfile [staging] START helmfile.d/services/wikifunctions: apply |
[production] |
16:01 |
<aokoth@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/miscweb: apply |
[production] |
16:01 |
<aokoth@deploy1002> |
helmfile [eqiad] START helmfile.d/services/miscweb: apply |
[production] |
16:01 |
<kamila@cumin1002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye |
[production] |
16:01 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64117 and previous config saved to /var/cache/conftool/dbconfig/20240605-160116-ladsgroup.json |
[production] |
15:59 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Depooling db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64116 and previous config saved to /var/cache/conftool/dbconfig/20240605-155955-ladsgroup.json |
[production] |
15:59 |
<ladsgroup@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance |
[production] |
15:59 |
<ladsgroup@cumin1002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance |
[production] |
15:59 |
<mvernon@cumin1002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet |
[production] |
15:58 |
<aokoth@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/miscweb: apply |
[production] |
15:58 |
<aokoth@deploy1002> |
helmfile [codfw] START helmfile.d/services/miscweb: apply |
[production] |
15:57 |
<aokoth@deploy1002> |
helmfile [staging] DONE helmfile.d/services/miscweb: apply |
[production] |
15:56 |
<aokoth@deploy1002> |
helmfile [staging] START helmfile.d/services/miscweb: apply |
[production] |
15:51 |
<mvernon@cumin1002> |
START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet |
[production] |