2022-06-15
§
|
07:50 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance |
[production] |
07:47 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P29755 and previous config saved to /var/cache/conftool/dbconfig/20220615-074736-root.json |
[production] |
07:35 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1148 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29754 and previous config saved to /var/cache/conftool/dbconfig/20220615-073538-root.json |
[production] |
07:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P29753 and previous config saved to /var/cache/conftool/dbconfig/20220615-073232-root.json |
[production] |
07:24 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
07:23 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
07:23 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T310011)', diff saved to https://phabricator.wikimedia.org/P29752 and previous config saved to /var/cache/conftool/dbconfig/20220615-072352-marostegui.json |
[production] |
07:20 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1148 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P29751 and previous config saved to /var/cache/conftool/dbconfig/20220615-072034-root.json |
[production] |
07:17 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P29750 and previous config saved to /var/cache/conftool/dbconfig/20220615-071728-root.json |
[production] |
07:08 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29749 and previous config saved to /var/cache/conftool/dbconfig/20220615-070847-marostegui.json |
[production] |
06:53 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P29748 and previous config saved to /var/cache/conftool/dbconfig/20220615-065342-marostegui.json |
[production] |
06:52 |
<XioNoX> |
disable BGP to Telia in eqsin for optic replacement - T300485 |
[production] |
06:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T310011)', diff saved to https://phabricator.wikimedia.org/P29747 and previous config saved to /var/cache/conftool/dbconfig/20220615-063837-marostegui.json |
[production] |
06:02 |
<marostegui> |
Reboot db[2071-2078] T310485 |
[production] |
06:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1105:3311 (T310011)', diff saved to https://phabricator.wikimedia.org/P29746 and previous config saved to /var/cache/conftool/dbconfig/20220615-060153-marostegui.json |
[production] |
06:01 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance |
[production] |
06:01 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance |
[production] |
05:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1098:3317 (T302659)', diff saved to https://phabricator.wikimedia.org/P29745 and previous config saved to /var/cache/conftool/dbconfig/20220615-054252-marostegui.json |
[production] |
05:42 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance |
[production] |
05:42 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance |
[production] |
05:34 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance |
[production] |
05:34 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance |
[production] |
05:23 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1173.eqiad.wmnet with OS bullseye |
[production] |
05:17 |
<marostegui> |
dbmaint es5@codfw T310485 |
[production] |
05:17 |
<marostegui> |
dbmaint es4@codfw T310485 |
[production] |
05:17 |
<marostegui> |
dbmaint es3@codfw T310485 |
[production] |
05:17 |
<marostegui> |
dbmaint es2@codfw T310485 |
[production] |
05:17 |
<marostegui> |
dbmaint es1@codfw T310485 |
[production] |
05:07 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage |
[production] |
05:04 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: host reimage |
[production] |
05:03 |
<marostegui> |
Reboot dbproxy1016 and dbproxy1021 T310484 |
[production] |
04:53 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host db1173.eqiad.wmnet with OS bullseye |
[production] |
02:31 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
02:30 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
02:30 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
02:29 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
02:25 |
<tstarling@deploy1002> |
Synchronized php-1.39.0-wmf.16/includes/cache/MessageCache.php: (no justification provided) (duration: 03m 36s) |
[production] |
02:24 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
02:21 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
02:21 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
02:17 |
<tstarling@deploy1002> |
Synchronized php-1.39.0-wmf.15/includes/cache/MessageCache.php: T310532 (duration: 03m 29s) |
[production] |
02:17 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
2022-06-14
§
|
23:52 |
<mutante> |
gitlab-runner1001/1002 - clean revert not possible, icinga alerting about failed buildkitd service, manually deleting systemd unit and trying to clean up T308271 |
[production] |
23:49 |
<mutante> |
gitlab-runner1002 - systemctl restart docker; run-puppet-agent ; systemctl start buildkitd - fails though T308271 |
[production] |
23:39 |
<mutante> |
gitlab-runner1001 - systemctl start buildkitd |
[production] |
23:32 |
<mutante> |
gitlab-runner1001 - restarting docker |
[production] |
23:08 |
<mutante> |
disabling puppet in gitlab-runners (via cumin /disable-puppet) before deploying gerrit:791655 to provide gitlab-runners with buildkit and new docker network - T308271 |
[production] |
22:19 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
22:18 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
22:18 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |