2022-10-04
|
07:11 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade |
[production] |
07:11 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade |
[production] |
07:06 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35328 and previous config saved to /var/cache/conftool/dbconfig/20221004-070653-root.json |
[production] |
06:51 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35327 and previous config saved to /var/cache/conftool/dbconfig/20221004-065148-root.json |
[production] |
06:43 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
06:42 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
06:42 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
06:39 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
06:36 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35326 and previous config saved to /var/cache/conftool/dbconfig/20221004-063643-root.json |
[production] |
06:33 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 25885 |
[production] |
06:32 |
<ayounsi@cumin1001> |
START - Cookbook sre.network.peering with action 'configure' for AS: 25885 |
[production] |
06:21 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35325 and previous config saved to /var/cache/conftool/dbconfig/20221004-062138-root.json |
[production] |
06:06 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35324 and previous config saved to /var/cache/conftool/dbconfig/20221004-060633-root.json |
[production] |
05:51 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35323 and previous config saved to /var/cache/conftool/dbconfig/20221004-055128-root.json |
[production] |
05:36 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35322 and previous config saved to /var/cache/conftool/dbconfig/20221004-053623-root.json |
[production] |
03:12 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
03:09 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
03:09 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
03:07 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
02:31 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
02:30 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
02:30 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
02:28 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
02:13 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
02:09 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
02:09 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
02:05 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
2022-10-03
|
21:45 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:44 |
<robh@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
21:44 |
<robh@cumin2002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye |
[production] |
21:18 |
<robh@cumin2002> |
START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye |
[production] |
19:41 |
<ryankemper> |
[Elastic] Unbanned `elastic1066` |
[production] |
19:37 |
<ryankemper> |
[Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running |
[production] |
19:32 |
<robh> |
msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work |
[production] |
19:25 |
<robh> |
msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet |
[production] |
19:22 |
<ryankemper> |
[Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": "","_name": "elastic1066-production-search-psi-eqiad"}}}'`); will restart elasticsearch-psi after shards drain |
[production] |
19:15 |
<robh@cumin2002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye |
[production] |
18:48 |
<robh@cumin2002> |
START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye |
[production] |
18:41 |
<robh@cumin2002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye |
[production] |
18:34 |
<robh@cumin2002> |
START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye |
[production] |
18:30 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED |
[production] |
18:30 |
<bblack@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster |
[production] |
18:21 |
<robh@cumin2002> |
START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED |
[production] |
18:12 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED |
[production] |
18:06 |
<robh@cumin2002> |
START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED |
[production] |
18:04 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED |
[production] |
18:00 |
<robh@cumin2002> |
START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED |
[production] |
17:52 |
<robh@cumin2002> |
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED |
[production] |
17:42 |
<robh@cumin2002> |
START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED |
[production] |
17:41 |
<robh@cumin2002> |
END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003 |
[production] |