2021-11-26
| 16:11 | <arnoldokoth> | drain kubestage1002 node in prep for decommissioning | [production] |
| 16:05 | <arnoldokoth> | drain kubestage1001 node in prep for decommissioning | [production] |
| 15:46 | <elukey> | move /var/tmp/core/* to /srv/coredumps on ores1008 to free root space | [production] |
| 14:30 | <jelto@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main'. | [production] |
| 14:25 | <jelto@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main'. | [production] |
| 14:21 | <jelto@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main'. | [production] |
| 14:15 | <hashar> | deployment-prep: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/ database-updating job is broken since 06:20 UTC due to a segmentation fault | T296539 | [releng] |
| 14:13 | <wm-bot> | <chicocvenancio> Kick bridgebot to see if it stops duplication in Telegram | [tools.bridgebot] |
| 13:51 | <Amir1> | running T286552 schema changes in the cloud | [mailman] |
| 13:48 | <jelto@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main'. | [production] |
| 13:46 | <jelto@deploy1002> | helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main'. | [production] |
| 13:25 | <akosiaris@deploy1002> | helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. | [production] |
| 13:25 | <akosiaris@deploy1002> | helmfile [staging-codfw] START helmfile.d/admin 'apply'. | [production] |
| 12:21 | <vgutierrez> | restarting HAProxy on O:cache::upload_haproxy - T290005 | [production] |
| 11:41 | <akosiaris> | T296303 cleanup weird state of calico-codfw cluster | [production] |
| 11:41 | <akosiaris@deploy1002> | helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. | [production] |
| 11:41 | <akosiaris@deploy1002> | helmfile [staging-codfw] START helmfile.d/admin 'sync'. | [production] |
| 11:39 | <akosiaris@deploy1002> | helmfile [staging-codfw] START helmfile.d/admin 'sync'. | [production] |
| 11:25 | <vgutierrez> | restarting HAProxy on O:cache::(text|upload)_haproxy - T290005 | [production] |
| 10:23 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repool after fixing users T296274', diff saved to https://phabricator.wikimedia.org/P17880 and previous config saved to /var/cache/conftool/dbconfig/20211126-102340-ladsgroup.json | [production] |
| 10:17 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Depooling db1111 (T296274)', diff saved to https://phabricator.wikimedia.org/P17879 and previous config saved to /var/cache/conftool/dbconfig/20211126-101714-ladsgroup.json | [production] |
| 10:17 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance T296274 | [production] |
| 10:17 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance T296274 | [production] |
| 10:14 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Repool after fixing users T296274', diff saved to https://phabricator.wikimedia.org/P17878 and previous config saved to /var/cache/conftool/dbconfig/20211126-101423-ladsgroup.json | [production] |
| 10:05 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Depooling db1177 (T296274)', diff saved to https://phabricator.wikimedia.org/P17877 and previous config saved to /var/cache/conftool/dbconfig/20211126-100547-ladsgroup.json | [production] |
| 10:05 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance T296274 | [production] |
| 10:05 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance T296274 | [production] |
| 10:04 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance T296143 | [production] |
| 10:04 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance T296143 | [production] |
| 08:28 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17876 and previous config saved to /var/cache/conftool/dbconfig/20211126-082834-ladsgroup.json | [production] |
| 08:13 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17875 and previous config saved to /var/cache/conftool/dbconfig/20211126-081329-ladsgroup.json | [production] |
| 07:58 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17874 and previous config saved to /var/cache/conftool/dbconfig/20211126-075824-ladsgroup.json | [production] |
| 07:43 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'After maintenance db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17873 and previous config saved to /var/cache/conftool/dbconfig/20211126-074320-ladsgroup.json | [production] |
| 07:07 | <majavah> | hard reboot deployment-mwmaint02 | [releng] |
| 06:28 | <Amir1> | killing extensions/MachineVision/maintenance/fetchSuggestions.php in mwmaint | [production] |
| 06:19 | <Amir1> | killing lingering processes from mwmaint to db1160, which was depooled nine hours ago | [production] |
2021-11-25
| 21:37 | <chicocvenancio> | rollback singleuser to PR #96 T295257 | [paws] |
| 21:34 | <wm-bot> | <lucaswerkmeister> deployed baef3a16f6 (l10n updates) | [tools.lexeme-forms] |
| 21:15 | <chicocvenancio> | deploy PR #110 changing singleuser to bump openrefine version T295257 | [paws] |
| 20:43 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Depooling db1160 (T296143)', diff saved to https://phabricator.wikimedia.org/P17872 and previous config saved to /var/cache/conftool/dbconfig/20211125-204357-ladsgroup.json | [production] |
| 20:43 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance T296143 | [production] |
| 20:43 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance T296143 | [production] |
| 19:28 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance T296143 | [production] |
| 19:28 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance T296143 | [production] |
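The repeated ladsgroup entries above (db1111, db1177, db1160, db1150) all follow one database-maintenance cycle: depool the replica with dbctl, set monitoring downtime with the sre.hosts.downtime cookbook, run the maintenance, then repool. A minimal sketch of that sequence as run from a cumin host follows; the command names, hosts, and task IDs come from the log itself, but the exact subcommands and flags are assumptions, not verified against the conftool or Spicerack documentation:

```shell
# Stage a depool of the replica, then commit it to all datacenters.
# The log shows that a commit saves a diff to Phabricator and the previous
# config to /var/cache/conftool/dbconfig/.
dbctl instance db1160 depool
dbctl config commit -m 'Depooling db1160 (T296143)'

# Set 4 hours of downtime so the depooled host does not page
# (flags are an assumption; the log only records the duration and reason).
sudo cookbook sre.hosts.downtime --hours 4 -r 'Maintenance T296143' db1160.eqiad.wmnet

# ... run the schema change on db1160 ...

# Repool and commit again once maintenance is done.
dbctl instance db1160 pool
dbctl config commit -m 'After maintenance db1160 (T296143)'
```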