2021-12-10
§
|
15:01 |
<jelto@deploy1002> |
helmfile [codfw] START helmfile.d/admin 'apply'. |
[production] |
14:55 |
<moritzm> |
drain primary/secondary instances off ganeti2017 T296622 |
[production] |
14:54 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P18110 and previous config saved to /var/cache/conftool/dbconfig/20211210-145401-marostegui.json |
[production] |
14:50 |
<jelto@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
14:49 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on restbase2026.codfw.wmnet with reason: New cassandra hosts awaiting syncing |
[production] |
14:49 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on restbase2026.codfw.wmnet with reason: New cassandra hosts awaiting syncing |
[production] |
14:48 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti2008.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage |
[production] |
14:48 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti2008.codfw.wmnet with reason: Temporarily remove node from Ganeti for reimage |
[production] |
14:48 |
<jelto@deploy1002> |
helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
14:48 |
<jelto@deploy1002> |
helmfile [staging-eqiad] START helmfile.d/admin 'apply'. |
[production] |
14:48 |
<jelto> |
remove tiller from staging-eqiad Kubernetes cluster |
[production] |
14:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1174 (T277354)', diff saved to https://phabricator.wikimedia.org/P18109 and previous config saved to /var/cache/conftool/dbconfig/20211210-143856-marostegui.json |
[production] |
14:36 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1174 (T277354)', diff saved to https://phabricator.wikimedia.org/P18108 and previous config saved to /var/cache/conftool/dbconfig/20211210-143636-marostegui.json |
[production] |
14:36 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance |
[production] |
14:36 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance |
[production] |
14:36 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1158 (T277354)', diff saved to https://phabricator.wikimedia.org/P18107 and previous config saved to /var/cache/conftool/dbconfig/20211210-143628-marostegui.json |
[production] |
14:36 |
<hnowlan@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS buster |
[production] |
14:33 |
<jelto@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
14:33 |
<jelto@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
14:33 |
<jelto@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
14:33 |
<jelto@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
14:29 |
<jelto> |
remove tiller from staging-codfw Kubernetes cluster |
[production] |
14:28 |
<jelto@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
14:27 |
<jelto@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
14:21 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18106 and previous config saved to /var/cache/conftool/dbconfig/20211210-142123-marostegui.json |
[production] |
14:17 |
<jelto@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
14:17 |
<jelto@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
14:06 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P18105 and previous config saved to /var/cache/conftool/dbconfig/20211210-140618-marostegui.json |
[production] |
14:01 |
<jynus> |
increase backup2004's allocated disk space |
[production] |
13:51 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1158 (T277354)', diff saved to https://phabricator.wikimedia.org/P18104 and previous config saved to /var/cache/conftool/dbconfig/20211210-135114-marostegui.json |
[production] |
13:49 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1158 (T277354)', diff saved to https://phabricator.wikimedia.org/P18103 and previous config saved to /var/cache/conftool/dbconfig/20211210-134953-marostegui.json |
[production] |
13:49 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Maintenance |
[production] |
13:49 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Maintenance |
[production] |
13:49 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127 (T277354)', diff saved to https://phabricator.wikimedia.org/P18102 and previous config saved to /var/cache/conftool/dbconfig/20211210-134941-marostegui.json |
[production] |
13:34 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18101 and previous config saved to /var/cache/conftool/dbconfig/20211210-133437-marostegui.json |
[production] |
13:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P18100 and previous config saved to /var/cache/conftool/dbconfig/20211210-131932-marostegui.json |
[production] |
13:17 |
<hnowlan@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS buster |
[production] |
13:04 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127 (T277354)', diff saved to https://phabricator.wikimedia.org/P18099 and previous config saved to /var/cache/conftool/dbconfig/20211210-130427-marostegui.json |
[production] |
13:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1127 (T277354)', diff saved to https://phabricator.wikimedia.org/P18098 and previous config saved to /var/cache/conftool/dbconfig/20211210-130051-marostegui.json |
[production] |
13:00 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1127.eqiad.wmnet with reason: Maintenance |
[production] |
13:00 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 4:00:00 on db1127.eqiad.wmnet with reason: Maintenance |
[production] |
12:56 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on restbase[2024-2025].codfw.wmnet with reason: New cassandra hosts awaiting syncing |
[production] |
12:56 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on restbase[2024-2025].codfw.wmnet with reason: New cassandra hosts awaiting syncing |
[production] |
12:53 |
<hnowlan@cumin2002> |
START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS buster |
[production] |
12:51 |
<hnowlan@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS buster |
[production] |
12:37 |
<hnowlan> |
including cassandra-tools in cassandra311 component of buster-wikimedia |
[production] |
12:31 |
<_joe_> |
manually modifying configmaps for rsyslog in mwdebug for live troubleshooting. |
[production] |
12:28 |
<hnowlan@cumin2002> |
START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS buster |
[production] |
12:08 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
12:03 |
<oblivian@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |