2022-08-20
|
11:57 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32631 and previous config saved to /var/cache/conftool/dbconfig/20220820-115755-ladsgroup.json |
[production] |
11:42 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32630 and previous config saved to /var/cache/conftool/dbconfig/20220820-114249-ladsgroup.json |
[production] |
11:27 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32629 and previous config saved to /var/cache/conftool/dbconfig/20220820-112744-ladsgroup.json |
[production] |
11:12 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32628 and previous config saved to /var/cache/conftool/dbconfig/20220820-111238-ladsgroup.json |
[production] |
08:04 |
<dcaro_away> |
after cloudvirt1023 reboot, the vm irc-buster shows as running, but even after a restart it is not responsive through ssh or console (T315718) |
[dwl] |
07:55 |
<dcaro_away> |
after cloudvirt1023 reboot, the vm irc-buster does not seem to have rebooted correctly (no ssh, no console), rebooting (T315718) |
[dwl] |
07:44 |
<dcaro_away> |
all k8s nodes ready now \o/ (T315718) |
[tools] |
07:43 |
<dcaro_away> |
rebooted tools-k8s-control-2, seemed stuck trying to wait for tools home (nfs?), after reboot came back up (T315718) |
[tools] |
07:41 |
<dcaro_away> |
cloudvirt1023 down took out 3 workers, 1 control, and a grid exec and a weblight, they are taking long to restart, looking (T315718) |
[tools] |
07:39 |
<dcaro_away> |
cloudvirt1023 is back up, VMs are starting to recover (T315718) |
[admin] |
07:23 |
<dcaro_away> |
cloudvirt1023 seems to have hit a hardware issue (racadm lclog view shows "System CPU Resetting."), rebooting and doing memory checks (T315718) |
[admin] |
06:55 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32627 and previous config saved to /var/cache/conftool/dbconfig/20220820-065528-ladsgroup.json |
[production] |
06:55 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance |
[production] |
06:55 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance |
[production] |
06:55 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32626 and previous config saved to /var/cache/conftool/dbconfig/20220820-065507-ladsgroup.json |
[production] |
06:40 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32625 and previous config saved to /var/cache/conftool/dbconfig/20220820-064001-ladsgroup.json |
[production] |
06:24 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32624 and previous config saved to /var/cache/conftool/dbconfig/20220820-062455-ladsgroup.json |
[production] |
06:09 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32623 and previous config saved to /var/cache/conftool/dbconfig/20220820-060949-ladsgroup.json |
[production] |
01:26 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32622 and previous config saved to /var/cache/conftool/dbconfig/20220820-012602-ladsgroup.json |
[production] |
01:25 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance |
[production] |
01:25 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance |
[production] |
2022-08-19
|
23:37 |
<dzahn@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on phab2002.codfw.wmnet with reason: new host in setup |
[production] |
23:37 |
<dzahn@cumin2002> |
START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on phab2002.codfw.wmnet with reason: new host in setup |
[production] |
23:35 |
<mutante> |
phab2002 - service phd: stopped, phabricator_logmail: disabled, phabricator dumps: disabled, systemd::sysuser: not used (all via Hiera switches) - T280597 |
[production] |
23:33 |
<mutante> |
phab2002 - re-enabled puppet, sshd config ListenAddress fixed by puppet gerrit:824797 - now has phabricator prod role but without LVS/git-ssh - no more error in puppet run - T280597 |
[production] |
23:04 |
<TheresNoTime> |
resized deployment-mwlog01's /srv volume, restarted |
[releng] |
23:02 |
<mutante> |
phab2002 - disable puppet, fix sshd_config, restart sshd |
[production] |
22:57 |
<TheresNoTime> |
shutting down deployment-mwlog01 for T315707 |
[releng] |
20:23 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance |
[production] |
20:23 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance |
[production] |
18:50 |
<wm-bot> |
<legoktm> Updated to HEAD |
[tools.ls] |
18:29 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
18:28 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance |
[production] |
18:28 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32621 and previous config saved to /var/cache/conftool/dbconfig/20220819-182835-marostegui.json |
[production] |
18:13 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32620 and previous config saved to /var/cache/conftool/dbconfig/20220819-181329-marostegui.json |
[production] |
17:58 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32619 and previous config saved to /var/cache/conftool/dbconfig/20220819-175823-marostegui.json |
[production] |
17:43 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32618 and previous config saved to /var/cache/conftool/dbconfig/20220819-174317-marostegui.json |
[production] |
17:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32617 and previous config saved to /var/cache/conftool/dbconfig/20220819-171052-marostegui.json |
[production] |
17:10 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance |
[production] |
17:10 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance |
[production] |
17:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32616 and previous config saved to /var/cache/conftool/dbconfig/20220819-171031-marostegui.json |
[production] |
17:06 |
<taavi> |
[codfw1dev] restart mariadb on clouddb2002-dev to pick up certificate config changes T310795 |
[admin] |
16:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32615 and previous config saved to /var/cache/conftool/dbconfig/20220819-165525-marostegui.json |
[production] |
16:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32614 and previous config saved to /var/cache/conftool/dbconfig/20220819-164019-marostegui.json |
[production] |
16:35 |
<wm-bot> |
<lucaswerkmeister> Double IRC messages to other bridges |
[tools.bridgebot] |
16:25 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32613 and previous config saved to /var/cache/conftool/dbconfig/20220819-162513-marostegui.json |
[production] |
16:22 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32612 and previous config saved to /var/cache/conftool/dbconfig/20220819-162253-marostegui.json |
[production] |
16:22 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance |
[production] |
16:22 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance |
[production] |
16:22 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32611 and previous config saved to /var/cache/conftool/dbconfig/20220819-162232-marostegui.json |
[production] |