2022-08-21
§
|
21:12 |
<andrewbogott> |
restarted neutron-dhcp-agent on cloudnet1003. it was claiming to be unable to contact Rabbit but seems happy after a restart |
[admin] |
14:36 |
<Krinkle> |
krinkle@mwmaint1002 foreachwikiindblist 'all - small' deleteEqualMessages.php |
[production] |
14:33 |
<Krinkle> |
krinkle@mwmaint1002 foreachwikiindblist 'small - closed' deleteEqualMessages.php |
[production] |
13:07 |
<Reedy> |
looks live various CI jobs (coverage etc) have been stuck for about 8.5 hours |
[releng] |
13:00 |
<Reedy> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/824862 |
[releng] |
12:36 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db[1111,1127,1132].eqiad.wmnet with reason: 10.6 being 10.6 |
[production] |
12:36 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db[1111,1127,1132].eqiad.wmnet with reason: 10.6 being 10.6 |
[production] |
12:30 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depool 10.6 hosts', diff saved to https://phabricator.wikimedia.org/P32649 and previous config saved to /var/cache/conftool/dbconfig/20220821-123038-ladsgroup.json |
[production] |
12:11 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P32648 and previous config saved to /var/cache/conftool/dbconfig/20220821-121140-root.json |
[production] |
09:27 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32647 and previous config saved to /var/cache/conftool/dbconfig/20220821-092727-ladsgroup.json |
[production] |
09:12 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32646 and previous config saved to /var/cache/conftool/dbconfig/20220821-091221-ladsgroup.json |
[production] |
08:57 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32645 and previous config saved to /var/cache/conftool/dbconfig/20220821-085716-ladsgroup.json |
[production] |
08:42 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32644 and previous config saved to /var/cache/conftool/dbconfig/20220821-084209-ladsgroup.json |
[production] |
04:24 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32643 and previous config saved to /var/cache/conftool/dbconfig/20220821-042415-ladsgroup.json |
[production] |
04:24 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance |
[production] |
04:23 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance |
[production] |
04:23 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance |
[production] |
04:23 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance |
[production] |
03:30 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32642 and previous config saved to /var/cache/conftool/dbconfig/20220821-033020-ladsgroup.json |
[production] |
03:15 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32641 and previous config saved to /var/cache/conftool/dbconfig/20220821-031514-ladsgroup.json |
[production] |
03:00 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32640 and previous config saved to /var/cache/conftool/dbconfig/20220821-030008-ladsgroup.json |
[production] |
02:45 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32639 and previous config saved to /var/cache/conftool/dbconfig/20220821-024502-ladsgroup.json |
[production] |
01:35 |
<rzl@cumin2002> |
dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P32638 and previous config saved to /var/cache/conftool/dbconfig/20220821-013504-rzl.json |
[production] |
2022-08-20
§
|
23:20 |
<wm-bot> |
<legoktm> Updated mjolnir from 1.2.1 to 1.5.0, now running on node16 |
[tools.mjolnir] |
22:18 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32637 and previous config saved to /var/cache/conftool/dbconfig/20220820-221826-ladsgroup.json |
[production] |
22:18 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance |
[production] |
22:18 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance |
[production] |
17:41 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 9 hosts with reason: Maintenance |
[production] |
17:40 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 9 hosts with reason: Maintenance |
[production] |
17:40 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance |
[production] |
17:40 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance |
[production] |
17:37 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32636 and previous config saved to /var/cache/conftool/dbconfig/20220820-173723-ladsgroup.json |
[production] |
17:22 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32635 and previous config saved to /var/cache/conftool/dbconfig/20220820-172217-ladsgroup.json |
[production] |
17:07 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32634 and previous config saved to /var/cache/conftool/dbconfig/20220820-170711-ladsgroup.json |
[production] |
16:52 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32633 and previous config saved to /var/cache/conftool/dbconfig/20220820-165203-ladsgroup.json |
[production] |
11:58 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32632 and previous config saved to /var/cache/conftool/dbconfig/20220820-115816-ladsgroup.json |
[production] |
11:58 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance |
[production] |
11:58 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance |
[production] |
11:57 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32631 and previous config saved to /var/cache/conftool/dbconfig/20220820-115755-ladsgroup.json |
[production] |
11:42 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32630 and previous config saved to /var/cache/conftool/dbconfig/20220820-114249-ladsgroup.json |
[production] |
11:27 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32629 and previous config saved to /var/cache/conftool/dbconfig/20220820-112744-ladsgroup.json |
[production] |
11:12 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32628 and previous config saved to /var/cache/conftool/dbconfig/20220820-111238-ladsgroup.json |
[production] |
08:04 |
<dcaro_away> |
after cloudvirt1023 reboot, the vm irc-buster shows as running, but even after restart is not responsive through ssh nor console (T315718) |
[dwl] |
07:55 |
<dcaro_away> |
after cloudvirt1023 reboot, the vm irc-buster does not seem to have rebooted correctly (no ssh, no console), rebooting (T315718) |
[dwl] |
07:44 |
<dcaro_away> |
all k8s nodes ready now \o/ (T315718) |
[tools] |
07:43 |
<dcaro_away> |
rebooted tools-k8s-control-2, seemed stuck trying to wait for tools home (nfs?), after reboot came back up (T315718) |
[tools] |
07:41 |
<dcaro_away> |
cloudvirt1023 down took out 3 workers, 1 control, and a grid exec and a weblight, they are taking long to restart, looking (T315718) |
[tools] |
07:39 |
<dcaro_away> |
cloudvirt1023 is back up, VMs are starting to recover (T315718) |
[admin] |
07:23 |
<dcaro_away> |
cloudvirt1023 seems to have gotten some hardware issue from racadm lclog view "System CPU Resetting.", rebooting and doing memory checks (T315718) |
[admin] |
06:55 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32627 and previous config saved to /var/cache/conftool/dbconfig/20220820-065528-ladsgroup.json |
[production] |