2020-12-04

22:23 <andrewbogott> moving cloudvirt1023 out of the ceph aggregate and into maintenance for T269467 [admin]
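
For reference, the aggregate move above boils down to a couple of OpenStack CLI calls. A minimal sketch, assuming admin credentials on a cloudcontrol host (any WMCS-specific wrapper around the client is not shown here):

```
# Confirm where the hypervisor currently sits.
openstack aggregate list
openstack aggregate show ceph

# Pull it out of the 'ceph' aggregate and park it in 'maintenance' so the
# scheduler stops placing new VMs on it.
openstack aggregate remove host ceph cloudvirt1023
openstack aggregate add host maintenance cloudvirt1023
```

Returning a host to service is the same pair of commands in the opposite direction.
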
21:06 <andrewbogott> putting cloudvirt1025 and 1026 back into service because I'm pretty sure they're fixed. T269313 [admin]
12:12 <arturo> manually running `wmcs-purge-backups` again on cloudvirt1024 (T269419) [admin]
11:25 <arturo> icinga downtime cloudvirt1024 for 6 days, to avoid paging noise (T269419) [admin]
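
The downtime itself can be scheduled in several ways; one sketch uses the spicerack downtime cookbook from a cluster management host. The cookbook flags and the reason string below are assumptions and may not match what was actually run:

```
# Silence host and service alerts for 6 days and link the Phabricator task.
sudo cookbook sre.hosts.downtime --days 6 \
    -r "avoid paging while cloudvirt1024 is being worked on" \
    -t T269419 \
    'cloudvirt1024.eqiad.wmnet'
```
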
11:25 <arturo> last log line referencing cloudvirt1024 is a mistake (T269313) [admin]
11:24 <arturo> icinga downtime cloudvirt1024 for 6 days, to avoid paging noise (T269313) [admin]
10:28 <arturo> manually running `wmcs-purge-backups` on cloudvirt1024 (T269419) [admin]
10:23 <arturo> setting the expiration of the oldest backy snapshot of every VM on cloudvirt1024 to 2020-12-03 (T269419) [admin]
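
The expiration step probably looked roughly like the sketch below. `backy2 ls` is a real subcommand; the `backy2 expire` call is an assumption about the backup tooling and is not confirmed by the log:

```
# On cloudvirt1024: list backup versions per VM.
backy2 ls

# Assumed call: set an expiration date on the oldest version of a VM so the
# next wmcs-purge-backups run removes it. <version_uid> comes from 'backy2 ls'.
backy2 expire <version_uid> 2020-12-03
```
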
09:54 <arturo> icinga downtime cloudvirt1025 for 6 days (T269313) [admin]

2020-12-03

23:21 <andrewbogott> removing all osds on cloudcephosd1004 for rebuild, T268746 [admin]
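
The repeated "removing all osds ... for rebuild" entries follow the stock Ceph removal procedure; a minimal sketch (OSD ids are placeholders, and WMCS may wrap this in its own tooling):

```
# For each OSD hosted on the node: mark it out, stop the daemon, then purge
# it from the CRUSH map, auth database and OSD map.
ceph osd out osd.<id>
sudo systemctl stop ceph-osd@<id>
ceph osd purge <id> --yes-i-really-mean-it

# Wait for backfill/recovery to settle before touching the next host.
ceph -s
```
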
21:45 <andrewbogott> removing all osds on cloudcephosd1005 for rebuild, T268746 [admin]
19:51 <andrewbogott> removing all osds on cloudcephosd1006 for rebuild, T268746 [admin]
17:01 <arturo> icinga downtime cloudvirt1025 for 48h to debug network issue T269313 [admin]
16:56 <arturo> rebooting cloudvirt1025 to debug network issue T269313 [admin]
16:38 <dcaro> Reimaging cloudvirt1026 (T216195) [admin]
13:24 <andrewbogott> removing all osds on cloudcephosd1008 for rebuild, T268746 [admin]
02:55 <andrewbogott> removing all osds on cloudcephosd1009 for rebuild, T268746 [admin]

2020-12-02

20:03 <andrewbogott> removing all osds on cloudcephosd1010 for rebuild, T268746 [admin]
17:25 <arturo> [15:51] failing over the neutron virtual router in eqiad1 (T268335) [admin]
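
One way to fail over an HA neutron router from the CLI is sketched below; the router and agent ids are placeholders, and the exact procedure used here is an assumption:

```
# See which L3 agents host the router and which one is currently active.
openstack network agent list --router <router-id> --long

# Kick the router off the active agent and add it back; keepalived then
# promotes the standby L3 agent on the other cloudnet node.
openstack network agent remove router --l3 <active-agent-id> <router-id>
openstack network agent add router --l3 <active-agent-id> <router-id>
```
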
15:36 <arturo> conntrackd is now up and running on the cloudnet1003/1004 nodes (T268335) [admin]
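
A quick health check for conntrackd on each cloudnet node, using only the standard conntrackd CLI:

```
systemctl status conntrackd      # daemon is running
sudo conntrackd -s               # sync statistics and cache counters
sudo conntrackd -i | head        # sample of the internal (local) state cache
```
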
15:33 <arturo> [codfw1dev] conntrackd is now up and running on the cloudnet200x-dev nodes (T268335) [admin]
15:08 <andrewbogott> removing all osds on cloudcephosd1012 for rebuild, T268746 [admin]
12:41 <arturo> disabling puppet on all cloudnet servers to merge the conntrackd change T268335 [admin]
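
Disabling puppet across the cloudnet hosts is typically a one-liner from a cumin host; the host query and message below are illustrative:

```
# Disable puppet fleet-wide before merging, then re-enable with the same
# message once the conntrackd change has been verified.
sudo cumin 'cloudnet*' 'disable-puppet "merging conntrackd change T268335"'
sudo cumin 'cloudnet*' 'enable-puppet "merging conntrackd change T268335"'
```
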
11:12 <dcaro> Reset the properties of the flavor g2.cores8.ram16.disk1120 to fix incorrect quoting (T269172) [admin]
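
Fixing a flavor property amounts to unsetting the badly quoted key and setting it again. The property name below is hypothetical, since the entry does not say which key was affected:

```
# Inspect the current properties of the flavor.
openstack flavor show g2.cores8.ram16.disk1120 -c properties

# Hypothetical key: drop the mis-quoted value, then set it cleanly.
openstack flavor unset --property aggregate_instance_extra_specs:ceph \
    g2.cores8.ram16.disk1120
openstack flavor set --property aggregate_instance_extra_specs:ceph=true \
    g2.cores8.ram16.disk1120
```
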
09:56 <arturo> moved cloudvirts 1030, 1029, 1028, 1027, 1026, 1025 away from the 'standard' host aggregate to 'maintenance' (T269172) [admin]

2020-11-25

19:35 <bstorm> repairing ceph pg `instructing pg 6.91 on osd.117 to repair` [admin]
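
The quoted text is the output of `ceph pg repair`; the surrounding steps usually look like this:

```
# Find the inconsistent placement group and inspect the damaged objects.
ceph health detail
rados list-inconsistent-obj 6.91 --format=json-pretty

# Ask the primary OSD to repair the PG, then watch it return to active+clean.
ceph pg repair 6.91
ceph -s
```
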
09:31 <_dcaro> The OSD seems to be up and running actually, though there's that misleading log; will leave it and see if the cluster comes back fully healthy (T268722) [admin]
08:54 <_dcaro> Unsetting noup/nodown to allow re-shuffling of the pgs that osd.44 had, will try to rebuild it (T268722) [admin]
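
Clearing the flags so the cluster can re-place the PGs that osd.44 carried uses the standard Ceph flag commands:

```
# Allow OSDs to be marked up/down again and let recovery proceed.
ceph osd unset noup
ceph osd unset nodown

# Follow backfill/recovery progress.
ceph -s
```
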
08:45 <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened T268722) [admin]
08:45 <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush set-device-class ssd osd.44) [admin]
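
The commands behind the two entries above, as far as the log shows them. Note that `set-device-class` refuses to overwrite an existing class, so the old class has to be cleared first (the rm-device-class step is inferred from that behaviour, not from the log):

```
# Pause data movement while poking at the OSD metadata.
ceph osd set noout
ceph osd set norebalance

# Reassign the device class: clear the current one, then set ssd.
ceph osd crush rm-device-class osd.44
ceph osd crush set-device-class ssd osd.44
```
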
08:19 <_dcaro> Restarting the osd.44 service resulted in osd.44 being unable to start due to some config inconsistency (cannot reset class to hdd) [admin]
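
Restarting a single OSD daemon and reading why it fails to start, on the OSD host itself:

```
# On cloudcephosd1005: restart just osd.44 and check its startup log.
sudo systemctl restart ceph-osd@44.service
sudo journalctl -u ceph-osd@44 --since "10 minutes ago" --no-pager
```
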
08:16 <_dcaro> After enabling auto pg scaling on the ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck; trying to restart the osd service [admin]
08:16 <_dcaro> After enabling auto pg scaling on the ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck; trying to restart [admin]
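
For context, PG autoscaling on a Nautilus or newer cluster is enabled per pool through the mgr module; a sketch with a placeholder pool name:

```
# Make sure the autoscaler module is loaded, enable it for a pool, then
# review the pg_num changes it intends to make.
ceph mgr module enable pg_autoscaler
ceph osd pool set <pool> pg_autoscale_mode on
ceph osd pool autoscale-status
```
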