2021-04-28
10:57 <dcaro> A PG got stuck in 'remapping' after the OSD came up; had to unset norebalance and then set it again to get it unstuck (T280641) [admin]
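
A minimal sketch of the unstick procedure described above, run from a mon/admin node (PG output abbreviated):

    # list PGs that are not active+clean, to spot the one stuck remapping
    ceph pg dump_stuck unclean
    # drop the flag so the remap can proceed, then restore it
    ceph osd unset norebalance
    ceph status                  # wait until the PG settles
    ceph osd set norebalance
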
10:34 <dcaro> Slow/blocked ops from cloudcephmon03, "osd_failure(failed timeout osd.32..." (cloudcephosd1005); unset the cluster noout/norebalance and they went away in a few secs, setting them again and continuing... (T280641) [admin]
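
Roughly, the same flag toggle applied to both maintenance flags, after confirming the warning:

    # the slow/blocked ops and the osd_failure reports show up here
    ceph health detail
    # clear both maintenance flags and let the failure report resolve...
    ceph osd unset noout
    ceph osd unset norebalance
    # ...then re-set them to continue the upgrade
    ceph osd set noout
    ceph osd set norebalance
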
09:03 <dcaro> Waiting for slow heartbeats from osd.58 (cloudcephosd1002) to recover... (T280641) [admin]
08:59 <dcaro> During the upgrade, started getting the warning 'slow OSD heartbeats on back', meaning that pings between OSDs are really slow (up to 190s), all from osd.58, currently on cloudcephosd1002 (T280641) [admin]
08:58 <dcaro> During the upgrade, started getting the warning 'slow OSD heartbeats on back', meaning that pings between OSDs are really slow (up to 190s), all from osd.58 (T280641) [admin]
08:58 <dcaro> During the upgrade, started getting the warning 'slow OSD heartbeats on back', meaning that pings between OSDs are really slow (up to 190s) (T280641) [admin]
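
This is Ceph's OSD_SLOW_PING_TIME_BACK health warning (heartbeats on the back/cluster network). One way to inspect the ping times, assuming a recent Octopus release (osd id taken from the entries above):

    # the health check lists the worst osd-to-osd ping times
    ceph health detail
    # on the host carrying the OSD (cloudcephosd1002): per-peer ping times over threshold
    ceph daemon osd.58 dump_osd_network
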
08:21 <dcaro> Upgrading all the ceph OSDs on eqiad (T280641) [admin]
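
A sketch of the per-host OSD upgrade cycle this implies; the exact package set and flow here are assumptions, not the commands actually run:

    # keep data in place while daemons restart
    ceph osd set noout
    ceph osd set norebalance
    # on each OSD host, one at a time:
    apt-get install -y ceph-osd ceph-common    # pulls in 15.2.11
    systemctl restart ceph-osd.target
    ceph status                                # wait for all PGs active+clean
    # once every host is done:
    ceph osd unset norebalance
    ceph osd unset noout
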
08:21 <dcaro> The clock skew seems intermittent; there's another task to follow it, T275860 (T280641) [admin]
08:18 <dcaro> All eqiad ceph mons and mgrs upgraded (T280641) [admin]
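
A quick way to confirm which daemons are running the new version:

    # groups mon/mgr/osd/client daemons by running version
    ceph versions
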
08:18 <dcaro> During the upgrade, ceph detected a clock skew on cloudcephmon1002 and cloudcephmon1001; they are back in sync (T280641) [admin]
08:15 <dcaro> During the upgrade, ceph detected a clock skew on cloudcephmon1002; it went away, I'm guessing systemd-timesyncd fixed it (T280641) [admin]
08:14 <dcaro> During the upgrade, ceph detected a clock skew on cloudcephmon1002, looking into it (T280641) [admin]
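
For the skew itself, the mons expose their time-sync view, and systemd-timesyncd can be checked directly on the affected host; a minimal sketch:

    # mon-side view of clock skew between monitors
    ceph time-sync-status
    # on cloudcephmon1002: is timesyncd running and synchronized?
    timedatectl status
    systemctl status systemd-timesyncd
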
07:58 <dcaro> Upgrading ceph services on eqiad, starting with the mons/managers (T280641) [admin]
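
The mon/mgr leg of the upgrade follows the usual "one daemon at a time, wait for quorum" pattern; a sketch assuming Debian packages:

    # on each mon/mgr host in turn:
    apt-get install -y ceph-mon ceph-mgr ceph-common   # 15.2.11
    systemctl restart ceph-mon@$(hostname -s)
    ceph mon stat                                      # wait for the mon to rejoin quorum
    systemctl restart ceph-mgr@$(hostname -s)
    ceph -s                                            # check the mgr is active again
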
2021-04-27
14:10 <dcaro> codfw.openstack upgraded ceph libraries to 15.2.11 (T280641) [admin]
13:07 <dcaro> codfw.openstack cloudvirt2002-dev done, taking cloudvirt2003-dev out to upgrade ceph libraries (T280641) [admin]
13:00 <dcaro> codfw.openstack cloudvirt2001-dev back online, taking cloudvirt2002-dev out to upgrade ceph libraries (T280641) [admin]
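
To check what a given cloudvirt is actually running before/after the swap (Debian package names):

    # candidate vs installed versions of the ceph client libraries
    apt-cache policy librbd1 librados2 python3-rados
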
10:51 <dcaro> ceph.eqiad: cinder pool got its pg_num increased to 1024, re-shuffle started (T273783) [admin]
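
The pg_num bump is a single pool setting; a sketch, using the literal pool name 'cinder' from the entry above (the real pool name may differ):

    # raise the placement-group count; ceph moves data onto the new PGs (the "re-shuffle")
    ceph osd pool set cinder pg_num 1024
    ceph osd pool get cinder pg_num
    # watch the backfill triggered by the change
    ceph -s
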
10:48 <dcaro> ceph.eqiad: Tweaked the target_size_ratio of all the pools, enabling the autoscaler (it will increase the cinder pool only) (T273783) [admin]
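
target_size_ratio feeds the PG autoscaler's sizing decision; roughly as below, where the pool names come from the log but the ratio values are illustrative, not the ones actually used:

    # tell the autoscaler what fraction of the cluster each pool is expected to use
    ceph osd pool set compute target_size_ratio 0.8
    ceph osd pool set cinder target_size_ratio 0.2
    # let it act on its recommendation for a pool
    ceph osd pool set cinder pg_autoscale_mode on
    # review what it wants to do
    ceph osd pool autoscale-status
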
09:14 <dcaro> manually force stopping the server puppetmaster-01 to unblock the migration (in codfw1) [admin]
09:14 <dcaro> manually force stopping the server puppetmaster-01 to unblock the migration [admin]
08:59 <dcaro> manually force stopping the server exploding-head on codfw, to try a cold migration [admin]
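
When an instance refuses to stop through the API, the force-stop typically ends up being done against the hypervisor; a hedged sketch of one possible flow (the instance name is from the entry above, the libvirt domain placeholder and the exact sequence are assumptions):

    # normal API path first
    openstack server stop exploding-head
    # if the stop hangs, kill the domain on its cloudvirt (libvirt domain id != server name)
    virsh destroy <libvirt-domain>
    # retry the cold migration, then confirm it once the server reaches VERIFY_RESIZE
    openstack server migrate exploding-head
    openstack server resize --confirm exploding-head   # subcommand form varies by client version
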
08:47 <dcaro> restarting nova-compute on cloudvirt2001-dev after upgrading ceph libraries to 15.2.11 [admin]
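
The per-hypervisor cycle behind these library upgrades, as a sketch (the exact package set is an assumption):

    # with the cloudvirt drained / taken out of rotation:
    apt-get install -y librbd1 librados2 python3-rbd python3-rados   # 15.2.11
    # nova-compute only picks up the new librbd on restart; running qemu
    # processes keep the old library mapped until they are restarted or migrated
    systemctl restart nova-compute
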
2021-04-13
16:42 <dcaro> Ceph balancer got the cluster to eval 0.014916, that is 88-77% usage for the compute pool and 28-19% for the cinder one \o/ (T274573) [admin]
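
The numbers quoted come straight from the balancer score and the per-OSD usage report:

    # balancer score: lower is better, 0 is perfectly balanced
    ceph balancer eval
    # per-OSD %USE; the per-pool spreads are read off the OSDs backing each pool
    ceph osd df
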
15:08 <dcaro> Activating continuous upmap balancer, keeping a close eye on it (T274573) [admin]
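
Turning the balancer from one-shot plans to continuous background mode is just:

    ceph balancer on          # optimize automatically in the background
    ceph balancer status      # shows mode, active flag and last optimization
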
15:03 <dcaro> Executing a second pass; there are still movements that improve on the eval of 0.030075 (T274573) [admin]
15:02 <dcaro> First pass finished, improved eval to 0.030075 (T274573) [admin]
14:49 <dcaro> Running the first_pass balancing plan on ceph eqiad, current eval 0.030622 (T274573) [admin]
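
The one-shot plan workflow behind these entries, using the plan name first_pass from the log:

    ceph balancer eval                    # current score (0.030622 above)
    ceph balancer optimize first_pass     # compute a plan of upmap changes
    ceph balancer show first_pass         # inspect the proposed changes
    ceph balancer eval first_pass         # score the cluster as if the plan ran
    ceph balancer execute first_pass      # apply it
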
14:43 <dcaro> enabling ceph upmap pg balancer on eqiad (T274573) [admin]
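
Upmap mode requires that every client speaks Luminous or newer; a minimal sketch of enabling it (whether the min-compat setting had to be changed here is not stated in the log):

    # refuse older clients, which is what makes pg-upmap entries safe to use
    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
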
14:36 <andrewbogott> upgrading codfw1dev to version Victoria, T261137 [admin]
13:11 <andrewbogott> upgrading eqiad1 designate to version Victoria, T261137 [admin]
10:43 <dcaro> enabled ceph upmap balancer on codfw (T274573) [admin]