2020-12-04

| 21:06 | <andrewbogott> | putting cloudvirt1025 and 1026 back into service because I'm pretty sure they're fixed. T269313 | [admin] |
| 12:12 | <arturo> | manually running `wmcs-purge-backups` again on cloudvirt1024 (T269419) | [admin] |
| 11:25 | <arturo> | icinga downtime cloudvirt1024 for 6 days, to avoid paging noise (T269419) | [admin] |
| 11:25 | <arturo> | last log line referencing cloudvirt1024 is a mistake (T269313) | [admin] |
| 11:24 | <arturo> | icinga downtime cloudvirt1024 for 6 days, to avoid paging noise (T269313) | [admin] |
| 10:28 | <arturo> | manually running `wmcs-purge-backups` on cloudvirt1024 (T269419) | [admin] |
| 10:23 | <arturo> | setting expiration to 2020-12-03 on the oldest backy snapshot of every VM on cloudvirt1024 (T269419) | [admin] |
| 09:54 | <arturo> | icinga downtime cloudvirt1025 for 6 days (T269313) | [admin] |
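The "icinga downtime" actions above are normally set from a cumin host rather than in the Icinga UI. The log does not record the exact invocation, so the tool shown below (the spicerack `sre.hosts.downtime` cookbook) and its option names are assumptions, given only as a sketch:

```
# Hypothetical sketch: downtime a cloudvirt for 6 days with a reason and task reference.
# The cookbook exists, but the option names and values here are assumptions, not taken from the log.
sudo cookbook sre.hosts.downtime --days 6 \
    -r "avoid paging while cloudvirt1024 is being debugged" \
    -t T269419 \
    'cloudvirt1024.eqiad.wmnet'
```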
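The 21:06 entry ("putting cloudvirt1025 and 1026 back into service") does not say how that was done; a plausible minimal step, using only standard OpenStack CLI calls, is re-enabling the nova-compute service so the scheduler can place VMs on those hypervisors again:

```
# Hedged sketch: re-enable the hypervisors for scheduling (assumes this is what
# "back into service" meant; the log does not spell it out).
openstack compute service set --enable cloudvirt1025 nova-compute
openstack compute service set --enable cloudvirt1026 nova-compute
# Confirm both report enabled/up.
openstack compute service list --service nova-compute
```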
            
  
2020-12-03

| 23:21 | <andrewbogott> | removing all osds on cloudcephosd1004 for rebuild, T268746 | [admin] |
| 21:45 | <andrewbogott> | removing all osds on cloudcephosd1005 for rebuild, T268746 | [admin] |
| 19:51 | <andrewbogott> | removing all osds on cloudcephosd1006 for rebuild, T268746 | [admin] |
| 17:01 | <arturo> | icinga downtime cloudvirt1025 for 48h to debug network issue T269313 | [admin] |
| 16:56 | <arturo> | rebooting cloudvirt1025 to debug network issue T269313 | [admin] |
| 16:38 | <dcaro> | Reimaging cloudvirt1026 (T216195) | [admin] |
| 13:24 | <andrewbogott> | removing all osds on cloudcephosd1008 for rebuild, T268746 | [admin] |
| 02:55 | <andrewbogott> | removing all osds on cloudcephosd1009 for rebuild, T268746 | [admin] |
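The repeated "removing all osds on cloudcephosdNNNN for rebuild" entries correspond to the standard Ceph OSD removal flow. A hedged sketch for a single OSD; the OSD id and device path are placeholders, and WMCS may well wrap these steps in its own tooling:

```
# Sketch: drain and remove one OSD (id 123 and /dev/sdX are placeholders).
ceph osd out 123                           # stop placing data on this OSD
ceph -s                                    # wait until backfill/recovery finishes
ceph osd safe-to-destroy 123               # confirm removal will not lose data
systemctl stop ceph-osd@123                # run on the OSD host
ceph osd purge 123 --yes-i-really-mean-it  # drop it from the CRUSH map, auth and OSD map
ceph-volume lvm zap /dev/sdX --destroy     # wipe the backing device before re-provisioning
```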
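Because the hosts are drained one at a time over several days, the cluster needs to be healthy again before the next cloudcephosd node is touched. A few read-only checks commonly used for that; nothing here is specific to WMCS:

```
# Read-only health checks between host rebuilds.
ceph -s                                  # overall status; recovery/backfill progress
ceph health detail                       # details of any WARN/ERR state
ceph osd tree | grep cloudcephosd1009    # are the rebuilt host's OSDs back up and in?
ceph osd df                              # per-OSD utilisation, to spot imbalance
```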
            
  
2020-12-02

| 20:03 | <andrewbogott> | removing all osds on cloudcephosd1010 for rebuild, T268746 | [admin] |
| 17:25 | <arturo> | [15:51] failing over the neutron virtual router in eqiad1 (T268335) | [admin] |
| 15:36 | <arturo> | conntrackd is now up and running on the cloudnet1003/1004 nodes (T268335) | [admin] |
| 15:33 | <arturo> | [codfw1dev] conntrackd is now up and running on the cloudnet200x-dev nodes (T268335) | [admin] |
| 15:08 | <andrewbogott> | removing all osds on cloudcephosd1012 for rebuild, T268746 | [admin] |
| 12:41 | <arturo> | disable puppet on all cloudnet servers to merge the conntrackd change T268335 | [admin] |
| 11:12 | <dcaro> | Reset the properties of the flavor g2.cores8.ram16.disk1120 to correct the quoting (T269172) | [admin] |
| 09:56 | <arturo> | moved cloudvirts 1030, 1029, 1028, 1027, 1026, 1025 away from the 'standard' host aggregate to 'maintenance' (T269172) | [admin] |
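The 11:12 and 09:56 entries (flavor properties and host aggregates) map onto standard OpenStack CLI operations. A hedged sketch: the aggregate names and flavor come from the log, while the property key/value is a placeholder, since the log only says the quoting was wrong:

```
# Move a hypervisor out of the 'standard' host aggregate and into 'maintenance'.
openstack aggregate remove host standard cloudvirt1025
openstack aggregate add host maintenance cloudvirt1025

# Inspect and re-set a flavor property; the key/value below is hypothetical,
# used only to illustrate fixing a badly quoted property.
openstack flavor show g2.cores8.ram16.disk1120
openstack flavor unset --property aggregate_instance_extra_specs:ceph g2.cores8.ram16.disk1120
openstack flavor set --property aggregate_instance_extra_specs:ceph=true g2.cores8.ram16.disk1120
```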
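The conntrackd rollout (12:41 through 15:36) follows the usual disable-puppet / merge / re-enable / verify pattern. A minimal sketch to run on each cloudnet host; the reason string is invented, and any WMF wrapper scripts are left out in favour of plain puppet and systemd commands:

```
# Before merging the conntrackd change, on each cloudnet host:
sudo puppet agent --disable 'rolling out conntrackd - T268335'

# After the change is merged:
sudo puppet agent --enable
sudo puppet agent --test

# Verify conntrackd is running and exchanging state with its peer:
systemctl status conntrackd
sudo conntrackd -s          # connection-tracking synchronisation statistics
```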
            
  
2020-11-25

| 19:35 | <bstorm> | repairing ceph pg: `instructing pg 6.91 on osd.117 to repair` | [admin] |
| 09:31 | <_dcaro> | The OSD actually seems to be up and running, though there's that misleading log message; will leave it and see if the cluster becomes fully healthy (T268722) | [admin] |
| 08:54 | <_dcaro> | Unsetting noup/nodown to allow re-shuffling of the PGs that osd.44 had; will try to rebuild it (T268722) | [admin] |
| 08:45 | <_dcaro> | Tried resetting the class for osd.44 to ssd, no luck; the cluster is in noout/norebalance to avoid data shuffling (opened T268722) | [admin] |
| 08:45 | <_dcaro> | Tried resetting the class for osd.44 to ssd, no luck; the cluster is in noout/norebalance to avoid data shuffling (command run as root on cloudcephosd1005: `ceph osd crush set-device-class ssd osd.44`) | [admin] |
| 08:19 | <_dcaro> | Restarting the osd.44 service resulted in osd.44 being unable to start due to a config inconsistency (cannot reset class to hdd) | [admin] |
| 08:16 | <_dcaro> | After enabling auto PG scaling on the ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck; trying to restart the osd service | [admin] |
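The osd.44 incident (08:16 through 09:31) combines two routine Ceph operations: protective cluster flags while debugging, and re-setting a device class, which only works after the old class has been removed. A hedged reconstruction; only `ceph osd crush set-device-class ssd osd.44` appears verbatim in the log:

```
# Keep the cluster from shuffling data while a flapping OSD is being debugged.
ceph osd set noout
ceph osd set norebalance

# A device class can only be set once the existing (wrong) class has been removed.
ceph osd crush rm-device-class osd.44
ceph osd crush set-device-class ssd osd.44   # the command quoted in the log

# When the OSD is being rebuilt, lift the flags again so PGs can peer and move.
ceph osd unset noup
ceph osd unset nodown
ceph osd unset noout
ceph osd unset norebalance
```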
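The 19:35 scrub repair and the PG autoscaler mentioned at 08:16 are also plain Ceph CLI: `ceph pg repair` prints exactly the "instructing pg ... to repair" line quoted above, while the autoscaler is enabled per pool; the pool name below is an assumption:

```
# Repair an inconsistent placement group found by scrubbing.
ceph health detail          # lists PGs reported as inconsistent
ceph pg repair 6.91         # prints: instructing pg 6.91 on osd.117 to repair

# Enable PG autoscaling per pool ('eqiad1-compute' is an assumed pool name).
ceph osd pool set eqiad1-compute pg_autoscale_mode on
ceph osd pool autoscale-status
```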