| 
      
        2021-03-25
      
     | 
  
    
  | 19:30 | 
  <bstorm> | 
  forced deletion of all jobs stuck in a deleting state T277653 | 
  [tools] | 
            
  | 17:46 | 
  <arturo> | 
  rebooting tools-sgeexec-* nodes to account for new grid master (T277653) | 
  [tools] | 
            
  | 16:20 | 
  <arturo> | 
  rebuilding tools-sgegrid-master VM as debian buster (T277653) | 
  [tools] | 
            
  | 16:18 | 
  <arturo> | 
  icinga-downtime toolschecker for 2h | 
  [tools] | 
            
  | 16:05 | 
  <bstorm> | 
  failed over the tools grid to the shadow master T277653 | 
  [tools] | 
            
  | 13:36 | 
  <arturo> | 
  shut down tools-sge-services-03 (T278354) | 
  [tools] | 
            
  | 13:33 | 
  <arturo> | 
  shut down tools-sge-services-04 (T278354) | 
  [tools] | 
            
  | 13:31 | 
  <arturo> | 
  point aptly clients to `tools-services-05.tools.eqiad1.wikimedia.cloud` (hiera change) (T278354) | 
  [tools] | 
            
  | 12:58 | 
  <arturo> | 
  created VM `tools-services-05` as Debian Buster (T278354) | 
  [tools] | 
            
  | 12:51 | 
  <arturo> | 
  create cinder volume `tools-aptly-data` (T278354) | 
  [tools] | 
            
  
    | 
      
        2021-03-24
      
     | 
  
    
  | 12:46 | 
  <arturo> | 
  shutoff the old stretch VMs `tools-docker-registry-03` and `tools-docker-registry-04` (T278303) | 
  [tools] | 
            
  | 12:38 | 
  <arturo> | 
  associate floating IP 185.15.56.67 with `tools-docker-registry-05` and refresh FQDN docker-registry.tools.wmflabs.org accordingly (T278303) | 
  [tools] | 
            
  | 12:33 | 
  <arturo> | 
  attach cinder volume `tools-docker-registry-data` to VM `tools-docker-registry-05` (T278303) | 
  [tools] | 
            
  | 12:32 | 
  <arturo> | 
  snapshot cinder volume `tools-docker-registry-data` into `tools-docker-registry-data-stretch-migration` (T278303) | 
  [tools] | 
            
  | 12:32 | 
  <arturo> | 
  bump cinder storage quota from 80G to 400G (without quota request task) | 
  [tools] | 
            
  | 12:11 | 
  <arturo> | 
  created VM `tools-docker-registry-06` as Debian Buster (T278303) | 
  [tools] | 
            
  | 12:09 | 
  <arturo> | 
  detach cinder volume `tools-docker-registry-data` (T278303) | 
  [tools] | 
            
  | 11:46 | 
  <arturo> | 
  attach cinder volume `tools-docker-registry-data` to VM `tools-docker-registry-03` to format it and pre-populate it with registry data (T278303) | 
  [tools] | 
            
  | 11:20 | 
  <arturo> | 
  created 80G cinder volume tools-docker-registry-data (T278303) | 
  [tools] | 
            
  | 11:10 | 
  <arturo> | 
  starting VM tools-docker-registry-04, which had probably been stopped since 2021-03-09 due to hypervisor draining | 
  [tools] | 
            
  
    | 
      
        2021-03-18
      
     | 
  
    
  | 19:24 | 
  <bstorm> | 
  set profile::toolforge::infrastructure across the entire project with login_server set on the bastion and exec node-related prefixes | 
  [tools] | 
            
  | 16:21 | 
  <andrewbogott> | 
  enabling puppet tools-wide | 
  [tools] | 
            
  | 16:20 | 
  <andrewbogott> | 
  disabling puppet tools-wide to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456 | 
  [tools] | 
            
  | 16:19 | 
  <bstorm> | 
  added profile::toolforge::infrastructure class to puppetmaster T277756 | 
  [tools] | 
            
  | 04:12 | 
  <bstorm> | 
  rebooted tools-sgeexec-0935.tools.eqiad.wmflabs because it forgot how to LDAP... likely the root cause of tonight's issues | 
  [tools] | 
            
  | 03:59 | 
  <bstorm> | 
  rebooting grid master. sorry for the cron spam | 
  [tools] | 
            
  | 03:49 | 
  <bstorm> | 
  restarting sssd on tools-sgegrid-master | 
  [tools] | 
            
  | 03:37 | 
  <bstorm> | 
  deleted a massive number of stuck jobs that misfired from the cron server | 
  [tools] | 
            
  | 03:35 | 
  <bstorm> | 
  rebooting tools-sgecron-01 to try to clear up the ldap-related errors coming out of it | 
  [tools] | 
            
  | 01:46 | 
  <bstorm> | 
  killed the toolschecker cron job, which had an LDAP error, and ran it again by hand | 
  [tools] |