| 
      
        2018-03-06
      
     | 
  
    
  | 16:15 | 
  <madhuvishy> | 
  Reboot tools-docker-registry-02 T189018 | 
  [tools] | 
            
  | 15:50 | 
  <madhuvishy> | 
  Rebooting tools-worker-1011 | 
  [tools] | 
            
  | 15:08 | 
  <chasemp> | 
  tools-k8s-master-01:~# kubectl uncordon tools-worker-1011.tools.eqiad.wmflabs | 
  [tools] | 
            
  | 15:03 | 
  <arturo> | 
  drain and reboot tools-worker-1011 | 
  [tools] | 
            
  | 15:03 | 
  <chasemp> | 
  rebooted tools-worker-1001 through tools-worker-1008 | 
  [tools] | 
            
  | 14:58 | 
  <arturo> | 
  drain and reboot tools-worker-1010 | 
  [tools] | 
            
  | 14:27 | 
  <chasemp> | 
  multiple tools running on k8s workers are currently reporting problems reading their replica.my.cnf file | 
  [tools] | 
            
  | 14:27 | 
  <chasemp> | 
  reboot tools-worker-100[12] | 
  [tools] | 
            
  | 14:23 | 
  <chasemp> | 
  set icinga downtime for the "k8s workers ready" alert | 
  [tools] | 
            
  | 13:21 | 
  <arturo> | 
  T188994 on some servers apt-upgrade and puppet raced for the dpkg lock. I also forgot to set DEBIAN_FRONTEND=noninteractive, so debconf prompts appeared and stalled the dpkg operations. Both issues are already resolved, but some puppet alerts fired in the meantime | 
  [tools] | 
            
  | 12:58 | 
  <arturo> | 
  T188994 upgrading packages in jessie nodes from the oldstable source | 
  [tools] | 
            
  | 11:42 | 
  <arturo> | 
  clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoclean" <-- frees space on the filesystem | 
  [tools] | 
            
  | 11:41 | 
  <arturo> | 
  aborrero@tools-clushmaster-01:~$ clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y" <-- we ran this on canary servers last week and it went fine, so now running it fleet-wide | 
  [tools] | 
            
  | 11:36 | 
  <arturo> | 
  (ubuntu) removed linux-image-3.13.0-142-generic and linux-image-3.13.0-137-generic (T188911) | 
  [tools] | 
            
  | 11:33 | 
  <arturo> | 
  removing unused kernel packages in ubuntu nodes | 
  [tools] | 
            
  | 11:08 | 
  <arturo> | 
  aborrero@tools-clushmaster-01:~$ clush -w @all "sudo rm /etc/apt/preferences.d/* ; sudo puppet agent -t -v" <--- rebuild the directory; it contains stale files across the whole cluster | 
  [tools] | 
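
Note on the 13:21 entry (T188994): the two problems described there, debconf prompts stalling dpkg and a lock race with puppet, can be guarded against with a small wrapper. The following is only a sketch, not what was actually run on the fleet. The helper names `run_noninteractive` and `retry`, and the attempt count and delay in the usage comment, are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not from the log): illustrate the lesson of T188994.

# Run any command with debconf prompts disabled, so dpkg never waits for input.
run_noninteractive() {
    env DEBIAN_FRONTEND=noninteractive "$@"
}

# Retry a command up to <attempts> times, sleeping <delay> seconds between
# tries. Useful when another process (e.g. a puppet run) briefly holds the
# dpkg lock and the first attempt fails.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example usage (commented out so that sourcing this file changes nothing):
# retry 5 30 run_noninteractive apt-get -y upgrade
```

Retrying is a blunt instrument: it tolerates a short lock conflict but does not stop puppet from starting mid-upgrade, so pausing the agent for the duration of the upgrade may still be the safer choice.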
            
  
    | 
      
        2018-02-21
      
     | 
  
    
  | 19:02 | 
  <bstorm_> | 
  disabled puppet on tools-static-* pending change 413197 | 
  [tools] | 
            
  | 18:15 | 
  <arturo> | 
  puppet should be fine across the fleet | 
  [tools] | 
            
  | 17:24 | 
  <arturo> | 
  another try: merged https://gerrit.wikimedia.org/r/#/c/413202/ | 
  [tools] | 
            
  | 17:02 | 
  <arturo> | 
  revert last change https://gerrit.wikimedia.org/r/#/c/413198/ | 
  [tools] | 
            
  | 16:59 | 
  <arturo> | 
  puppet is broken across the cluster due to the last change | 
  [tools] | 
            
  | 16:57 | 
  <arturo> | 
  deploying https://gerrit.wikimedia.org/r/#/c/410177/ | 
  [tools] | 
            
  | 16:26 | 
  <bd808> | 
  Rebooting tools-docker-registry-01, NFS mounts are in a bad state | 
  [tools] | 
            
  | 11:43 | 
  <arturo> | 
  package upgrades in tools-webgrid-lighttpd-1401 | 
  [tools] | 
            
  | 11:35 | 
  <arturo> | 
  package upgrades in tools-package-builder-01 tools-prometheus-01 tools-static-10 and tools-redis-1001 | 
  [tools] | 
            
  | 11:22 | 
  <arturo> | 
  package upgrades in tools-mail, tools-grid-master, tools-logs-02 | 
  [tools] | 
            
  | 10:51 | 
  <arturo> | 
  package upgrades in tools-checker-01 tools-clushmaster-01 and tools-docker-builder-05 | 
  [tools] | 
            
  | 09:18 | 
  <chicocvenancio> | 
  killed an I/O-intensive tool job on the bastion | 
  [tools] |