| 2018-03-06
      
    
  | 15:50 | <madhuvishy> | Rebooting tools-worker-1011 | [tools] | 
            
  | 15:08 | <chasemp> | tools-k8s-master-01:~# kubectl uncordon tools-worker-1011.tools.eqiad.wmflabs | [tools] | 
            
  | 15:03 | <arturo> | drain and reboot tools-worker-1011 | [tools] | 
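The drain / reboot / uncordon cycle behind the three tools-worker-1011 entries above looks roughly like the following. This is a minimal sketch only; the exact invocations are not recorded in the log, and the drain flags assume a kubectl of that era:

    # on the k8s master: evict pods and mark the node unschedulable
    kubectl drain tools-worker-1011.tools.eqiad.wmflabs --ignore-daemonsets --delete-local-data
    # reboot the worker itself
    ssh tools-worker-1011.tools.eqiad.wmflabs sudo reboot
    # once the node is back, let the scheduler place pods on it again (the 15:08 entry)
    kubectl uncordon tools-worker-1011.tools.eqiad.wmflabs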
            
  | 15:03 | <chasemp> | rebooted tools-worker-1001 through tools-worker-1008 | [tools] | 
            
  | 14:58 | <arturo> | drain and reboot tools-worker-1010 | [tools] | 
            
  | 14:27 | <chasemp> | multiple tools running on k8s workers are reporting issues reading their replica.my.cnf file at the moment | [tools] | 
            
  | 14:27 | <chasemp> | reboot tools-worker-100[12] | [tools] | 
            
  | 14:23 | <chasemp> | downtimed the Icinga alert for k8s worker readiness | [tools] | 
            
  | 13:21 | <arturo> | T188994 on some servers there was a race for the dpkg lock between apt-upgrade and puppet. I also forgot to use DEBIAN_FRONTEND=noninteractive, so debconf prompts appeared and stalled dpkg operations. Already solved, but some puppet alerts were produced | [tools] | 
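Both issues described above (the dpkg lock race with puppet and the debconf prompts) are normally avoided with something along these lines; an illustrative sequence, not the exact commands that were run:

    # stop puppet from grabbing the dpkg lock mid-upgrade
    sudo puppet agent --disable "package upgrades in progress (T188994)"
    # wait until nothing else holds the dpkg lock
    while sudo fuser /var/lib/dpkg/lock >/dev/null 2>&1; do sleep 5; done
    # upgrade without debconf ever prompting
    sudo DEBIAN_FRONTEND=noninteractive apt-get -y upgrade
    # let puppet run again afterwards
    sudo puppet agent --enable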
            
  | 12:58 | <arturo> | T188994 upgrading packages in jessie nodes from the oldstable source | [tools] | 
            
  | 11:42 | <arturo> | clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoclean" <-- free up space in the filesystems | [tools] | 
            
  | 11:41 | <arturo> | aborrero@tools-clushmaster-01:~$ clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y" <-- we did this on the canary servers last week and it went fine, so now running it fleet-wide | [tools] | 
            
  | 11:36 | <arturo> | (ubuntu) removed linux-image-3.13.0-142-generic and linux-image-3.13.0-137-generic (T188911) | [tools] | 
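For reference, unused kernels like the two named above are usually found and purged roughly like this (a sketch; the running kernel reported by uname -r must of course stay installed):

    # list installed kernel images and check which one is currently booted
    dpkg -l 'linux-image-*' | grep '^ii'
    uname -r
    # purge the versions no longer needed (T188911)
    sudo apt-get -y purge linux-image-3.13.0-137-generic linux-image-3.13.0-142-generic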
            
  | 11:33 | <arturo> | removing unused kernel packages in ubuntu nodes | [tools] | 
            
  | 11:08 | <arturo> | aborrero@tools-clushmaster-01:~$ clush -w @all "sudo rm /etc/apt/preferences.d/* ; sudo puppet agent -t -v" <--- rebuild the directory; it contained stale files across the whole cluster | [tools] | 
            
  
    | 2018-02-21
      
    
  | 19:02 | <bstorm_> | disabled puppet on tools-static-* pending change 413197 | [tools] | 
            
  | 18:15 | <arturo> | puppet should be fine across the fleet | [tools] | 
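A fleet-wide check like the one implied here can be run from the clushmaster used elsewhere in this log; a sketch, assuming the same @all node group:

    # run puppet on every node and surface the per-node exit status
    # (with --test, exit 0 or 2 is a clean run; 1, 4 or 6 indicate failures)
    clush -b -w @all 'sudo puppet agent --test > /dev/null 2>&1; echo "puppet exit: $?"'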
            
  | 17:24 | <arturo> | another try: merged https://gerrit.wikimedia.org/r/#/c/413202/ | [tools] | 
            
  | 17:02 | <arturo> | revert last change https://gerrit.wikimedia.org/r/#/c/413198/ | [tools] | 
            
  | 16:59 | <arturo> | puppet is broken across the cluster due to the last change | [tools] | 
            
  | 16:57 | <arturo> | deploying https://gerrit.wikimedia.org/r/#/c/410177/ | [tools] | 
            
  | 16:26 | <bd808> | Rebooting tools-docker-registry-01, NFS mounts are in a bad state | [tools] | 
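A stale NFS mount typically shows up as a hung or erroring stat on the mount point, which is what usually motivates this kind of reboot. A rough sketch of such a check (the path is taken from this log, not from the actual commands run):

    # a healthy mount answers quickly; a stale handle hangs or returns an error
    timeout 10 stat /data/project >/dev/null || echo "NFS mount on /data/project looks unhealthy"
    # forced or lazy unmounts rarely recover cleanly in this state, hence the reboot
    sudo reboot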
            
  | 11:43 | <arturo> | package upgrades in tools-webgrid-lighttpd-1401 | [tools] | 
            
  | 11:35 | <arturo> | package upgrades in tools-package-builder-01 tools-prometheus-01 tools-static-10 and tools-redis-1001 | [tools] | 
            
  | 11:22 | <arturo> | package upgrades in tools-mail, tools-grid-master, tool-logs-02 | [tools] | 
            
  | 10:51 | <arturo> | package upgrades in tools-checker-01 tools-clushmaster-01 and tools-docker-builder-05 | [tools] | 
            
  | 09:18 | <chicocvenancio> | killed an I/O-intensive tool job on the bastion | [tools] | 
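Finding the offending job on a shared bastion usually looks something like this; a sketch only, and the PID is a placeholder:

    # one batch-mode iteration of iotop, showing only processes currently doing I/O
    sudo iotop -o -b -n 1 | head -20
    # terminate the identified tool job (12345 is a hypothetical PID)
    sudo kill -TERM 12345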
            
  | 03:32 | <zhuyifei1999_> | removed /data/project/.elasticsearch.ini: it was owned by root with mode 644 and leaked the creds of /data/project/strephit/.elasticsearch.ini. Might need to cycle those as well... | [tools] |