2018-01-16

| 18:26 | <andrewbogott> | repooling tools-exec-1404 and 1434 for host reboot | [tools] |
| 18:06 | <andrewbogott> | depooling tools-exec-1404 and 1434 for host reboot | [tools] |
| 18:04 | <andrewbogott> | repooling tools-exec-1402, 1426, 1429, 1433, tools-webgrid-lighttpd-1408, 1414, 1424 | [tools] |
| 17:48 | <andrewbogott> | depooling tools-exec-1402, 1426, 1429, 1433, tools-webgrid-lighttpd-1408, 1414, 1424 | [tools] |
| 17:28 | <andrewbogott> | disabling tools-webgrid-generic-1402, tools-webgrid-lighttpd-1403, tools-exec-1403 for host reboot | [tools] |
| 17:26 | <andrewbogott> | repooling tools-exec-1405, 1425, tools-webgrid-generic-1403, tools-webgrid-lighttpd-1401, 1405 after host reboot | [tools] |
| 17:08 | <andrewbogott> | depooling tools-exec-1405, 1425, tools-webgrid-generic-1403, tools-webgrid-lighttpd-1401, 1405 for host reboot | [tools] |
| 16:19 | <andrewbogott> | repooling tools-exec-1401, 1407, 1408, 1430, 1431, 1432, 1435, 1438, 1439, 1441, tools-webgrid-lighttpd-1402, 1407 after host reboot | [tools] |
| 15:52 | <andrewbogott> | depooling tools-exec-1401, 1407, 1408, 1430, 1431, 1432, 1435, 1438, 1439, 1441, tools-webgrid-lighttpd-1402, 1407 for host reboot | [tools] |
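Depooling a grid engine node ahead of a reboot means disabling its queue instances so the scheduler stops placing jobs there; repooling re-enables them. A minimal sketch of that cycle using the stock gridengine qmod tool (the wildcard queue spec and the node FQDN are assumptions; the log doesn't record the exact commands used):

    # Disable every queue instance on the host so no new jobs land there.
    qmod -d '*@tools-exec-1404.eqiad.wmflabs'
    # ... wait for running jobs to drain, reboot the host ...
    # Re-enable the queue instances once the host is back up.
    qmod -e '*@tools-exec-1404.eqiad.wmflabs'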
            
| 13:35 | <chasemp> | tools-mail: cleared 719 pending messages queued for almouked@ltnet.net | [tools] |
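Clearing a backlog of pending messages for a single recipient on a mail relay like tools-mail is commonly done with exim's queue tools. A sketch, assuming exim4 and matching on the recipient address:

    # List the IDs of queued messages for the recipient, then remove them.
    sudo exiqgrep -i -r almouked@ltnet.net | xargs sudo exim -Mrm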
            
  
2018-01-11
  
    
| 20:33 | <andrewbogott> | repooling tools-exec-1411, tools-exec-1440, tools-webgrid-lighttpd-1419, tools-webgrid-lighttpd-1420, tools-webgrid-lighttpd-1421 | [tools] |
| 20:33 | <andrewbogott> | uncordoning tools-worker-1012 and tools-worker-1017 | [tools] |
| 20:06 | <andrewbogott> | cordoning tools-worker-1012 and tools-worker-1017 | [tools] |
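Cordoning marks a Kubernetes node unschedulable without disturbing the pods already running on it; uncordoning reverses that once maintenance is done. A minimal sketch of the pattern logged above:

    # Stop the scheduler from placing new pods on the nodes.
    kubectl cordon tools-worker-1012.tools.eqiad.wmflabs
    kubectl cordon tools-worker-1017.tools.eqiad.wmflabs
    # ... maintenance/reboot ...
    # Make the nodes schedulable again.
    kubectl uncordon tools-worker-1012.tools.eqiad.wmflabs
    kubectl uncordon tools-worker-1017.tools.eqiad.wmflabs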
            
| 20:02 | <andrewbogott> | depooling tools-exec-1411, tools-exec-1440, tools-webgrid-lighttpd-1419, tools-webgrid-lighttpd-1420, tools-webgrid-lighttpd-1421 | [tools] |
| 19:00 | <chasemp> | reboot tools-worker-1015 | [tools] |
| 15:08 | <chasemp> | reboot tools-exec-1405 | [tools] |
| 15:06 | <chasemp> | reboot tools-exec-1404 | [tools] |
| 15:06 | <chasemp> | reboot tools-exec-1403 | [tools] |
| 15:02 | <chasemp> | reboot tools-exec-1402 | [tools] |
| 14:57 | <chasemp> | reboot tools-exec-1401 again... | [tools] |
| 14:53 | <chasemp> | reboot tools-exec-1401 | [tools] |
            
| 14:46 | <chasemp> | install Meltdown kernel and reboot workers 1011-1016 as a jessie pilot | [tools] |
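A rollout like this is typically fanned out with clush: install the patched kernel, then reboot, one node at a time. A hypothetical sketch only; the kernel package name and node range are assumptions, since the log doesn't record the exact commands:

    # Serially (-f 1) install the KPTI/Meltdown-patched kernel on the jessie
    # pilot workers and reboot each one; the reboot drops the ssh session,
    # so clush will report the command as failed even when it succeeded.
    clush -f 1 -w tools-worker-10[11-16] \
        'sudo apt-get install -y linux-image-amd64 && sudo reboot'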
            
  
2018-01-10
  
    
| 15:14 | <chasemp> | tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test" | [tools] |
| 15:03 | <chasemp> | tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1016`; do kubectl cordon $n; done | [tools] |
            
| 14:41 | <chasemp> | tools-clushmaster-01:~$ clush -w @k8s-worker "sudo puppet agent --disable 'chase rollout'" | [tools] |
| 14:01 | <chasemp> | tools-k8s-master-01:~# kubectl uncordon tools-worker-1001.tools.eqiad.wmflabs | [tools] |
            
| 13:57 | <arturo> | T184604 cleaned stalled log files that prevented logrotate from working; triggered a couple of logrotate runs by hand on tools-worker-1020.tools.eqiad.wmflabs | [tools] |
            
| 13:46 | <arturo> | T184604 aborrero@tools-k8s-master-01:~$ sudo kubectl uncordon tools-worker-1020.tools.eqiad.wmflabs | [tools] |
| 13:45 | <arturo> | T184604 aborrero@tools-worker-1020:/var/log$ sudo mkdir /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes | [tools] |
| 13:26 | <arturo> | sudo kubectl drain tools-worker-1020.tools.eqiad.wmflabs | [tools] |
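Unlike cordon, drain also evicts the pods already running on the node. A sketch of the fuller invocation as commonly run on clusters of this vintage (the extra flags are assumptions; the log records only the bare command):

    # Cordon the node and evict its pods; DaemonSet pods are not evictable,
    # so they have to be skipped explicitly.
    sudo kubectl drain tools-worker-1020.tools.eqiad.wmflabs \
        --ignore-daemonsets --delete-local-data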
            
| 13:22 | <arturo> | emptied the syslog and daemon.log files by hand; they had grown so big that logrotate couldn't handle them | [tools] |
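Emptying a live log file in place (rather than deleting it, which would leave the daemon writing to an unlinked inode) is usually done with truncate, after which a rotation pass can be forced by hand. A sketch of that pattern:

    # Zero the files without invalidating the daemons' open file handles.
    sudo truncate -s 0 /var/log/syslog /var/log/daemon.log
    # Force an immediate logrotate pass over the standard config.
    sudo logrotate -f /etc/logrotate.conf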
            
| 13:20 | <arturo> | aborrero@tools-worker-1020:~$ sudo service kubelet restart | [tools] |
| 13:18 | <arturo> | aborrero@tools-k8s-master-01:~$ sudo kubectl cordon tools-worker-1020.tools.eqiad.wmflabs (for T184604) | [tools] |
| 13:13 | <arturo> | detected low disk space on tools-worker-1020; big files in /var/log due to a kubelet issue. Opened T184604 | [tools] |
            
  
2018-01-09
  
    
| 23:21 | <yuvipanda> | new paws cluster master is up; re-adding nodes by executing the same sequence of commands used for the upgrade | [tools] |
| 23:08 | <yuvipanda> | turns out the version of k8s we had wasn't recent enough to support easy upgrades, so destroying the entire cluster again and installing 1.9.1 | [tools] |
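Rebuilding on 1.9.1 with kubeadm means installing matching package versions before running kubeadm init. A sketch, assuming the upstream Kubernetes apt repository and its usual <version>-00 package revisions (the exact revision string is an assumption):

    # Pin the tooling to the target release...
    sudo apt-get update
    sudo apt-get install -y kubeadm=1.9.1-00 kubelet=1.9.1-00 kubectl=1.9.1-00
    # ...and bootstrap the master on that version.
    sudo kubeadm init --kubernetes-version v1.9.1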
            
| 23:01 | <yuvipanda> | kill paws master and reboot it | [tools] |
| 22:54 | <yuvipanda> | kill all kube-system pods in paws cluster | [tools] |
| 22:54 | <yuvipanda> | kill all PAWS pods | [tools] |
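Killing every pod in a namespace is a one-liner; the owning controllers (and, for static control-plane pods, the kubelet) recreate them. A sketch of both steps; the PAWS namespace name is an assumption, since the log only names kube-system:

    # Recycle the control-plane and addon pods.
    kubectl delete pods --all -n kube-system
    # Recycle the user-facing PAWS pods (namespace is a guess).
    kubectl delete pods --all -n paws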
            
| 22:53 | <yuvipanda> | redo tools-paws-worker-1006 manually, since clush seems to have missed it for some reason | [tools] |
| 22:49 | <yuvipanda> | run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/init-worker.bash' to bring paws workers back up again, but as 1.8 | [tools] |
| 22:48 | <yuvipanda> | run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/install-kubeadm.bash' to set up kubeadm on all paws worker nodes | [tools] |
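The kubeadm-bootstrap scripts themselves aren't reproduced in the log. A hypothetical sketch of what an install-kubeadm.bash/init-worker.bash pair typically amounts to (package names, token, master address, and hash are placeholders):

    # install-kubeadm.bash (hypothetical contents): install the node tooling.
    apt-get update && apt-get install -y kubelet kubeadm
    # init-worker.bash (hypothetical contents): join this node to the cluster,
    # using the 1.8-era kubeadm join syntax.
    kubeadm join --token <TOKEN> <MASTER_IP>:6443 \
        --discovery-token-ca-cert-hash sha256:<HASH>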
            
| 22:46 | <yuvipanda> | reboot all paws-worker nodes | [tools] |
| 22:46 | <yuvipanda> | run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster | [tools] |
| 22:46 | <madhuvishy> | run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster | [tools] |
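A remove-worker script like the one referenced typically boils down to kubeadm reset, which stops the kubelet's containers and wipes the node's kubeadm-managed state. A hypothetical sketch of its core:

    # remove-worker.bash (hypothetical contents): return the node to a
    # pre-join state; it can later rejoin the cluster with kubeadm join.
    sudo kubeadm reset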
            
| 21:17 | <chasemp> | ...rush@tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test" | [tools] |
| 21:17 | <chasemp> | tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable --test" | [tools] |
| 21:10 | <chasemp> | tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1028 -e tools-worker-1029`; do kubectl uncordon $n; done | [tools] |
| 20:55 | <chasemp> | for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016`; do kubectl cordon $n; done | [tools] |
| 20:51 | <chasemp> | kubectl cordon tools-worker-1001.tools.eqiad.wmflabs | [tools] |