2018-01-16
|
17:48 |
<andrewbogott> |
depooling tools-exec-1402, 1426, 1429, 1433, tools-webgrid-lighttpd-1408, 1414, 1424 |
[tools] |
17:28 |
<andrewbogott> |
disabling tools-webgrid-generic-1402, tools-webgrid-lighttpd-1403, tools-exec-1403 for host reboot |
[tools] |
17:26 |
<andrewbogott> |
repooling tools-exec-1405, 1425, tools-webgrid-generic-1403, tools-webgrid-lighttpd-1401, 1405 after host reboot |
[tools] |
17:08 |
<andrewbogott> |
depooling tools-exec-1405, 1425, tools-webgrid-generic-1403, tools-webgrid-lighttpd-1401, 1405 for host reboot |
[tools] |
16:19 |
<andrewbogott> |
repooling tools-exec-1401, 1407, 1408, 1430, 1431, 1432, 1435, 1438, 1439, 1441, tools-webgrid-lighttpd-1402, 1407 after host reboot |
[tools] |
15:52 |
<andrewbogott> |
depooling tools-exec-1401, 1407, 1408, 1430, 1431, 1432, 1435, 1438, 1439, 1441, tools-webgrid-lighttpd-1402, 1407 for host reboot |
[tools] |
13:35 |
<chasemp> |
tools-mail almouked@ltnet.net 719 pending messages cleared |
[tools] |
2018-01-11
|
20:33 |
<andrewbogott> |
repooling tools-exec-1411, tools-exec-1440, tools-webgrid-lighttpd-1419, tools-webgrid-lighttpd-1420, tools-webgrid-lighttpd-1421 |
[tools] |
20:33 |
<andrewbogott> |
uncordoning tools-worker-1012 and tools-worker-1017 |
[tools] |
20:06 |
<andrewbogott> |
cordoning tools-worker-1012 and tools-worker-1017 |
[tools] |
20:02 |
<andrewbogott> |
depooling tools-exec-1411, tools-exec-1440, tools-webgrid-lighttpd-1419, tools-webgrid-lighttpd-1420, tools-webgrid-lighttpd-1421 |
[tools] |
19:00 |
<chasemp> |
reboot tools-worker-1015 |
[tools] |
15:08 |
<chasemp> |
reboot tools-exec-1405 |
[tools] |
15:06 |
<chasemp> |
reboot tools-exec-1404 |
[tools] |
15:06 |
<chasemp> |
reboot tools-exec-1403 |
[tools] |
15:02 |
<chasemp> |
reboot tools-exec-1402 |
[tools] |
14:57 |
<chasemp> |
reboot tools-exec-1401 again... |
[tools] |
14:53 |
<chasemp> |
reboot tools-exec-1401 |
[tools] |
14:46 |
<chasemp> |
install meltdown kernel and reboot workers 1011-1016 as jessie pilot |
[tools] |
2018-01-10
|
15:14 |
<chasemp> |
tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test" |
[tools] |
15:03 |
<chasemp> |
tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1016`; do kubectl cordon $n; done |
[tools] |
14:41 |
<chasemp> |
tools-clushmaster-01:~$ clush -w @k8s-worker "sudo puppet agent --disable 'chase rollout'" |
[tools] |
14:01 |
<chasemp> |
tools-k8s-master-01:~# kubectl uncordon tools-worker-1001.tools.eqiad.wmflabs |
[tools] |
13:57 |
<arturo> |
T184604 cleaned stalled log files that prevented logrotate from working. Triggered a couple of logrotate runs by hand on tools-worker-1020.tools.eqiad.wmflabs |
[tools] |
13:46 |
<arturo> |
T184604 aborrero@tools-k8s-master-01:~$ sudo kubectl uncordon tools-worker-1020.tools.eqiad.wmflabs |
[tools] |
13:45 |
<arturo> |
T184604 aborrero@tools-worker-1020:/var/log$ sudo mkdir /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes |
[tools] |
13:26 |
<arturo> |
sudo kubectl drain tools-worker-1020.tools.eqiad.wmflabs |
[tools] |
13:22 |
<arturo> |
empty by hand syslog and daemon.log files. They are so big that logrotate won't handle them |
[tools] |
13:20 |
<arturo> |
aborrero@tools-worker-1020:~$ sudo service kubelet restart |
[tools] |
13:18 |
<arturo> |
aborrero@tools-k8s-master-01:~$ sudo kubectl cordon tools-worker-1020.tools.eqiad.wmflabs for T184604 |
[tools] |
13:13 |
<arturo> |
detected low space in tools-worker-1020, big files in /var/log due to kubelet issue. Opened T184604 |
[tools] |
2018-01-09
|
23:21 |
<yuvipanda> |
paws new cluster master is up, re-adding nodes by executing same sequence of commands for upgrading |
[tools] |
23:08 |
<yuvipanda> |
turns out the version of k8s we had wasn't recent enough to support easy upgrades, so destroy entire cluster again and install 1.9.1 |
[tools] |
23:01 |
<yuvipanda> |
kill paws master and reboot it |
[tools] |
22:54 |
<yuvipanda> |
kill all kube-system pods in paws cluster |
[tools] |
22:54 |
<yuvipanda> |
kill all PAWS pods |
[tools] |
22:53 |
<yuvipanda> |
redo tools-paws-worker-1006 manually, since clush seems to have missed it for some reason |
[tools] |
22:49 |
<yuvipanda> |
run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/init-worker.bash' to bring paws workers back up again, but as 1.8 |
[tools] |
22:48 |
<yuvipanda> |
run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/install-kubeadm.bash' to set up kubeadm on all paws worker nodes |
[tools] |
22:46 |
<yuvipanda> |
reboot all paws-worker nodes |
[tools] |
22:46 |
<yuvipanda> |
run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster |
[tools] |
22:46 |
<madhuvishy> |
run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster |
[tools] |
21:17 |
<chasemp> |
...rush@tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test" |
[tools] |
21:17 |
<chasemp> |
tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable --test" |
[tools] |
21:10 |
<chasemp> |
tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1028 -e tools-worker-1029 `; do kubectl uncordon $n; done |
[tools] |
20:55 |
<chasemp> |
for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016`; do kubectl cordon $n; done |
[tools] |
20:51 |
<chasemp> |
kubectl cordon tools-worker-1001.tools.eqiad.wmflabs |
[tools] |
20:15 |
<chasemp> |
disable puppet on proxies and k8s workers |
[tools] |
19:50 |
<chasemp> |
clush -w @all 'sudo puppet agent --test' |
[tools] |
19:42 |
<chasemp> |
reboot tools-worker-1010 |
[tools] |