2018-01-10 §
13:13 <arturo> detected low space in tools-worker-1020, big files in /var/log due to kubelet issue. Opened T184604 [tools]
2018-01-09 §
23:21 <yuvipanda> paws new cluster master is up, re-adding nodes by executing same sequence of commands for upgrading [tools]
23:08 <yuvipanda> turns out the version of k8s we had wasn't recent enough to support easy upgrades, so destroy entire cluster again and install 1.9.1 [tools]
23:01 <yuvipanda> kill paws master and reboot it [tools]
22:54 <yuvipanda> kill all kube-system pods in paws cluster [tools]
22:54 <yuvipanda> kill all PAWS pods [tools]
22:53 <yuvipanda> redo tools-paws-worker-1006 manually, since clush seems to have missed it for some reason [tools]
22:49 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/init-worker.bash' to bring paws workers back up again, but as 1.8 [tools]
22:48 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/install-kubeadm.bash' to set up kubeadm on all paws worker nodes [tools]
22:46 <yuvipanda> reboot all paws-worker nodes [tools]
22:46 <yuvipanda> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster [tools]
22:46 <madhuvishy> run clush -w tools-paws-worker-10[01-20] 'sudo bash /home/yuvipanda/kubeadm-bootstrap/remove-worker.bash' to completely destroy the paws k8s cluster [tools]
21:17 <chasemp> ...rush@tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable && sudo puppet agent --test" [tools]
21:17 <chasemp> tools-clushmaster-01:~$ clush -f 1 -w @k8s-worker "sudo puppet agent --enable --test" [tools]
21:10 <chasemp> tools-k8s-master-01:~# for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016 -e tools-worker-1028 -e tools-worker-1029 `; do kubectl uncordon $n; done [tools]
20:55 <chasemp> for n in `kubectl get nodes | awk '{print $1}' | grep -v -e tools-worker-1001 -e tools-worker-1016`; do kubectl cordon $n; done [tools]
20:51 <chasemp> kubectl cordon tools-worker-1001.tools.eqiad.wmflabs [tools]
20:15 <chasemp> disable puppet on proxies and k8s workers [tools]
19:50 <chasemp> clush -w @all 'sudo puppet agent --test' [tools]
19:42 <chasemp> reboot tools-worker-1010 [tools]
2018-01-08 §
20:34 <madhuvishy> Restart kube services and uncordon tools-worker-1001 [tools]
19:26 <chasemp> sudo service docker restart; sudo service flannel restart; sudo service kube-proxy restart on tools-proxy-02 [tools]
2018-01-06 §
00:35 <madhuvishy> Run `clush -w @paws-worker -b 'sudo iptables -L FORWARD'` [tools]
00:05 <madhuvishy> Drain and cordon tools-worker-1001 (for debugging the dns outage) [tools]
2018-01-05 §
23:49 <madhuvishy> Run clush -w @k8s-worker -x tools-worker-1001.tools.eqiad.wmflabs 'sudo service docker restart; sudo service flannel restart; sudo service kubelet restart; sudo service kube-proxy restart' on tools-clushmaster-01 [tools]
16:22 <andrewbogott> moving tools-worker-1027 to labvirt1015 (CPU balancing) [tools]
16:01 <andrewbogott> moving tools-worker-1017 to labvirt1017 (CPU balancing) [tools]
15:32 <andrewbogott> moving tools-exec-1420.tools.eqiad.wmflabs to labvirt1015 (CPU balancing) [tools]
15:18 <andrewbogott> moving tools-exec-1411.tools.eqiad.wmflabs to labvirt1017 (CPU balancing) [tools]
15:02 <andrewbogott> moving tools-exec-1440.tools.eqiad.wmflabs to labvirt1017 (CPU balancing) [tools]
14:47 <andrewbogott> moving tools-webgrid-lighttpd-1421.tools.eqiad.wmflabs to labvirt1017 (CPU balancing) [tools]
14:25 <andrewbogott> moving tools-webgrid-lighttpd-1420.tools.eqiad.wmflabs to labvirt1015 (CPU balancing) [tools]
14:05 <andrewbogott> moving tools-webgrid-lighttpd-1417.tools.eqiad.wmflabs to labvirt1015 (CPU balancing) [tools]
13:46 <andrewbogott> moving tools-webgrid-lighttpd-1419.tools.eqiad.wmflabs to labvirt1017 (CPU balancing) [tools]
05:33 <andrewbogott> migrating tools-worker-1012 to labvirt1017 (CPU load balancing) [tools]
2018-01-04 §
17:24 <andrewbogott> rebooting tools-paws-worker-1019 to verify repair of T184018 [tools]
2018-01-03 §
15:38 <bd808> Forced Puppet run on tools-services-01 [tools]
11:29 <arturo> deploy https://gerrit.wikimedia.org/r/#/c/401716/ and https://gerrit.wikimedia.org/r/394101 using clush [tools]
2017-12-31 §
02:00 <bd808> Killed some pwb.py and qacct processes running on tools-bastion-03 [tools]
2017-12-21 §
17:57 <bd808> PAWS: deleted hub-deployment pod stuck in crashloopbackoff [tools]
17:30 <bd808> PAWS: deleting hub-deployment pod. Lots of "Connection pool is full" warnings in pod logs [tools]
2017-12-19 §
21:27 <chasemp> reboot tools-paws-master-01 [tools]
18:38 <andrewbogott> rebooting tools-paws-master-01 [tools]
05:07 <andrewbogott> "service gridengine-master restart" on tools-grid-master [tools]
2017-12-18 §
12:04 <arturo> it seems jupyterhub tries to use a database which doesn't exist: [E 2017-12-18 11:59:49.896 JupyterHub app:904] Failed to connect to db: sqlite:///jupyterhub.sqlite [tools]
11:58 <arturo> The restart didn't work. I could see a lot of log lines in the hub-deployment pod with something like: 2017-12-17 04:08:17,574 WARNING Connection pool is full, discarding connection: 10.96.0.1 [tools]
11:51 <arturo> the restart was with: kubectl get pod -o yaml hub-deployment-1381799904-b5g5j -n prod | kubectl replace --force -f - [tools]
11:50 <arturo> restart pod hub-deployment in paws to try to fix the 502 [tools]
2017-12-15 §
13:55 <arturo> same in tools-checker-02.tools.eqiad.wmflabs [tools]
13:54 <arturo> same in tools-exec-1415.tools.eqiad.wmflabs [tools]