2018-03-06
|
16:15 |
<madhuvishy> |
Reboot tools-docker-registry-02 T189018 |
[tools] |
15:50 |
<madhuvishy> |
Rebooting tools-worker-1011 |
[tools] |
15:08 |
<chasemp> |
tools-k8s-master-01:~# kubectl uncordon tools-worker-1011.tools.eqiad.wmflabs |
[tools] |
15:03 |
<arturo> |
drain and reboot tools-worker-1011 |
[tools] |
15:03 |
<chasemp> |
rebooted tools-worker 1001-1008 |
[tools] |
14:58 |
<arturo> |
drain and reboot tools-worker-1010 |
[tools] |
14:27 |
<chasemp> |
multiple tools running on k8s workers report issues reading replica.my.cnf file atm |
[tools] |
14:27 |
<chasemp> |
reboot tools-worker-100[12] |
[tools] |
14:23 |
<chasemp> |
downtime icinga alert for k8s workers ready |
[tools] |
13:21 |
<arturo> |
T188994 on some servers there was a race for the dpkg lock between apt-upgrade and puppet. Also, I forgot to set DEBIAN_FRONTEND=noninteractive, so debconf prompts appeared and stalled dpkg operations. Already solved, but some puppet alerts were produced |
[tools] |
12:58 |
<arturo> |
T188994 upgrading packages in jessie nodes from the oldstable source |
[tools] |
11:42 |
<arturo> |
clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoclean" <-- free up filesystem space |
[tools] |
11:41 |
<arturo> |
aborrero@tools-clushmaster-01:~$ clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y" <-- we did this on canary servers last week and it went fine, so now running it fleet-wide |
[tools] |
11:36 |
<arturo> |
(ubuntu) removed linux-image-3.13.0-142-generic and linux-image-3.13.0-137-generic (T188911) |
[tools] |
11:33 |
<arturo> |
removing unused kernel packages in ubuntu nodes |
[tools] |
11:08 |
<arturo> |
aborrero@tools-clushmaster-01:~$ clush -w @all "sudo rm /etc/apt/preferences.d/* ; sudo puppet agent -t -v" <--- rebuild the directory, which contains stale files across the whole cluster |
[tools] |
2018-02-21
|
19:02 |
<bstorm_> |
disabled puppet on tools-static-* pending change 413197 |
[tools] |
18:15 |
<arturo> |
puppet should be fine across the fleet |
[tools] |
17:24 |
<arturo> |
another try: merged https://gerrit.wikimedia.org/r/#/c/413202/ |
[tools] |
17:02 |
<arturo> |
revert last change https://gerrit.wikimedia.org/r/#/c/413198/ |
[tools] |
16:59 |
<arturo> |
puppet is broken across the cluster due to last change |
[tools] |
16:57 |
<arturo> |
deploying https://gerrit.wikimedia.org/r/#/c/410177/ |
[tools] |
16:26 |
<bd808> |
Rebooting tools-docker-registry-01, NFS mounts are in a bad state |
[tools] |
11:43 |
<arturo> |
package upgrades in tools-webgrid-lighttpd-1401 |
[tools] |
11:35 |
<arturo> |
package upgrades in tools-package-builder-01 tools-prometheus-01 tools-static-10 and tools-redis-1001 |
[tools] |
11:22 |
<arturo> |
package upgrades in tools-mail, tools-grid-master, tools-logs-02 |
[tools] |
10:51 |
<arturo> |
package upgrades in tools-checker-01 tools-clushmaster-01 and tools-docker-builder-05 |
[tools] |