2019-03-29
21:13 |
<bstorm_> |
depooled tools-sgewebgrid-generic-0903 because of some stuck jobs and odd load characteristics |
[tools] |
21:08 |
<bd808> |
Updated cherry-pick of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/500095/ on tools-puppetmaster-01 (T219243) |
[tools] |
20:48 |
<bd808> |
Using root console to fix broken initial puppet run on tools-checker-03. |
[tools] |
20:32 |
<bd808> |
Creating tools-checker-03 with role::wmcs::toolforge::checker (T219243) |
[tools] |
20:24 |
<bd808> |
Cherry-picked https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/500095/ to tools-puppetmaster-01 for testing (T219243) |
[tools] |
20:22 |
<bd808> |
Disabled puppet on tools-checker-0{1,2} to make testing new role::wmcs::toolforge::checker easier (T219243) |
[tools] |
17:25 |
<bd808> |
Cleared the "Eqw" state of 44 jobs with `qstat -u '*' | grep Eqw | awk '{print $1;}' | xargs -L1 sudo qmod -cj` on tools-sgegrid-master |
[tools] |
17:16 |
<andrewbogott> |
aborted move of tools-static-12; will wait until tomorrow and give DNS caches more time to update |
[tools] |
17:11 |
<bd808> |
Restarted nginx on tools-static-13 |
[tools] |
16:53 |
<andrewbogott> |
moving tools-static-12 to eqiad1-r |
[tools] |
16:49 |
<bstorm_> |
cleared E state from 21 queues |
[tools] |
14:34 |
<andrewbogott> |
moving tools-static.wmflabs.org to point to tools-static-13 in eqiad1-r |
[tools] |
13:54 |
<andrewbogott> |
moving tools-static-13 to eqiad1-r |
[tools] |
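Note on the 17:25 entry: the pipeline there selects jobs with `grep Eqw`, which matches any row containing that string, including a job whose name happens to contain "Eqw". A minimal sketch of a stricter filter follows. It assumes the default `qstat` column layout, where field 1 is the job ID and field 5 is the state. The function name `eqw_job_ids` is hypothetical and is not part of the tooling logged here.

```shell
#!/bin/sh
# Hypothetical helper: read qstat-style rows on stdin and print the job ID
# (field 1) of every row whose state column (field 5, assumed default
# qstat layout) is exactly "Eqw". Header and separator rows never match.
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }'
}

# Intended use on the grid master (not executed in this sketch):
#   qstat -u '*' | eqw_job_ids | xargs -r -L1 sudo qmod -cj
# -r is a GNU xargs option that skips running qmod when nothing matched.
```

Comparing the state field exactly, rather than grepping the whole row, keeps the error state from being cleared on jobs that only mention "Eqw" in another column.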
2019-03-25
21:21 |
<bd808> |
All Trusty grid engine hosts shut down and deleted (T217152) |
[tools] |
21:19 |
<bd808> |
Deleted tools-grid-{master,shadow} (T217152) |
[tools] |
21:18 |
<bd808> |
Deleted tools-webgrid-lighttpd-14* (T217152) |
[tools] |
20:55 |
<bstorm_> |
reboot tools-sgewebgrid-generic-0903 to clear up some issues |
[tools] |
20:52 |
<bstorm_> |
rebooting tools-package-builder-02 due to lots of hung /usr/bin/lsof +c 15 -nXd DEL processes |
[tools] |
20:51 |
<bd808> |
Deleted tools-webgrid-generic-14* (T217152) |
[tools] |
20:49 |
<bd808> |
Deleted tools-exec-143* (T217152) |
[tools] |
20:49 |
<bd808> |
Deleted tools-exec-142* (T217152) |
[tools] |
20:48 |
<bd808> |
Deleted tools-exec-141* (T217152) |
[tools] |
20:47 |
<bd808> |
Deleted tools-exec-140* (T217152) |
[tools] |
20:43 |
<bd808> |
Deleted tools-cron-01 (T217152) |
[tools] |
20:42 |
<bd808> |
Deleted tools-bastion-0{2,3} (T217152) |
[tools] |
20:35 |
<bstorm_> |
rebooted tools-worker-1025 and tools-worker-1021 |
[tools] |
19:59 |
<bd808> |
Shut down tools-exec-143* (T217152) |
[tools] |
19:51 |
<bd808> |
Shut down tools-exec-142* (T217152) |
[tools] |
19:47 |
<bstorm_> |
depooling tools-worker-1025.tools.eqiad.wmflabs because it's not responding and showing insane load |
[tools] |
19:33 |
<bd808> |
Shut down tools-exec-141* (T217152) |
[tools] |
19:31 |
<bd808> |
Shut down tools-bastion-0{2,3} (T217152) |
[tools] |
19:19 |
<bd808> |
Shut down tools-exec-140* (T217152) |
[tools] |
19:12 |
<bd808> |
Shut down tools-webgrid-generic-14* (T217152) |
[tools] |
19:11 |
<bd808> |
Shut down tools-webgrid-lighttpd-14* (T217152) |
[tools] |
18:53 |
<bd808> |
Shut down tools-grid-master (T217152) |
[tools] |
18:53 |
<bd808> |
Shut down tools-grid-shadow (T217152) |
[tools] |
18:49 |
<bd808> |
All jobs still running on the Trusty job grid were force-deleted. |
[tools] |
18:46 |
<bd808> |
All Trusty job grid queues marked as disabled. This should stop all new Trusty job submissions. |
[tools] |
18:43 |
<arturo> |
icinga downtime tools-checker for 24h due to trusty grid shutdown |
[tools] |