2019-03-25
20:52 <bstorm_> rebooting tools-package-builder-02 due to lots of hung /usr/bin/lsof +c 15 -nXd DEL processes [tools]
20:51 <bd808> Deleted tools-webgrid-generic-14* (T217152) [tools]
20:49 <bd808> Deleted tools-exec-143* (T217152) [tools]
20:49 <bd808> Deleted tools-exec-142* (T217152) [tools]
20:48 <bd808> Deleted tools-exec-141* (T217152) [tools]
20:47 <bd808> Deleted tools-exec-140* (T217152) [tools]
20:43 <bd808> Deleted tools-cron-01 (T217152) [tools]
20:42 <bd808> Deleted tools-bastion-0{2,3} (T217152) [tools]
20:35 <bstorm_> rebooted tools-worker-1025 and tools-worker-1021 [tools]
19:59 <bd808> Shutdown tools-exec-143* (T217152) [tools]
19:51 <bd808> Shutdown tools-exec-142* (T217152) [tools]
19:47 <bstorm_> depooling tools-worker-1025.tools.eqiad.wmflabs because it's not responding and showing insane load [tools]
19:33 <bd808> Shutdown tools-exec-141* (T217152) [tools]
19:31 <bd808> Shutdown tools-bastion-0{2,3} (T217152) [tools]
19:19 <bd808> Shutdown tools-exec-140* (T217152) [tools]
19:12 <bd808> Shutdown tools-webgrid-generic-14* (T217152) [tools]
19:11 <bd808> Shutdown tools-webgrid-lighttpd-14* (T217152) [tools]
18:53 <bd808> Shutdown tools-grid-master (T217152) [tools]
18:53 <bd808> Shutdown tools-grid-shadow (T217152) [tools]
18:49 <bd808> All jobs still running on the Trusty job grid were force deleted. [tools]
18:46 <bd808> All Trusty job grid queues marked as disabled. This should stop all new Trusty job submissions. [tools]
18:43 <arturo> icinga downtime tools-checker for 24h due to trusty grid shutdown [tools]
18:39 <bd808> Shutdown tools-cron-01.tools.eqiad.wmflabs (T217152) [tools]
15:27 <bd808> Copied all crontab files still on tools-cron-01 to each tool's $HOME/crontab.trusty.save [tools]
02:34 <bd808> Disassociated floating IPs and deleted shutdown Trusty grid nodes tools-exec-14{33,34,35,36,37,38,39,40,41,42} (T217152) [tools]
02:26 <bd808> Deleted shutdown Trusty grid nodes tools-webgrid-lighttpd-14{20,21,22,24,25,26,27,28} (T217152) [tools]
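The crontab backup noted at 15:27 can be sketched as a small shell loop. This is a hedged reconstruction, not the operators' actual script: the spool path, the `tools.<name>` account naming, and the `backup_crontabs` helper are all assumptions.

```shell
#!/bin/bash
# Hypothetical helper: copy each tool account's crontab out of the cron spool
# into that tool's home directory as crontab.trusty.save.
backup_crontabs() {
    local spool="$1" homes="$2" crontab tool
    for crontab in "$spool"/tools.*; do
        [ -e "$crontab" ] || continue
        # Account files are assumed to be named tools.<tool>; strip the prefix.
        tool="${crontab##*/tools.}"
        # -p preserves mode and timestamps on the saved copy.
        cp -p "$crontab" "$homes/$tool/crontab.trusty.save"
    done
}

# On the real host this would plausibly be (paths are assumptions):
# backup_crontabs /var/spool/cron/crontabs /data/project
```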
2019-03-22
17:16 <andrewbogott> switching all instances to use ldap-ro.eqiad.wikimedia.org as both primary and secondary ldap server [tools]
16:12 <bstorm_> cleared errored out stretch grid queues [tools]
15:56 <bd808> Rebooting tools-static-12 [tools]
03:09 <bstorm_> T217280 depooled and rebooted 15 other nodes. Entire stretch grid is in a good state for now. [tools]
02:31 <bstorm_> T217280 depooled and rebooted tools-sgeexec-0908 since it had no jobs but very high load from an NFS event that was no longer happening [tools]
02:09 <bstorm_> T217280 depooled and rebooted tools-sgewebgrid-lighttpd-0924 [tools]
00:39 <bstorm_> T217280 depooled and rebooted tools-sgewebgrid-lighttpd-0902 [tools]
2019-03-21
23:28 <bstorm_> T217280 depooled, reloaded and repooled tools-sgeexec-0938 [tools]
21:53 <bstorm_> T217280 rebooted and cleared "unknown status" from tools-sgeexec-0914 after depooling [tools]
21:51 <bstorm_> T217280 rebooted and cleared "unknown status" from tools-sgeexec-0909 after depooling [tools]
21:26 <bstorm_> T217280 cleared error state from a couple of queues and rebooted tools-sgeexec-0901 and 04 to clear other related issues [tools]
2019-03-20
18:05 <arturo> depool/reboot/repool tools-sgewebgrid-lighttpd-0904 (hard reboot actually) [tools]
17:57 <arturo> depool/reboot/repool tools-sgewebgrid-lighttpd-0904 [tools]
12:23 <arturo> depool and hard-reboot tools-sgewebgrid-generic-0904 due to extreme load. It doesn't respond to ssh [this one is valid] [tools]
12:23 <arturo> last SAL entry is bogus, please ignore it [tools]
12:22 <arturo> depool and hard-reboot tools-sgeexec-0938.eqiad.wmflabs due to extreme load. It doesn't respond to ssh [tools]
12:11 <arturo> hard-reboot tools-sgeexec-0938.eqiad.wmflabs due to extreme load. It doesn't respond to ssh [tools]
10:10 <arturo> manually killing zombie procs in tools-sgewebgrid-lighttpd-0920 (T218546) [tools]
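The depool/reboot/repool cycle that recurs in the entries above can be sketched as a dry run that prints each step. `qmod -d` and `qmod -e` are real gridengine commands for disabling and enabling queue instances on a host; the `drain_and_reboot` helper and the simplified ssh reboot step are illustrative assumptions (an unresponsive host would need a hard reboot through OpenStack instead).

```shell
#!/bin/bash
# Hypothetical helper: print (rather than execute) the depool/reboot/repool
# command sequence for one grid exec node.
drain_and_reboot() {
    local host="$1"
    # Disable every queue instance on the host so no new jobs land there.
    echo "qmod -d '*@${host}'"
    # A responsive host can be rebooted over ssh; a hung one needs a hard
    # reboot via the cloud API instead.
    echo "ssh ${host} sudo reboot"
    # Once the host is back and healthy, re-enable its queues.
    echo "qmod -e '*@${host}'"
}

# Dry run for one of the nodes mentioned in the log:
drain_and_reboot tools-sgeexec-0938
```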
2019-03-19
13:56 <arturo> T218649 rebooting tools-sgecron-01 [tools]
2019-03-18
18:43 <bd808> Rebooting tools-static-12 [tools]
18:42 <chicocvenancio> PAWS: 3 nodes still in not ready state, `worker-10(01|07|10)` all else working [tools]
18:41 <chicocvenancio> PAWS: deleting pods stuck in Unknown state with ` --grace-period=0 --force` [tools]
18:40 <andrewbogott> rebooting tools-static-13 in hopes of fixing some nfs mounts [tools]
18:25 <chicocvenancio> removing postStart hook for PWB update and restarting hub while gerrit.wikimedia.org is down [tools]
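The force deletion of stuck PAWS pods at 18:41 uses kubectl's real `--grace-period=0 --force` flags, which skip the graceful termination wait. Below is a hedged dry-run sketch; the `force_delete_pod` helper and the pod/namespace names are placeholders, not the operators' exact invocation.

```shell
#!/bin/bash
# Hypothetical helper: print the kubectl command that force-deletes a pod
# stuck in Unknown state.
force_delete_pod() {
    local pod="$1" ns="$2"
    # --grace-period=0 --force removes the pod object immediately; the
    # kubelet may still hold resources, which is why this is a last resort.
    echo kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force
}

# Dry run with placeholder names:
force_delete_pod worker-1001-pod paws
```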