2019-03-25 §
19:51 <bd808> Shutdown tools-exec-142* (T217152) [tools]
19:47 <bstorm_> depooling tools-worker-1025.tools.eqiad.wmflabs because it's not responding and showing insane load [tools]
19:33 <bd808> Shutdown tools-exec-141* (T217152) [tools]
19:31 <bd808> Shutdown tools-bastion-0{2,3} (T217152) [tools]
19:19 <bd808> Shutdown tools-exec-140* (T217152) [tools]
19:12 <bd808> Shutdown tools-webgrid-generic-14* (T217152) [tools]
19:11 <bd808> Shutdown tools-webgrid-lighttpd-14* (T217152) [tools]
18:53 <bd808> Shutdown tools-grid-master (T217152) [tools]
18:53 <bd808> Shutdown tools-grid-shadow (T217152) [tools]
18:49 <bd808> All jobs still running on the Trusty job grid force deleted. [tools]
18:46 <bd808> All Trusty job grid queues marked as disabled. This should stop all new Trusty job submissions. [tools]
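Note: the two entries above (disabling the Trusty queues, then force-deleting what was still running) correspond to stock (Son of) Grid Engine admin commands run on the grid master; the host pattern below is an assumption, not a record of the exact invocation used:
```
# List configured queues, then disable every queue instance on the Trusty
# exec hosts so no new jobs can be scheduled there (host pattern assumed).
qconf -sql
qmod -d '*@tools-exec-14*'

# Show whatever is still running, then force-delete all remaining jobs.
qstat -u '*'
qdel -f -u '*'
```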
18:43 <arturo> icinga downtime tools-checker for 24h due to trusty grid shutdown [tools]
18:39 <bd808> Shutdown tools-cron-01.tools.eqiad.wmflabs (T217152) [tools]
15:27 <bd808> Copied all crontab files still on tools-cron-01 to each tool's $HOME/crontab.trusty.save [tools]
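A minimal sketch of what that copy could look like, assuming the standard Vixie cron spool path and the usual Toolforge layout of tools.<name> cron users with homes under /data/project/<name>; the actual script used is not recorded here:
```
# Copy every tool's crontab from the cron spool into that tool's home
# (paths and naming convention are assumptions).
for f in /var/spool/cron/crontabs/tools.*; do
    tool="${f##*/tools.}"
    cp -p "$f" "/data/project/${tool}/crontab.trusty.save"
done
```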
02:34 <bd808> Disassociated floating IPs and deleted shutdown Trusty grid nodes tools-exec-14{33,34,35,36,37,38,39,40,41,42} (T217152) [tools]
02:26 <bd808> Deleted shutdown Trusty grid nodes tools-webgrid-lighttpd-14{20,21,22,24,25,26,27,28} (T217152) [tools]
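For reference, the instance shutdowns, floating-IP removal, and deletions in this section are routine OpenStack operations; the server name and IP below are placeholders, and wrapper tooling may have been used instead of the bare client:
```
# Stop an instance, detach its floating IP, then delete it (names are placeholders).
openstack server stop tools-exec-1433
openstack server remove floating ip tools-exec-1433 203.0.113.10
openstack server delete tools-exec-1433
```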
2019-03-22 §
17:16 <andrewbogott> switching all instances to use ldap-ro.eqiad.wikimedia.org as both primary and secondary ldap server [tools]
16:12 <bstorm_> cleared errored out stretch grid queues [tools]
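Clearing errored-out grid queues, as logged here and again on 2019-03-16 and 2019-03-15, is normally a one-liner on the grid master; a sketch assuming plain gridengine tooling:
```
# Show queues in error state with an explanation, then clear the E state.
qstat -f -explain E
qmod -c '*'
```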
15:56 <bd808> Rebooting tools-static-12 [tools]
03:09 <bstorm_> T217280 depooled and rebooted 15 other nodes. Entire stretch grid is in a good state for now. [tools]
02:31 <bstorm_> T217280 depooled and rebooted tools-sgeexec-0908 since it had no jobs but very high load from an NFS event that was no longer happening [tools]
02:09 <bstorm_> T217280 depooled and rebooted tools-sgewebgrid-lighttpd-0924 [tools]
00:39 <bstorm_> T217280 depooled and rebooted tools-sgewebgrid-lighttpd-0902 [tools]
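The depool/reboot/repool cycles logged for T217280 follow the usual pattern of draining a gridengine exec node before rebooting it. A sketch using plain gridengine commands; the hostname is an example, and Toolforge also has wrapper scripts for this that the entries do not name:
```
# Drain: disable the node's queue instances so nothing new is scheduled there.
qmod -d '*@tools-sgeexec-0909.tools.eqiad.wmflabs'
qhost -j -h tools-sgeexec-0909.tools.eqiad.wmflabs   # wait until no jobs remain

# Reboot the instance, then re-enable its queues (repool).
qmod -e '*@tools-sgeexec-0909.tools.eqiad.wmflabs'
```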
2019-03-21 §
23:28 <bstorm_> T217280 depooled, reloaded and repooled tools-sgeexec-0938 [tools]
21:53 <bstorm_> T217280 rebooted and cleared "unknown status" from tools-sgeexec-0914 after depooling [tools]
21:51 <bstorm_> T217280 rebooted and cleared "unknown status" from tools-sgeexec-0909 after depooling [tools]
21:26 <bstorm_> T217280 cleared error state from a couple of queues and rebooted tools-sgeexec-0901 and 0904 to clear other related issues [tools]
2019-03-20 §
18:05 <arturo> depool/reboot/repool tools-sgewebgrid-lighttpd-0904 (hard reboot actually) [tools]
17:57 <arturo> depool/reboot/repool tools-sgewebgrid-lighttpd-0904 [tools]
12:23 <arturo> depool and hard-reboot tools-sgewebgrid-generic-0904 due to extreme load. It doesn't respond to ssh [this one is valid] [tools]
12:23 <arturo> last SAL entry is bogus, please ignore it [tools]
12:22 <arturo> depool and hard-reboot tools-sgeexec-0938.eqiad.wmflabs due to extreme load. It doesn't respond to ssh [tools]
12:11 <arturo> hard-reboot tools-sgeexec-0938.eqiad.wmflabs due to extreme load. It doesn't respond to ssh [tools]
10:10 <arturo> manually killing zombie procs in tools-sgewebgrid-lighttpd-0920 (T218546) [tools]
2019-03-19 §
13:56 <arturo> T218649 rebooting tools-sgecron-01 [tools]
2019-03-18 §
18:43 <bd808> Rebooting tools-static-12 [tools]
18:42 <chicocvenancio> PAWS: 3 nodes still in NotReady state, `worker-10(01|07|10)`; all else working [tools]
18:41 <chicocvenancio> PAWS: deleting pods stuck in Unknown state with ` --grace-period=0 --force` [tools]
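The pod cleanup in the entry above is the standard kubectl recipe for pods stuck in Unknown after a node problem; the pod name and namespace below are placeholders:
```
# Find pods stuck in Unknown, then force-delete one without waiting for the
# (unreachable) kubelet to confirm (names are placeholders).
kubectl get pods --all-namespaces | grep Unknown
kubectl delete pod paws-hub-1234 --namespace prod --grace-period=0 --force
```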
18:40 <andrewbogott> rebooting tools-static-13 in hopes of fixing some nfs mounts [tools]
18:25 <chicocvenancio> removing postStart hook for PWB update and restarting hub while gerrit.wikimedia.org is down [tools]
2019-03-17 §
23:41 <bd808> Cherry-picked https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/497210/ as a quick fix for T218494 [tools]
22:30 <bd808> Investigating strange system state on tools-bastion-03. [tools]
17:48 <bstorm_> T218514 rebooting tools-worker-1009 and 1012 [tools]
17:46 <bstorm_> depooling tools-worker-1009 and tools-worker-1012 for T218514 [tools]
17:13 <bstorm_> depooled and rebooting tools-worker-1018 [tools]
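Depooling a Kubernetes worker such as tools-worker-1018 before a reboot typically means cordoning and draining it, then uncordoning it once it is back; a sketch with standard kubectl commands (node name taken from the entry above, flags as they existed on this era of Kubernetes):
```
# Stop new pods landing on the node and evict what is already there.
kubectl cordon tools-worker-1018.tools.eqiad.wmflabs
kubectl drain tools-worker-1018.tools.eqiad.wmflabs --ignore-daemonsets --delete-local-data

# After the reboot, put it back in service.
kubectl uncordon tools-worker-1018.tools.eqiad.wmflabs
```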
15:09 <andrewbogott> running 'killall dpkg' and 'dpkg --configure -a' on all nodes to try to work around a race with initramfs [tools]
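Fleet-wide commands like the one above are usually pushed out from the clush master mentioned further down; a sketch assuming a ClusterShell node group named "all" (the real group names are not recorded here):
```
# Run the dpkg cleanup on every node in the (assumed) "all" group.
clush -g all 'killall dpkg; dpkg --configure -a'
```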
2019-03-16 §
22:34 <bstorm_> clearing errored out queues again [tools]
2019-03-15 §
21:08 <bstorm_> cleared error state on several queues T217280 [tools]
15:58 <gtirloni> rebooted tools-clushmaster-02 [tools]
14:40 <mutante> tools-sgebastion-07 - dpkg-reconfigure locales and adding Korean ko_KR.EUC-KR - T130532 [tools]
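For reference, a non-interactive equivalent of that locale change, assuming a Debian-style /etc/locale.gen (the entry itself used the interactive dpkg-reconfigure dialog):
```
# Enable the Korean EUC-KR locale and regenerate locales without the
# interactive dpkg-reconfigure step.
sed -i '/^# *ko_KR.EUC-KR/s/^# *//' /etc/locale.gen
locale-gen
```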