2016-01-21 §
21:13 <YuviPanda> repooled exec nodes on labvirt1010 [tools]
21:08 <YuviPanda> gridengine-master started, verified shadow hasn't started [tools]
21:00 <YuviPanda> stop gridengine master [tools]
20:51 <YuviPanda> correction: the last message should read "repooled exec nodes on labvirt1007" [tools]
20:51 <YuviPanda> repooled exec nodes on labvirt1006 [tools]
20:39 <YuviPanda> failover tools-static to tools-web-static-01 [tools]
20:38 <YuviPanda> failover tools-checker to tools-checker-01 [tools]
20:32 <YuviPanda> depooled exec nodes on 1007 [tools]
20:32 <YuviPanda> repooled exec nodes on 1006 [tools]
20:14 <YuviPanda> depooled all exec nodes in labvirt1006 [tools]
20:11 <YuviPanda> repooled exec nodes on 1005 [tools]
19:53 <YuviPanda> depooled exec nodes on labvirt1005 [tools]
19:49 <YuviPanda> repooled exec nodes from labvirt1004 [tools]
19:48 <YuviPanda> failed over proxy to tools-proxy-01 again [tools]
19:31 <YuviPanda> depooled exec nodes from labvirt1004 [tools]
19:29 <YuviPanda> repooled exec nodes from labvirt1003 [tools]
19:13 <YuviPanda> depooled instances on labvirt1003 [tools]
19:06 <YuviPanda> re-enabled queues on exec nodes that were on labvirt1002 [tools]
19:02 <YuviPanda> failed over tools proxy to tools-proxy-02 [tools]
18:46 <YuviPanda> drained and disabled queues on all nodes on labvirt1002 [tools]
18:38 <YuviPanda> restarted all restartable jobs in instances on labvirt1001 and deleted all non-restartable ghost jobs. these were already dead [tools]
2016-01-20 §
14:50 <chasemp> reboot tools-webgrid-lighttpd-1209 as frozen [tools]
2016-01-15 §
18:34 <chasemp> tools-mail-01 is locked up; I am rebooting [tools]
2016-01-14 §
01:56 <YuviPanda> rm service.manifest for wikiviewstats to prevent it from constantly trying to start up and fail webservice [tools]
01:32 <YuviPanda> stopped erwin85's tools since it was causing replag on labsdb1002 [tools]
2016-01-11 §
22:19 <valhallasw`cloud> reset maxujobs 0->128, job_load_adjustments none->np_load_avg=0.50, load_ad... -> 0:7:30 [tools]
22:12 <YuviPanda> restarted gridengine master again [tools]
22:07 <valhallasw`cloud> set job_load_adjustments from np_load_avg=0.50 to none and load_adjustment_decay_time to 0:0:0 [tools]
22:05 <valhallasw`cloud> set maxujobs back to 0, but doesn't help [tools]
21:57 <valhallasw`cloud> reset to 7:30 [tools]
21:57 <valhallasw`cloud> that cleared the measure, but jobs still not starting. Ugh! [tools]
21:55 <valhallasw`cloud> set load_adjustment_decay_time = 0:0:0 [tools]
21:45 <YuviPanda> restarted gridengine master [tools]
21:43 <valhallasw`cloud> qstat -j <jobid> shows all queues overloaded; seems to have started just after a load test for the new maxujobs setting [tools]
21:42 <valhallasw`cloud> resetting to 0:7:30, as it's not having the intended effect [tools]
21:41 <valhallasw`cloud> currently 353 jobs in qw state [tools]
21:40 <valhallasw`cloud> that's load_adjustment_decay_time [tools]
21:40 <valhallasw`cloud> temporarily sudo qconf -msconf to 0:0:1 [tools]
19:59 <YuviPanda> Set maxujobs (max concurrent jobs per user) on gridengine to 128 [tools]
17:51 <YuviPanda> kill all queries running on labsdb1003 [tools]
17:20 <YuviPanda> stopped webservice for quentinv57-tools [tools]
2016-01-09 §
21:07 <valhallasw`cloud> moved tools-checker/208.80.155.229 back to tools-checker-01 [tools]
21:02 <andrewbogott> rebooting tools-checker-01 as it is unresponsive. [tools]
13:12 <valhallasw`cloud> tools-worker-1002 is unresponsive. Maybe that's where the other grrrit-wm is hiding? Rebooting. [tools]
2016-01-08 §
19:46 <chasemp> couldn't get into tools-mail-01 at all and it seemed borked so I rebooted [tools]
17:23 <andrewbogott> killing tools.icelab as per https://wikitech.wikimedia.org/wiki/User_talk:Torin#Running_queries_on_tools-dev_.28tools-bastion-02.29 [tools]
2015-12-30 §
04:06 <YuviPanda> delete all webgrid jobs to start with a clean slate [tools]
03:54 <YuviPanda> qmod -rj all tools in the continuous queue, they are all orphaned [tools]
03:22 <YuviPanda> stop cron on tools-submit, wait for webservices to come back up [tools]
02:39 <YuviPanda> remove lbenedix and ebekebe from tools.hcclab [tools]