2016-01-11
| 22:19 | <valhallasw`cloud> | reset maxujobs 0->128, job_load_adjustments none->np_load_avg=0.50, load_ad... -> 0:7:30 | [tools] |
| 22:12 | <YuviPanda> | restarted gridengine master again | [tools] |
| 22:07 | <valhallasw`cloud> | set job_load_adjustments from np_load_avg=0.50 to none and load_adjustment_decay_time to 0:0:0 | [tools] |
| 22:05 | <valhallasw`cloud> | set maxujobs back to 0, but that doesn't help | [tools] |
| 21:57 | <valhallasw`cloud> | reset load_adjustment_decay_time to 0:7:30 | [tools] |
| 21:57 | <valhallasw`cloud> | that cleared the measure, but jobs are still not starting. Ugh! | [tools] |
| 21:55 | <valhallasw`cloud> | set load_adjustment_decay_time = 0:0:0 | [tools] |
| 21:45 | <YuviPanda> | restarted gridengine master | [tools] |
| 21:43 | <valhallasw`cloud> | qstat -j <jobid> shows all queues overloaded; seems to have started just after a load test for the new maxujobs setting | [tools] |
| 21:42 | <valhallasw`cloud> | resetting to 0:7:30, as it's not having the intended effect | [tools] |
| 21:41 | <valhallasw`cloud> | currently 353 jobs in qw state | [tools] |
| 21:40 | <valhallasw`cloud> | that's load_adjustment_decay_time | [tools] |
| 21:40 | <valhallasw`cloud> | temporarily sudo qconf -msconf to 0:0:1 | [tools] |
| 19:59 | <YuviPanda> | Set maxujobs (max concurrent jobs per user) on gridengine to 128 | [tools] |
| 17:51 | <YuviPanda> | killed all queries running on labsdb1003 | [tools] |
| 17:20 | <YuviPanda> | stopped webservice for quentinv57-tools | [tools] |
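
The 19:59 to 22:19 entries toggle three scheduler settings back and forth (maxujobs, job_load_adjustments, load_adjustment_decay_time). A minimal sketch of how two scheduler-config snapshots could be compared to see what changed between steps. It assumes the snapshots are plain "key value" text, one pair per line, as one would save from the scheduler configuration. The helper names (parse_sconf, hms_to_seconds, diff_sconf) are hypothetical, and the sample snapshots are illustrative values taken from the 22:07, 22:05, 21:42 and 19:59 entries, not captured command output.

```python
def parse_sconf(text):
    """Parse 'key value' lines (one pair per line) into a dict of strings."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def hms_to_seconds(value):
    """Convert an h:m:s string such as '0:7:30' to a number of seconds."""
    hours, minutes, seconds = (int(p) for p in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def diff_sconf(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


# Illustrative snapshot A: the troubleshooting state described at 22:05-22:07.
SNAPSHOT_A = """\
maxujobs                     0
job_load_adjustments         none
load_adjustment_decay_time   0:0:0
"""

# Illustrative snapshot B: the values the entries name as the normal settings.
SNAPSHOT_B = """\
maxujobs                     128
job_load_adjustments         np_load_avg=0.50
load_adjustment_decay_time   0:7:30
"""

changes = diff_sconf(parse_sconf(SNAPSHOT_A), parse_sconf(SNAPSHOT_B))
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```

Keeping a saved snapshot before each change would make a "reset to ..." entry checkable against the earlier state rather than relying on memory of the previous value.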