2016-02-18 §
22:57 <valhallasw`cloud> restarted gridengine-master on tools-grid-master, otherwise all webservices will stay down [tools]
20:37 <yuvipanda> failover proxy back to tools-proxy-01 [tools]
19:46 <chasemp> repool labvirt1003 and depool labvirt1004 [tools]
18:19 <chasemp> draining nodes from labvirt1001 [tools]
2016-02-16 §
21:33 <chasemp> reboot of bastion-1002 [tools]
2016-02-12 §
19:56 <chasemp> nfs traffic shaping pilot round 2 [tools]
2016-02-05 §
22:01 <chasemp> throttle some vm nfs write speeds [tools]
2016-02-03 §
03:00 <YuviPanda> upgraded flannel on all hosts running it [tools]
2016-01-29 §
21:25 <YuviPanda> restarted image-resize-calc manually, no service.manifest file [tools]
2016-01-27 §
23:07 <YuviPanda> removed all members of templatetiger, added self instead, removed active shell sessions [tools]
20:24 <chasemp> master stop, truncate accounting log to accounting.01272016, master start [tools]
19:34 <chasemp> master start grid master [tools]
19:23 <chasemp> stopped master [tools]
19:11 <YuviPanda> depooled tools-webgrid-1405 to prep for restart, lots of stuck processes [tools]
18:29 <valhallasw`cloud> job 2551539 is ifttt, which is also running as 2700629. Killing 2551539. [tools]
18:26 <valhallasw`cloud> messages repeatedly reports "01/27/2016 18:26:17|worker|tools-grid-master|E|execd@tools-webgrid-generic-1405.tools.eqiad.wmflabs reports running job (2551539.1/master) in queue "webgrid-generic@tools-webgrid-generic-1405.tools.eqiad.wmflabs" that was not supposed to be there - killing". SSH'ing there to investigate [tools]
18:24 <valhallasw`cloud> 'sleep' test job also seems to work without issues [tools]
18:23 <valhallasw`cloud> no errors in log file, qstat works [tools]
18:23 <chasemp> master sge restarted post dump and restart for jobs db [tools]
18:22 <valhallasw`cloud> messages file reports 'Wed Jan 27 18:21:39 UTC 2016 db_load_sge_maint_pre_jobs_dump_01272016' [tools]
18:20 <chasemp> master db_load -f /root/sge_maint_pre_jobs_dump_01272016 sge_job [tools]
18:19 <valhallasw`cloud> dumped jobs database to /root/sge_maint_pre_jobs_dump_01272016, 4.6M [tools]
18:17 <valhallasw`cloud> SGE Configuration successfully saved to /root/sge_maint_01272016 directory. [tools]
18:14 <chasemp> grid master stopped [tools]
2016-01-26 §
21:28 <YuviPanda> qstat -u '*' | grep E | awk '{print $1}' | xargs -L1 qmod -cj [tools]
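Note on the 21:28 command: `grep E` matches any line containing a capital E, including job names or user names, not only jobs in an error state. A minimal sketch of a stricter filter follows. It assumes the default `qstat` column layout (two header lines, job state in the fifth field), and the helper name `error_job_ids` is hypothetical, not part of gridengine.

```shell
# Print the IDs of jobs whose state column contains "E" (for example "Eqw").
# Assumption: default qstat layout, two header lines, state in field 5.
error_job_ids() {
    awk 'NR > 2 && $5 ~ /E/ { print $1 }'
}

# Hypothetical usage on the grid master (not run here):
#   qstat -u '*' | error_job_ids | xargs -L1 qmod -cj
```

Filtering on the state field rather than the whole line avoids clearing jobs that only happen to have an E elsewhere in their row.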
21:16 <chasemp> reboot tools-exec-1217.tools.eqiad.wmflabs [tools]
2016-01-25 §
20:30 <YuviPanda> switched over cron host to tools-cron-01, manually copied all old cron files from tools-submit to tools-cron-01 [tools]
19:06 <chasemp> kill python merge/merge-unique.py tools-exec-1213 as it seemed to be overwhelming nfs [tools]
2016-01-21 §
22:24 <YuviPanda> deleted tools-redis-01 and -02 (are on 1001 and 1002 now) [tools]
21:13 <YuviPanda> repooled exec nodes on labvirt1010 [tools]
21:08 <YuviPanda> gridengine-master started, verified shadow hasn't started [tools]
21:00 <YuviPanda> stop gridengine master [tools]
20:51 <YuviPanda> correction: the last message should read "repooled exec nodes on labvirt1007" [tools]
20:51 <YuviPanda> repooled exec nodes on labvirt1006 [tools]
20:39 <YuviPanda> failover tools-static to tools-web-static-01 [tools]
20:38 <YuviPanda> failover tools-checker to tools-checker-01 [tools]
20:32 <YuviPanda> depooled exec nodes on 1007 [tools]
20:32 <YuviPanda> repooled exec nodes on 1006 [tools]
20:14 <YuviPanda> depooled all exec nodes in labvirt1006 [tools]
20:11 <YuviPanda> repooled exec nodes on 1005 [tools]
19:53 <YuviPanda> depooled exec nodes on labvirt1005 [tools]
19:49 <YuviPanda> repooled exec nodes from labvirt1004 [tools]
19:48 <YuviPanda> failed over proxy to tools-proxy-01 again [tools]
19:31 <YuviPanda> depooled exec nodes from labvirt1004 [tools]
19:29 <YuviPanda> repooled exec nodes from labvirt1003 [tools]
19:13 <YuviPanda> depooled instances on labvirt1003 [tools]
19:06 <YuviPanda> re-enabled queues on exec nodes that were on labvirt1002 [tools]
19:02 <YuviPanda> failed over tools proxy to tools-proxy-02 [tools]
18:46 <YuviPanda> drained and disabled queues on all nodes on labvirt1002 [tools]
18:38 <YuviPanda> restarted all restartable jobs in instances on labvirt1001 and deleted all non-restartable ghost jobs. these were already dead [tools]