| 
      
        2015-08-18
      
     | 
  
    
  | 13:47 | 
  <valhallasw`cloud> | 
  that brought tools-exec-1403, tools-exec-1406 and tools-webgrid-generic-1402 back up, tools-exec-1401 and tools-exec-catscan are still in 'au' state | 
  [tools] | 
            
  | 13:46 | 
  <valhallasw`cloud> | 
  starting gridengine-exec on hosts with queues in 'au' (=alarm, unknown) state using <code>for i in $(qstat -f -xml | grep "<state>au" -B 6 | grep "<name>" | cut -d'@' -f2 | cut -d. -f1); do echo $i; ssh $i sudo service gridengine-exec start; done</code> | 
  [tools] | 
            
  | 08:37 | 
  <valhallasw`cloud> | 
  sudo service gridengine-exec start on tools-webgrid-lighttpd-1404.eqiad.wmflabs, tools-webgrid-lighttpd-1406.eqiad.wmflabs and tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs | 
  [tools] | 
            
  | 08:33 | 
  <valhallasw`cloud> | 
  tools-webgrid-lighttpd-1403.eqiad.wmflabs, tools-webgrid-lighttpd-1404.eqiad.wmflabs and tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs are all broken (queue dropped because it is temporarily not available) | 
  [tools] | 
            
  | 08:30 | 
  <valhallasw`cloud> | 
  hostname mismatch: host is called tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs in config, but it was named tools-webgrid-lighttpd-1411.eqiad.wmflabs in the hostgroup config | 
  [tools] | 
            
  | 08:21 | 
  <valhallasw`cloud> | 
  still sudo qmod -e "*@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" -> invalid queue "*@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" | 
  [tools] | 
            
  | 08:20 | 
  <valhallasw`cloud> | 
  sudo qconf -mhgrp "@webgrid", added tools-webgrid-lighttpd-1411.eqiad.wmflabs | 
  [tools] | 
            
  | 08:14 | 
  <valhallasw`cloud> | 
  and the hostgroup @webgrid doesn't even exist? (╯°□°)╯︵ ┻━┻ | 
  [tools] | 
            
  | 08:10 | 
  <valhallasw`cloud> | 
  /var/lib/gridengine/etc/queues/webgrid-lighttpd does not seem to be the correct configuration, as the current config refers to '@webgrid' as its host list. | 
  [tools] | 
            
  | 08:07 | 
  <valhallasw`cloud> | 
  sudo qconf -Ae /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs -> root@tools-bastion-01.eqiad.wmflabs added "tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" to exechost list | 
  [tools] | 
            
  | 08:06 | 
  <valhallasw`cloud> | 
  ok, success. /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs now exists. Do I still have to add it manually to the grid? I suppose so. | 
  [tools] | 
            
  | 08:04 | 
  <valhallasw`cloud> | 
  installing packages from /data/project/.system/deb-trusty seems to fail. sudo apt-get update helps. | 
  [tools] | 
            
  | 08:00 | 
  <valhallasw`cloud> | 
  running puppet agent -tv again | 
  [tools] | 
            
  | 07:55 | 
  <valhallasw`cloud> | 
  argh. Disabling toollabs::node::web::generic again and enabling toollabs::node::web::lighttpd | 
  [tools] | 
            
  | 07:54 | 
  <valhallasw`cloud> | 
  various issues such as Error: /Stage[main]/Gridengine::Submit_host/File[/var/lib/gridengine/default/common/accounting]/ensure: change from absent to link failed: Could not set 'link' on ensure: No such file or directory - /var/lib/gridengine/default/common at 17:/etc/puppet/modules/gridengine/manifests/submit_host.pp; probably an ordering issue in | 
  [tools] | 
            
  | 07:53 | 
  <valhallasw`cloud> | 
  Setting up adminbot (1.7.8) ... chmod: cannot access '/usr/lib/adminbot/README': No such file or directory  --- ran sudo touch /usr/lib/adminbot/README | 
  [tools] | 
            
  | 07:37 | 
  <valhallasw`cloud> | 
  applying role::labs::tools::compute and toollabs::node::web::generic to tools-webgrid-lighttpd-1411 | 
  [tools] | 
            
  | 07:31 | 
  <valhallasw`cloud> | 
  reading puppet suggests I should qconf -ah /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs but that file is missing? | 
  [tools] | 
            
  | 07:26 | 
  <valhallasw`cloud> | 
  andrewbogott built tools-webgrid-lighttpd-1411 yesterday but it's not actually added as exec host. Trying to figure out how to do that... | 
  [tools] |