| 2016-02-12 |
    
  | 23:54 | <hashar> | beta cluster broken since 20:30 UTC   https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor   haven't looked | [releng] | 
            
  | 17:36 | <hashar> | salt -v '*slave-trusty*' cmd.run 'apt-get -y install texlive-generic-extra'     # T126422 | [releng] | 
            
  | 17:32 | <hashar> | adding texlive-generic-extra on CI slaves by cherry picking https://gerrit.wikimedia.org/r/#/c/270322/ - T126422 | [releng] | 
            
  | 17:19 | <hashar> | get rid of integration-dev, it is broken somehow | [releng] | 
            
  | 17:10 | <hashar> | Nodepool back at spawning instances.  contintcloud has been migrated in wmflabs | [releng] | 
            
  | 16:51 | <thcipriani> | running  sudo salt '*' -b '10%' deploy.fixurl to fix deployment-prep trebuchet urls | [releng] | 
            
  | 16:31 | <hashar> | bd808 added support for saltbot to update tasks automagically!!!! T108720 | [releng] | 
            
  | 16:15 | <hashar> | the pool of CI slaves is exhausted, no more jobs running (scheduled labs maintenance) | [releng] | 
            
  | 03:10 | <yurik> | attempted to sync graphoid from gerrit 270166 from deployment-tin, but it wouldn't sync.  Tried to git pull sca02, submodules wouldn't pull | [releng] | 
            
  
    | 2016-02-11 |
    
  | 22:53 | <thcipriani> | shutting down deployment-bastion | [releng] | 
            
  | 21:28 | <hashar> | pooling back slaves 1001 to 1006 | [releng] | 
            
  | 21:18 | <hashar> | re-enabling hhvm service on slaves ( https://phabricator.wikimedia.org/T126594 ) Some symlink is missing and only provided by the upstart script grrrrrrr https://phabricator.wikimedia.org/T126658 | [releng] | 
            
  | 20:52 | <legoktm> | deploying https://gerrit.wikimedia.org/r/270098 | [releng] | 
            
  | 20:35 | <hashar> | depooling the six recent slaves: /usr/lib/x86_64-linux-gnu/hhvm/extensions/current/luasandbox.so cannot open shared object file | [releng] | 
            
  | 20:29 | <hashar> | pooling integration-slave-trusty-1004 integration-slave-trusty-1005 integration-slave-trusty-1006 | [releng] | 
            
  | 20:14 | <hashar> | pooling integration-slave-trusty-1001 integration-slave-trusty-1002 integration-slave-trusty-1003 | [releng] | 
            
  | 19:35 | <marxarelli> | modifying deployment server node in jenkins to point to deployment-tin | [releng] | 
            
  | 19:27 | <thcipriani> | running sudo salt -b '10%' '*' cmd.run 'puppet agent -t' from deployment-salt | [releng] | 
            
  | 19:27 | <twentyafterfour> | Keeping notes on the ticket: https://phabricator.wikimedia.org/T126537 | [releng] | 
            
  | 19:24 | <thcipriani> | moving deployment-bastion to deployment-tin | [releng] | 
            
  | 17:59 | <hashar> | recreated instances with proper names:  integration-slave-trusty-{1001-1006} | [releng] | 
            
  | 17:52 | <hashar> | Created integration-slave-trusty-{1019-1026} as m1.large  (note 1023 is an exception, it is for Android).  Applied role::ci::slave , let's wait for puppet to finish | [releng] | 
            
  | 17:42 | <Krinkle> | Currently testing https://gerrit.wikimedia.org/r/#/c/268802/ in Beta Labs | [releng] | 
            
  | 17:27 | <hashar> | Depooling all the ci.medium slaves and deleting them. | [releng] | 
            
  | 17:27 | <hashar> | I tried.  The ci.medium instances are too small and MediaWiki tests really need 1.5GBytes of memory :-( | [releng] | 
            
  | 16:00 | <hashar> | rebuilding integration-dev https://phabricator.wikimedia.org/T126613 | [releng] | 
            
  | 15:27 | <Krinkle> | Deploy Zuul config change https://gerrit.wikimedia.org/r/269976 | [releng] | 
            
  | 11:46 | <hashar> | salt -v '*' cmd.run '/etc/init.d/apache2 restart'   might help for Wikidata browser tests failing | [releng] | 
            
  | 11:31 | <hashar> | disabling hhvm service on CI slaves ( https://phabricator.wikimedia.org/T126594 , cherry picked both patches ) | [releng] | 
            
  | 10:50 | <hashar> | reenabled puppet on CI. All transitioned to a 128MB tmpfs (was 512MB) | [releng] | 
            
  | 10:16 | <hashar> | pooling back integration-slave-trusty-1009 and integration-slave-trusty-1010  (tmpfs shrunken) | [releng] | 
            
  | 10:06 | <hashar> | disabling puppet on all CI slaves. Trying to lower tmpfs 512MB to 128MB  ( https://gerrit.wikimedia.org/r/#/c/269880/ ) | [releng] | 
            
  | 02:45 | <legoktm> | deploying https://gerrit.wikimedia.org/r/269853 https://gerrit.wikimedia.org/r/269893 | [releng] | 
            
  
    | 2016-02-10 |
    
  | 23:54 | <hashar_> | depooling Trusty slaves that only have 2GB of RAM, which is not enough.  https://phabricator.wikimedia.org/T126545 | [releng] | 
            
  | 22:55 | <hashar_> | gallium: find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' -delete  (  https://phabricator.wikimedia.org/T126552 ) | [releng] | 
            
  | 22:34 | <Krinkle> | Zuul is back up and processing Gerrit events, but jobs are still queued indefinitely. Jenkins is not accepting new jobs | [releng] | 
            
  | 22:31 | <Krinkle> | Full restart of Zuul. Seems Gearman/Zuul got stuck. All executors were idling. No new Gerrit events processed either. | [releng] | 
            
  | 21:22 | <legoktm> | cherry-picking https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster again | [releng] | 
            
  | 21:16 | <hashar> | CI dust has settled.  Krinkle and I have pooled a lot more Trusty slaves to accommodate the overload caused by switching to php55 (jobs run on Trusty) | [releng] | 
            
  | 21:08 | <hashar> | pooling trusty slaves 1009, 1010, 1021, 1022  with 2 executors  (they are ci.medium) | [releng] | 
            
  | 20:38 | <hashar> | cancelling mediawiki-core-jsduck-publish  and mediawiki-core-doxygen-publish jobs manually.  They will catch up on next merge | [releng] | 
            
  | 20:34 | <Krinkle> | Pooled integration-slave-trusty-1019 (new) | [releng] | 
            
  | 20:28 | <Krinkle> | Pooled integration-slave-trusty-1020 (new) | [releng] | 
            
  | 20:24 | <Krinkle> | created integration-slave-trusty-1019 and integration-slave-trusty-1020 (ci1.medium) | [releng] | 
            
  | 20:18 | <hashar> | created integration-slave-trusty-1009 and 1010 (trusty ci.medium) | [releng] | 
            
  | 20:06 | <hashar> | creating integration-slave-trusty-1021 and integration-slave-trusty-1022 (ci.medium) | [releng] | 
            
  | 19:48 | <greg-g> | that cleanup was done by apergos | [releng] | 
            
  | 19:48 | <greg-g> | did cleanup across all integration slaves, some were very close to out of room. results:  https://phabricator.wikimedia.org/P2587 | [releng] | 
            
  | 19:43 | <hashar> | Dropping slaves Precise m1.large  integration-slave-precise-1014 and  integration-slave-precise-1013 , most load shifted to Trusty (php53 -> php55 transition) | [releng] | 
            
  | 18:20 | <Krinkle> | Creating a Trusty slave to support increased demand following MediaWiki php53(precise)>php55(trusty) bump | [releng] |