2016-02-12
23:54 <hashar> beta cluster broken since 20:30 UTC https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor haven't looked [releng]
17:36 <hashar> salt -v '*slave-trusty*' cmd.run 'apt-get -y install texlive-generic-extra' # T126422 [releng]
17:32 <hashar> adding texlive-generic-extra on CI slaves by cherry picking https://gerrit.wikimedia.org/r/#/c/270322/ - T126422 [releng]
17:19 <hashar> get rid of integration-dev, it is broken somehow [releng]
17:10 <hashar> Nodepool back at spawning instances. contintcloud has been migrated in wmflabs [releng]
16:51 <thcipriani> running sudo salt '*' -b '10%' deploy.fixurl to fix deployment-prep trebuchet urls [releng]
16:31 <hashar> bd808 added support for saltbot to update tasks automagically!!!! T108720 [releng]
16:15 <hashar> the pool of CI slaves is exhausted, no more jobs running (scheduled labs maintenance) [releng]
03:10 <yurik> attempted to sync graphoid from gerrit 270166 from deployment-tin, but it wouldn't sync. Tried to git pull on sca02, submodules wouldn't pull [releng]
2016-02-11
22:53 <thcipriani> shutting down deployment-bastion [releng]
21:28 <hashar> pooling back slaves 1001 to 1006 [releng]
21:18 <hashar> re-enabling hhvm service on slaves (https://phabricator.wikimedia.org/T126594). Some symlink is missing and only provided by the upstart script, grrrrrrr https://phabricator.wikimedia.org/T126658 [releng]
20:52 <legoktm> deploying https://gerrit.wikimedia.org/r/270098 [releng]
20:35 <hashar> depooling the six recent slaves: /usr/lib/x86_64-linux-gnu/hhvm/extensions/current/luasandbox.so cannot open shared object file [releng]
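A minimal diagnostic sketch for that error, assuming shell access on one of the affected slaves (only the luasandbox.so path above comes from the log entry; the rest is illustrative):

    # does the 'current' extensions symlink exist, and is luasandbox present?
    ls -l /usr/lib/x86_64-linux-gnu/hhvm/extensions/
    ls -l /usr/lib/x86_64-linux-gnu/hhvm/extensions/current/luasandbox.so
    # check the shared object is actually loadable (ldd reports missing dependencies)
    ldd /usr/lib/x86_64-linux-gnu/hhvm/extensions/current/luasandbox.so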
20:29 <hashar> pooling integration-slave-trusty-1004 integration-slave-trusty-1005 integration-slave-trusty-1006 [releng]
20:14 <hashar> pooling integration-slave-trusty-1001 integration-slave-trusty-1002 integration-slave-trusty-1003 [releng]
19:35 <marxarelli> modifying deployment server node in jenkins to point to deployment-tin [releng]
19:27 <thcipriani> running sudo salt -b '10%' '*' cmd.run 'puppet agent -t' from deployment-salt [releng]
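For context, -b/--batch-size makes salt run the command on only a fraction of minions at a time, so the puppetmaster is not hit by every agent at once. A hedged sketch of the same roll-out, testing a single minion before the batched run (the minion name is an assumption, not from the log):

    # test on one minion first (hostname is illustrative)
    sudo salt 'deployment-mediawiki01*' cmd.run 'puppet agent -t'
    # then run everywhere, 10% of minions at a time
    sudo salt --batch-size '10%' '*' cmd.run 'puppet agent -t'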
19:27 <twentyafterfour> Keeping notes on the ticket: https://phabricator.wikimedia.org/T126537 [releng]
19:24 <thcipriani> moving deployment-bastion to deployment-tin [releng]
17:59 <hashar> recreated instances with proper names: integration-slave-trusty-{1001-1006} [releng]
17:52 <hashar> Created integration-slave-trusty-{1019-1026} as m1.large (note: 1023 is an exception, it is for Android). Applied role::ci::slave, let's wait for puppet to finish [releng]
17:42 <Krinkle> Currently testing https://gerrit.wikimedia.org/r/#/c/268802/ in Beta Labs [releng]
17:27 <hashar> Depooling all the ci.medium slaves and deleting them. [releng]
17:27 <hashar> I tried. The ci.medium instances are too small and MediaWiki tests really need 1.5GBytes of memory :-( [releng]
16:00 <hashar> rebuilding integration-dev https://phabricator.wikimedia.org/T126613 [releng]
15:27 <Krinkle> Deploy Zuul config change https://gerrit.wikimedia.org/r/269976 [releng]
11:46 <hashar> salt -v '*' cmd.run '/etc/init.d/apache2 restart' might help for Wikidata browser tests failing [releng]
11:31 <hashar> disabling hhvm service on CI slaves (https://phabricator.wikimedia.org/T126594, cherry-picked both patches) [releng]
10:50 <hashar> re-enabled puppet on CI. All transitioned to a 128MB tmpfs (was 512MB) [releng]
10:16 <hashar> pooling back integration-slave-trusty-1009 and integration-slave-trusty-1010 (tmpfs shrunken) [releng]
10:06 <hashar> disabling puppet on all CI slaves. Trying to lower tmpfs from 512MB to 128MB (https://gerrit.wikimedia.org/r/#/c/269880/) [releng]
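The change itself went through puppet (https://gerrit.wikimedia.org/r/#/c/269880/); as a rough sketch of what the shrink amounts to on a single slave, assuming a hypothetical mount point since the real one is not named in this log:

    # remount the existing tmpfs with a smaller size limit (mount point is an assumption)
    sudo mount -o remount,size=128M /var/lib/jenkins-slave/tmpfs
    # confirm the new size took effect
    df -h /var/lib/jenkins-slave/tmpfs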
02:45 <legoktm> deploying https://gerrit.wikimedia.org/r/269853 https://gerrit.wikimedia.org/r/269893 [releng]
2016-02-10
23:54 <hashar_> depooling Trusty slaves that only have 2GB of RAM, which is not enough. https://phabricator.wikimedia.org/T126545 [releng]
22:55 <hashar_> gallium: find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' -delete (https://phabricator.wikimedia.org/T126552) [releng]
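Before a -delete like that, counting the matches is a cheap sanity check; a sketch reusing the same find expression from the entry above:

    # count the 2015 config-history files that would be removed
    find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' | wc -l
    # then actually delete them
    find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' -delete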
22:34 <Krinkle> Zuul is back up and processing Gerrit events, but jobs are still queued indefinitely. Jenkins is not accepting new jobs [releng]
22:31 <Krinkle> Full restart of Zuul. Seems Gearman/Zuul got stuck. All executors were idling. No new Gerrit events processed either. [releng]
21:22 <legoktm> cherry-picking https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster again [releng]
21:16 <hashar> CI dust has settled. Krinkle and I have pooled a lot more Trusty slaves to accommodate the overload caused by switching to php55 (jobs run on Trusty) [releng]
21:08 <hashar> pooling trusty slaves 1009, 1010, 1021, 1022 with 2 executors (they are ci.medium) [releng]
20:38 <hashar> cancelling mediawiki-core-jsduck-publish and mediawiki-core-doxygen-publish jobs manually. They will catch up on next merge [releng]
20:34 <Krinkle> Pooled integration-slave-trusty-1019 (new) [releng]
20:28 <Krinkle> Pooled integration-slave-trusty-1020 (new) [releng]
20:24 <Krinkle> created integration-slave-trusty-1019 and integration-slave-trusty-1020 (ci1.medium) [releng]
20:18 <hashar> created integration-slave-trusty-1009 and 1010 (trusty ci.medium) [releng]