2016-02-13 §
05:25 <bd808> jobrunner process on deployment-jobrunner01 badly broken; investigating [releng]
05:20 <bd808> Ran https://phabricator.wikimedia.org/P2273 on deployment-jobrunner01.deployment-prep.eqiad.wmflabs; freed ~500M; disk utilization still at 94% [releng]
2016-02-12 §
23:54 <hashar> beta cluster broken since 20:30 UTC https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor ; haven't looked yet [releng]
17:36 <hashar> salt -v '*slave-trusty*' cmd.run 'apt-get -y install texlive-generic-extra' # T126422 [releng]
17:32 <hashar> adding texlive-generic-extra on CI slaves by cherry picking https://gerrit.wikimedia.org/r/#/c/270322/ - T126422 [releng]
17:19 <hashar> getting rid of integration-dev; it is broken somehow [releng]
17:10 <hashar> Nodepool back at spawning instances. contintcloud has been migrated in wmflabs [releng]
16:51 <thcipriani> running sudo salt '*' -b '10%' deploy.fixurl to fix deployment-prep trebuchet urls [releng]
16:31 <hashar> bd808 added support for saltbot to update tasks automagically!!!! T108720 [releng]
16:15 <hashar> the pool of CI slaves is exhausted, no more jobs running (scheduled labs maintenance) [releng]
03:10 <yurik> attempted to sync graphoid from gerrit 270166 from deployment-tin, but it wouldn't sync. Tried to git pull sca02, submodules wouldn't pull [releng]
2016-02-11 §
22:53 <thcipriani> shutting down deployment-bastion [releng]
21:28 <hashar> pooling back slaves 1001 to 1006 [releng]
21:18 <hashar> re-enabling hhvm service on slaves ( https://phabricator.wikimedia.org/T126594 ). Some symlink is missing and is only provided by the upstart script grrrrrrr https://phabricator.wikimedia.org/T126658 [releng]
20:52 <legoktm> deploying https://gerrit.wikimedia.org/r/270098 [releng]
20:35 <hashar> depooling the six recent slaves: /usr/lib/x86_64-linux-gnu/hhvm/extensions/current/luasandbox.so cannot open shared object file [releng]
20:29 <hashar> pooling integration-slave-trusty-1004 integration-slave-trusty-1005 integration-slave-trusty-1006 [releng]
20:14 <hashar> pooling integration-slave-trusty-1001 integration-slave-trusty-1002 integration-slave-trusty-1003 [releng]
19:35 <marxarelli> modifying deployment server node in jenkins to point to deployment-tin [releng]
19:27 <thcipriani> running sudo salt -b '10%' '*' cmd.run 'puppet agent -t' from deployment-salt [releng]
19:27 <twentyafterfour> Keeping notes on the ticket: https://phabricator.wikimedia.org/T126537 [releng]
19:24 <thcipriani> moving deployment-bastion to deployment-tin [releng]
17:59 <hashar> recreated instances with proper names: integration-slave-trusty-{1001-1006} [releng]
17:52 <hashar> Created integration-slave-trusty-{1019-1026} as m1.large (note: 1023 is an exception, it is for Android). Applied role::ci::slave, let's wait for puppet to finish [releng]
17:42 <Krinkle> Currently testing https://gerrit.wikimedia.org/r/#/c/268802/ in Beta Labs [releng]
17:27 <hashar> Depooling all the ci.medium slaves and deleting them. [releng]
17:27 <hashar> I tried. The ci.medium instances are too small and MediaWiki tests really need 1.5GBytes of memory :-( [releng]
16:00 <hashar> rebuilding integration-dev https://phabricator.wikimedia.org/T126613 [releng]
15:27 <Krinkle> Deploy Zuul config change https://gerrit.wikimedia.org/r/269976 [releng]
11:46 <hashar> salt -v '*' cmd.run '/etc/init.d/apache2 restart' might help for Wikidata browser tests failing [releng]
11:31 <hashar> disabling hhvm service on CI slaves ( https://phabricator.wikimedia.org/T126594 , cherry picked both patches ) [releng]
10:50 <hashar> reenabled puppet on CI. All transitioned to a 128MB tmpfs (was 512MB) [releng]
10:16 <hashar> pooling back integration-slave-trusty-1009 and integration-slave-trusty-1010 (tmpfs shrunken) [releng]
10:06 <hashar> disabling puppet on all CI slaves. Trying to lower tmpfs 512MB to 128MB ( https://gerrit.wikimedia.org/r/#/c/269880/ ) [releng]
02:45 <legoktm> deploying https://gerrit.wikimedia.org/r/269853 https://gerrit.wikimedia.org/r/269893 [releng]
2016-02-10 §
23:54 <hashar_> depooling Trusty slaves that only have 2GB of RAM, which is not enough. https://phabricator.wikimedia.org/T126545 [releng]
22:55 <hashar_> gallium: find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' -delete ( https://phabricator.wikimedia.org/T126552 ) [releng]
22:34 <Krinkle> Zuul is back up and processing Gerrit events, but jobs are still queued indefinitely. Jenkins is not accepting new jobs [releng]
22:31 <Krinkle> Full restart of Zuul. Seems Gearman/Zuul got stuck. All executors were idling. No new Gerrit events processed either. [releng]
21:22 <legoktm> cherry-picking https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster again [releng]
21:16 <hashar> CI dust has settled. Krinkle and I have pooled a lot more Trusty slaves to accommodate the overload caused by switching to php55 (jobs run on Trusty) [releng]
21:08 <hashar> pooling trusty slaves 1009, 1010, 1021, 1022 with 2 executors (they are ci.medium) [releng]
20:38 <hashar> cancelling mediawiki-core-jsduck-publish and mediawiki-core-doxygen-publish jobs manually. They will catch up on next merge [releng]
20:34 <Krinkle> Pooled integration-slave-trusty-1019 (new) [releng]
20:28 <Krinkle> Pooled integration-slave-trusty-1020 (new) [releng]
20:24 <Krinkle> created integration-slave-trusty-1019 and integration-slave-trusty-1020 (ci1.medium) [releng]
20:18 <hashar> created integration-slave-trusty-1009 and 1010 (trusty ci.medium) [releng]
20:06 <hashar> creating integration-slave-trusty-1021 and integration-slave-trusty-1022 (ci.medium) [releng]
19:48 <greg-g> that cleanup was done by apergos [releng]
19:48 <greg-g> did cleanup across all integration slaves, some were very close to out of room. results: https://phabricator.wikimedia.org/P2587 [releng]