2015-04-09
|
12:31 |
<hashar> |
beta: hard reset of the operations/puppet repo on the puppetmaster since it had been stalled for 9+ days https://phabricator.wikimedia.org/T95539 |
[releng] |
10:46 |
<hashar> |
repacked extensions in deployment-bastion staging area: <tt>find /mnt/srv/mediawiki-staging/php-master/extensions -maxdepth 2 -type f -name .git -exec bash -c 'cd `dirname {}` && pwd && git repack -Ad && git gc' \;</tt> |
[releng] |
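The one-liner in the 10:46 entry can be spelled out as a small script, which may be easier to re-run or adapt. This is only a sketch: the function names `list_checkouts` and `repack_checkouts` are hypothetical, and it assumes, as the original `-type f -name .git` filter suggests, that each extension is a checkout whose `.git` is a plain file rather than a directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative, not part of any existing tool.

# Print the directory of every checkout whose .git is a regular file,
# at most two levels below the given root.
list_checkouts() {
    local root="$1"
    find "$root" -maxdepth 2 -type f -name .git -exec dirname {} \;
}

# Repack and garbage-collect each checkout found under the root.
repack_checkouts() {
    local root="$1"
    list_checkouts "$root" | while IFS= read -r dir; do
        echo "$dir"
        # -A -d: pack everything into one pack and drop redundant packs.
        git -C "$dir" repack -A -d && git -C "$dir" gc
    done
}
```

For example, `repack_checkouts /mnt/srv/mediawiki-staging/php-master/extensions` would cover the same tree as the logged command.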
10:31 |
<hashar> |
deployment-bastion has a leftover lock file: /mnt/srv/mediawiki-staging/php-master/extensions/.git/refs/remotes/origin/master.lock |
[releng] |
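A leftover ref lock like the one in the 10:31 entry can be located by age before deciding whether to remove it. The sketch below is an assumption-laden illustration: `stale_git_locks` is a hypothetical name, the age threshold is arbitrary, and it only lists candidates. Whether a lock is safe to delete still depends on no git process using that repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list *.lock files under a directory that have not
# been modified for more than the given number of minutes.
stale_git_locks() {
    local root="$1"
    local minutes="$2"
    find "$root" -type f -name '*.lock' -mmin "+$minutes"
}
```

For example, `stale_git_locks /mnt/srv/mediawiki-staging/php-master/extensions/.git 60` would print lock files older than an hour in that repository.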
09:55 |
<hashar> |
restarted Zuul to clear out some stalled jobs |
[releng] |
09:35 |
<Krinkle> |
Pooled integration-slave-trusty-1010 |
[releng] |
08:59 |
<hashar> |
rebooted deployment-bastion and cleared some files under /var/ |
[releng] |
08:51 |
<hashar> |
deployment-bastion is out of disk space on /var/ :( |
[releng] |
08:50 |
<hashar> |
https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ timed out after 30 minutes while trying to git pull |
[releng] |
08:50 |
<hashar> |
https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/ job stalled for some reason |
[releng] |
06:15 |
<legoktm> |
deploying https://gerrit.wikimedia.org/r/202998 |
[releng] |
06:02 |
<legoktm> |
deploying https://gerrit.wikimedia.org/r/202992 |
[releng] |
05:11 |
<legoktm> |
deleted core dumps from integration-slave1002, /var had filled up |
[releng] |
04:36 |
<legoktm> |
deploying https://gerrit.wikimedia.org/r/202938 |
[releng] |
00:32 |
<legoktm> |
deploying https://gerrit.wikimedia.org/r/202279 |
[releng] |
2015-04-08
|
21:56 |
<legoktm> |
deploying https://gerrit.wikimedia.org/r/202930 |
[releng] |
21:15 |
<legoktm> |
deleting non-existent jobs' workspaces on labs slaves |
[releng] |
19:09 |
<Krinkle> |
Re-establishing Gearman-Jenkins connection |
[releng] |
19:00 |
<Krinkle> |
Restarting Jenkins |
[releng] |
19:00 |
<Krinkle> |
Jenkins Master unable to re-establish Gearman connection |
[releng] |
19:00 |
<Krinkle> |
Zuul queue is not being distributed properly. Many slaves are idling waiting to receive builds but not getting any. |
[releng] |
18:29 |
<Krinkle> |
Another attempt at re-creating the Trusty slave pool (T94916) |
[releng] |
18:07 |
<legoktm> |
deploying https://gerrit.wikimedia.org/r/202289 and https://gerrit.wikimedia.org/r/202445 |
[releng] |
18:01 |
<Krinkle> |
Jobs for Precise slaves are not starting. Stuck in Zuul as 'queued'. Disconnected and restarted slave agent on them. Queue is back up now. |
[releng] |
17:36 |
<legoktm> |
deployed https://gerrit.wikimedia.org/r/180418 |
[releng] |
13:32 |
<hashar> |
Disabled Zuul install based on git clone / setup.py by cherry picking https://gerrit.wikimedia.org/r/#/c/202714/ . Installed the Zuul debian package on all slaves |
[releng] |
13:31 |
<hashar> |
integration: running <tt>apt-get upgrade</tt> on Trusty slaves |
[releng] |
13:30 |
<hashar> |
integration: upgrading python-gear and python-six on Trusty slaves |
[releng] |
12:43 |
<hasharLunch> |
Zuul is back and it is nasty |
[releng] |
12:24 |
<hasharLunch> |
killed zuul on gallium :/ |
[releng] |
2015-04-07
|
16:26 |
<Krinkle> |
git-deploy: Deploying integration/slave-scripts 4c6f541 |
[releng] |
12:57 |
<hashar> |
running apt-get upgrade on integration-slave-trusty* hosts |
[releng] |
12:45 |
<hashar> |
recreating integration-slave-trusty-1005 |
[releng] |
12:26 |
<hashar> |
deleting integration-slave-trusty-1005: it was provisioned with role::ci::website instead of role::ci::slave::labs |
[releng] |
12:11 |
<hashar> |
retriggering a bunch of browser tests hitting beta.wmflabs.org |
[releng] |
12:07 |
<hashar> |
With Puppet fixed, it is finishing the installation of integration-slave-trusty-*** hosts |
[releng] |
12:03 |
<hashar> |
Browser tests against beta cluster were all failing due to an improper DNS resolver being applied on CI labs instances {{bug|T95273}}. Should be fixed now. |
[releng] |
12:00 |
<hashar> |
running puppet on all integration machines and resigning puppet client certs |
[releng] |
11:31 |
<hashar> |
integration-puppetmaster is back and operational with local puppet client working properly. |
[releng] |
11:28 |
<hashar> |
restored /etc/puppet/fileserver.conf |
[releng] |
11:08 |
<hashar> |
dishing out puppet SSL configuration on all integration nodes. Can't figure it out, so let's restart from scratch |
[releng] |
10:52 |
<hashar> |
made puppetmaster certname = integration-puppetmaster.eqiad.wmflabs instead of the ec2 id :( |
[releng] |
10:49 |
<hashar> |
manually hacking integration-puppetmaster /etc/puppet/puppet.conf config file which is missing the [master] section |
[releng] |
01:25 |
<Krinkle> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/202300 |
[releng] |