2015-04-03
10:37 <hashar> disabled some hiera configuration related to puppetmaster. [releng]
10:22 <hashar> Created instance i-00000a4a with image "ubuntu-12.04-precise" and hostname i-00000a4a.eqiad.wmflabs. [releng]
10:21 <hashar> downgrading integration-puppetmaster from Trusty to Precise https://phabricator.wikimedia.org/T94927 [releng]
05:42 <legoktm> deploying https://gerrit.wikimedia.org/r/200744 [releng]
03:58 <Krinkle> Jobs were throwing NOT_RECOGNISED. Relaunched Gearman. Jobs are now happy again. [releng]
03:51 <Krinkle> Jenkins is unable to re-establish the Gearman connection. Having to force-restart the Jenkins master. [releng]
03:44 <greg-g> *unable [releng]
03:44 <Krinkle> References to the past hour of builds have been restored, but Jenkins is still unable to make new references properly. New builds are 404'ing the same way. [releng]
03:42 <Krinkle> Reloading the Jenkins config repaired the broken references. Build URLs are resolving again. [releng]
03:26 <Krinkle> Reloading Jenkins configuration from disk to mitigate the broken build references [releng]
03:18 <Krinkle> The failure started at 03:03 exactly. The newer build metadata exists at /var/lib/jenkins/jobs/:jobname/builds/:nr, but the jobs/*/last*Build symlinks are no longer updated. [releng]
02:47 <Krinkle> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/201644 [releng]
00:31 <greg-g> rm'd .gitignore in /srv/mediawiki-staging/php-master/skins because https://gerrit.wikimedia.org/r/#/c/200307/ clashed with a local untracked version [releng]
2015-04-02
22:56 <Krinkle> New integration-slave-precise-101x are unfinished and must remain depooled. See T94916. [releng]
22:53 <Krinkle> Most puppet failures blocking T94916 may be caused by integration-puppetmaster having been inadvertently changed to Trusty; the Trusty puppetmaster version is not yet supported by ops. [releng]
21:41 <Krinkle> It seems integration-slave-jessie-1001 has role::ci::slave::labs::common instead of role::ci::slave::labs. Intentional? [releng]
21:25 <Krinkle> Re-creating integration-dev-slave-precise in preparation for re-creating the Precise slaves [releng]
14:51 <hashar> applying role::ci::slave::labs::common on integration-slave-jessie-1001 [releng]
14:49 <hashar> integration: nice thing, newly created instances are automatically pointed at integration-puppetmaster via hiera! Just need to sign the certificate on the master using: puppet ca list ; puppet ca sign i-000xxxx.eqiad.wmflabs [releng]
14:42 <hashar> Created [[Nova_Resource:I-00000a3b.eqiad.wmflabs|integration-slave-jessie-1001]] to try out CI slave on Jessie ([[T94836]]) [releng]
14:11 <hashar> reduced integration-slave1004 executors from 6 to 5 to bring it on par with the other Precise slaves [releng]
14:10 <hashar> integration-slave100[1-4] are now using Zuul provided by a Debian package as of https://gerrit.wikimedia.org/r/#/c/195272/ PS 16 [releng]
14:04 <hashar> uninstalling the pip-installed Zuul from the Precise labs slaves by running: pip uninstall zuul && rm /usr/local/bin/zuul* (switching them all to a Debian package) [releng]
13:45 <hashar> pooling back integration-slave1001 and 1002, which use zuul-cloner provided by a Debian package [releng]
13:35 <hashar> reloading Jenkins configuration files from disk so it picks up a change manually applied to most jobs' config.xml files for https://gerrit.wikimedia.org/r/#/c/201451/ [releng]
13:01 <Krinkle> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/201458 [releng]
12:19 <hashar> preventing jobs from running on integration-slave1001 by replacing its label with 'DoNotLabelThisSlaveHashar'. Going to install the Zuul Debian package on it [releng]
09:37 <hashar> rebooting integration-zuul-server; its homedir seems to be stalled/missing [releng]
08:12 <hashar> upgrading packages on integration-dev [releng]
05:14 <greg-g> and right when I logged that, things seem to be recovering [releng]
05:12 <greg-g> the Shinken alerts about Beta Cluster issues are due to problems with wmflabs. [releng]
2015-04-01
07:17 <Krinkle> Creating integration-slave1410 as test. Will re-create our pool later today. [releng]
06:26 <Krinkle> Apply puppetmaster::autosigner to integration-puppetmaster [releng]
05:51 <legoktm> deleting workspaces of non-existent jobs from integration slaves [releng]
05:42 <Krinkle> Free up space on integration-slave1001-1004 by removing obsolete phplint and qunit workspaces [releng]
02:05 <Krinkle> Restarting Jenkins again. [releng]
01:35 <legoktm> started zuul on gallium [releng]
01:00 <Krinkle> Restarting Jenkins [releng]
01:00 <Krinkle> Jenkins is unable to start the Gearman connection (HTTP 503). [releng]
01:00 <Krinkle> Force restarted Zuul, didn't help [releng]
00:55 <Krinkle> Jenkins stuck. Builds are queued in Zuul but nothing is sent to Jenkins. [releng]