2015-08-18
|
23:59 |
<krenair@tin> |
Synchronized wmf-config/wikitech.php: T59040 (duration: 00m 12s) |
[production] |
23:37 |
<mutante> |
added papaul (pt1979) to WMF LDAP group |
[production] |
23:08 |
<mattflaschen@tin> |
Synchronized php-1.26wmf18/extensions/Flow/: Sync Flow 1.26wmf18 for watchlist fix. (duration: 00m 14s) |
[production] |
23:01 |
<hoo@tin> |
Synchronized wmf-config/: Set $wgPropertySuggesterClassifyingPropertyIds for testwikidata (duration: 00m 14s) |
[production] |
22:28 |
<krenair@tin> |
Synchronized php-1.26wmf19/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWSaveDialog.js: https://gerrit.wikimedia.org/r/#/c/232385/ (duration: 00m 12s) |
[production] |
22:17 |
<ori@tin> |
Synchronized php-1.26wmf19/includes/OutputPage.php: 1a4f1df2fe (duration: 00m 12s) |
[production] |
19:26 |
<twentyafterfour@tin> |
rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf19 |
[production] |
19:21 |
<twentyafterfour@tin> |
Finished scap: testwiki to 1.26wmf19 (duration: 51m 01s) |
[production] |
18:33 |
<jzerebecki> |
(mysql wasn't started as puppet never got to that point) |
[releng] |
18:32 |
<jzerebecki> |
/etc/init.d/elasticsearch start was looping endlessly because /var/run/elasticsearch/ did not exist even though it is part of the debian package elasticsearch which was installed. fixed the issue on this instance by: integration-slave-precise-1013:~# apt-get install --reinstall elasticsearch |
[releng] |
18:30 |
<twentyafterfour@tin> |
Started scap: testwiki to 1.26wmf19 |
[production] |
18:11 |
<ori@tin> |
Synchronized php-1.26wmf18/includes/OutputPage.php: 6ee94ca47c: Load all CSS in the top queue (duration: 00m 13s) |
[production] |
18:07 |
<robh> |
sodium returned to normal, mailman window over. |
[production] |
17:38 |
<ori@tin> |
Synchronized php-1.26wmf18/includes: 91ae6a39df, 4cc9622214: Added wfTransactionalTimeLimit() method and applied it; Try to make POSTs as transactional as possible (duration: 00m 16s) |
[production] |
17:21 |
<robh> |
T108099 complete, mailman restarted for a few minutes while I prepare the next task. |
[production] |
17:17 |
<robh> |
puppet disabled on sodium, no touch. |
[production] |
17:03 |
<robh> |
mailman maint window starts now, list delivery will remain sporadic until I finish. (It'll work off and on, no messages should be lost) |
[production] |
16:32 |
<jzerebecki> |
offlined integration-slave-precise-1013: fails to connect to mysql. /etc/init.d/mysql start fails. |
[releng] |
16:00 |
<jzerebecki> |
reloading zuul for 6486889..700f380 |
[releng] |
15:42 |
<krenair@tin> |
Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/232241/ (duration: 00m 13s) |
[production] |
15:08 |
<thcipriani@tin> |
Synchronized wmf-config/CommonSettings.php: SWAT: Remove extra transcode enablings [[gerrit:232228]] (duration: 00m 13s) |
[production] |
15:04 |
<andrewbogott> |
rebooting labvirt1006 |
[production] |
13:57 |
<valhallasw`cloud> |
same issue seems to happen with the other hosts: tools-exec-1401.tools.eqiad.wmflabs vs tools-exec-1401.eqiad.wmflabs and tools-exec-catscan.tools.eqiad.wmflabs vs tools-exec-catscan.eqiad.wmflabs. |
[tools] |
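The mismatch noted in the entries above (the same machine known under both `<host>.tools.eqiad.wmflabs` and `<host>.eqiad.wmflabs`) can be checked for mechanically. The following is only a minimal sketch, assuming the candidate names have already been collected into a flat list, for example from the exec host list and the hostgroup configuration. The function name `find_fqdn_mismatches` and the sample list are hypothetical; the hostnames in the sample are taken from this log.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_fqdn_mismatches(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group FQDNs by short hostname and keep only hosts seen under more than one FQDN."""
    by_short = defaultdict(set)
    for name in names:
        fqdn = name.strip().lower()
        if fqdn:
            # The short name is everything before the first dot.
            by_short[fqdn.split(".", 1)[0]].add(fqdn)
    return {
        short: sorted(fqdns)
        for short, fqdns in by_short.items()
        if len(fqdns) > 1
    }


# Hypothetical input; in practice this would be gathered from the grid configuration.
EXAMPLE_NAMES = [
    "tools-exec-1401.tools.eqiad.wmflabs",
    "tools-exec-1401.eqiad.wmflabs",
    "tools-exec-catscan.tools.eqiad.wmflabs",
    "tools-exec-catscan.eqiad.wmflabs",
    "tools-exec-1403.eqiad.wmflabs",
]

if __name__ == "__main__":
    for short, fqdns in find_fqdn_mismatches(EXAMPLE_NAMES).items():
        print(short, "->", ", ".join(fqdns))
```

Hosts that appear under a single name (here `tools-exec-1403`) are not reported, so the output lists only the entries that need a DNS or config fix.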
13:55 |
<valhallasw`cloud> |
no, wait, that's ''tools-webgrid-lighttpd-1411.eqiad.wmflabs'', not the actual host ''tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs''. We should fix that dns mess as well. |
[tools] |
13:54 |
<valhallasw`cloud> |
tried to restart gridengine-exec on tools-exec-1401, no effect. tools-webgrid-lighttpd-1411 also just went into 'au' state. |
[tools] |
13:47 |
<valhallasw`cloud> |
that brought tools-exec-1403, tools-exec-1406 and tools-webgrid-generic-1402 back up, tools-exec-1401 and tools-exec-catscan are still in 'au' state |
[tools] |
13:46 |
<valhallasw`cloud> |
starting gridengine-exec on hosts with queues in 'au' (=alarm, unknown) state using <code>for i in $(qstat -f -xml | grep "<state>au" -B 6 | grep "<name>" | cut -d'@' -f2 | cut -d. -f1); do echo $i; ssh $i sudo service gridengine-exec start; done</code> |
[tools] |
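The grep/cut pipeline in the entry above depends on `<name>` appearing within six lines before `<state>` in the `qstat -f -xml` output. A slightly less position-sensitive variant is to parse the XML. This is a sketch only: the sample document below is an assumed, simplified shape of that output (any element with a `name` child and a `state` child is treated as a queue instance), the queue name `task` is made up for illustration, and `hosts_in_alarm_state` is a hypothetical helper. As in the original one-liner, a state matches when it starts with `au`.

```python
import xml.etree.ElementTree as ET
from typing import List


def hosts_in_alarm_state(qstat_xml: str, prefix: str = "au") -> List[str]:
    """Return short hostnames of queue instances whose state starts with `prefix`."""
    root = ET.fromstring(qstat_xml)
    hosts: List[str] = []
    for elem in root.iter():
        # Only direct children are considered, so each queue entry is seen once.
        name = elem.findtext("name")
        state = elem.findtext("state")
        if name is None or state is None:
            continue
        if not state.strip().startswith(prefix):
            continue
        # "queue@host.domain" -> "host", like cut -d'@' -f2 | cut -d. -f1
        host = name.strip().split("@", 1)[-1].split(".", 1)[0]
        if host not in hosts:
            hosts.append(host)
    return hosts


# Assumed, simplified sample; real qstat output carries more fields per queue.
SAMPLE = """<job_info>
  <queue_info>
    <Queue-List>
      <name>task@tools-exec-1401.eqiad.wmflabs</name>
      <state>au</state>
    </Queue-List>
    <Queue-List>
      <name>task@tools-exec-1402.eqiad.wmflabs</name>
    </Queue-List>
    <Queue-List>
      <name>webgrid-lighttpd@tools-webgrid-lighttpd-1411.eqiad.wmflabs</name>
      <state>au</state>
    </Queue-List>
  </queue_info>
</job_info>"""

if __name__ == "__main__":
    for host in hosts_in_alarm_state(SAMPLE):
        print(host)
```

The resulting host list can then be fed to the same `ssh $i sudo service gridengine-exec start` loop; the sketch deliberately stops at listing hosts and does not restart anything itself.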
08:37 |
<valhallasw`cloud> |
sudo service gridengine-exec start on tools-webgrid-lighttpd-1404.eqiad.wmflabs, tools-webgrid-lighttpd-1406.eqiad.wmflabs and tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs |
[tools] |
08:33 |
<valhallasw`cloud> |
tools-webgrid-lighttpd-1403.eqiad.wmflabs, tools-webgrid-lighttpd-1404.eqiad.wmflabs and tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs are all broken (queue dropped because it is temporarily not available) |
[tools] |
08:30 |
<valhallasw`cloud> |
hostname mismatch: host is called tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs in config, but it was named tools-webgrid-lighttpd-1411.eqiad.wmflabs in the hostgroup config |
[tools] |
08:21 |
<valhallasw`cloud> |
still sudo qmod -e "*@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" -> invalid queue "*@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" |
[tools] |
08:20 |
<valhallasw`cloud> |
sudo qconf -mhgrp "@webgrid", added tools-webgrid-lighttpd-1411.eqiad.wmflabs |
[tools] |
08:18 |
<_joe_> |
reimaging mw1152 |
[production] |
08:14 |
<godog> |
restart cassandra on restbase100[569] to pick up latest openjdk |
[production] |
08:14 |
<valhallasw`cloud> |
and the hostgroup @webgrid doesn't even exist? (╯°□°)╯︵ ┻━┻ |
[tools] |
08:10 |
<valhallasw`cloud> |
/var/lib/gridengine/etc/queues/webgrid-lighttpd does not seem to be the correct configuration as the current config refers to '@webgrid' as host list. |
[tools] |
08:07 |
<valhallasw`cloud> |
sudo qconf -Ae /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs -> root@tools-bastion-01.eqiad.wmflabs added "tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" to exechost list |
[tools] |
08:06 |
<valhallasw`cloud> |
ok, success. /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs now exists. Do I still have to add it manually to the grid? I suppose so. |
[tools] |
08:04 |
<_joe_> |
depooling mw1152 from the imagescalers pool |
[production] |
08:04 |
<valhallasw`cloud> |
installing packages from /data/project/.system/deb-trusty seems to fail. sudo apt-get update helps. |
[tools] |
08:03 |
<godog> |
restart cassandra on restbase100[348] to pick up latest openjdk |
[production] |
08:00 |
<valhallasw`cloud> |
running puppet agent -tv again |
[tools] |
07:55 |
<valhallasw`cloud> |
argh. Disabling toollabs::node::web::generic again and enabling toollabs::node::web::lighttpd |
[tools] |
07:54 |
<valhallasw`cloud> |
various issues such as Error: /Stage[main]/Gridengine::Submit_host/File[/var/lib/gridengine/default/common/accounting]/ensure: change from absent to link failed: Could not set 'link' on ensure: No such file or directory - /var/lib/gridengine/default/common at 17:/etc/puppet/modules/gridengine/manifests/submit_host.pp; probably an ordering issue in |
[tools] |
07:53 |
<valhallasw`cloud> |
Setting up adminbot (1.7.8) ... chmod: cannot access '/usr/lib/adminbot/README': No such file or directory --- ran sudo touch /usr/lib/adminbot/README |
[tools] |
07:37 |
<valhallasw`cloud> |
applying role::labs::tools::compute and toollabs::node::web::generic to tools-webgrid-lighttpd-1411 |
[tools] |
07:31 |
<valhallasw`cloud> |
reading puppet suggests I should qconf -ah /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs but that file is missing? |
[tools] |