2016-01-12
08:37 <paravoid> ms-be1002: echo b > /proc/sysrq-trigger, kernel misbehaving and unrecoverable (out of kernel memory/XFS issues) [production]
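For context: writing "b" to /proc/sysrq-trigger asks the kernel to reboot immediately, with no sync and no unmount, so it is a last resort for a host that is otherwise unrecoverable. A minimal sketch; the first step is only needed if magic SysRq is not already enabled:

    echo 1 > /proc/sys/kernel/sysrq   # enable magic SysRq functions (may already be enabled)
    echo b > /proc/sysrq-trigger      # 'b' = reboot now, without syncing or unmounting filesystems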
07:38 <paravoid> cr2-eqiad: reenable BGP peerings with GTT [production]
05:31 <paravoid> rm CirrusSearchRequests.log-201510*.gz on fluorine (saving ~200G) [production]
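A sketch of that kind of cleanup, assuming the rotated CirrusSearchRequests archives sit in the working directory on fluorine (the actual path is not recorded in this entry):

    du -ch CirrusSearchRequests.log-201510*.gz | tail -n 1   # total space the October 2015 archives occupy
    rm CirrusSearchRequests.log-201510*.gz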
04:07 <paravoid> cleaning up elastic1006's /var/log from old logs [production]
03:59 <paravoid> re-enabling puppet on sca1001/2; no reason for disabling it had been left [production]
02:33 <l10nupdate@tin> ResourceLoader cache refresh completed at Tue Jan 12 02:33:00 UTC 2016 (duration 6m 55s) [production]
02:26 <mwdeploy@tin> sync-l10n completed (1.27.0-wmf.9) (duration: 10m 47s) [production]
00:46 <krenair@tin> Synchronized wmf-config/InitialiseSettings.php: rv 443026e3ad18934dd0017a258673d88104cf6b5e (duration: 00m 29s) [production]
00:32 <krenair@tin> Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/258670/ (duration: 00m 30s) [production]
00:29 <krenair@tin> Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/258672/ (duration: 00m 30s) [production]
00:25 <krenair@tin> Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/258453/ (duration: 00m 30s) [production]
00:18 <krenair@tin> Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/258444/ (duration: 00m 30s) [production]
00:14 <krenair@tin> Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/255361/ (duration: 00m 30s) [production]
00:10 <krenair@tin> Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/244140/ (duration: 00m 30s) [production]
00:09 <krenair@tin> Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/244140/ (duration: 00m 30s) [production]
00:06 <krenair@tin> Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/260242/ (duration: 00m 30s) [production]
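The "Synchronized wmf-config/..." entries above are emitted by the deployment tooling when a config file is pushed from the deploy host to the application servers; the gerrit URL (or free-form text) is the sync message. A minimal sketch, assuming scap's sync-file subcommand and the usual staging path on tin (both are assumptions, not taken from these entries):

    cd /srv/mediawiki-staging   # assumed staging directory on the deploy host
    scap sync-file wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/#/c/260242/'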
2016-01-11
22:52 <jzerebecki@tin> Synchronized wmf-config/throttle.php: deploying https://gerrit.wikimedia.org/r/#/c/263427/ (duration: 00m 30s) [production]
22:48 <YuviPanda> restart eventlogging_synch on dbstore1002 [production]
22:47 <jzerebecki@tin> Synchronized php-1.27.0-wmf.9/extensions/Wikidata/extensions/Wikibase/repo/maintenance/dispatchChanges.php: restoring truncated Wikidata dispatchChanges.php to let dispatchers run again (duration: 00m 30s) [production]
22:46 <mutante> restbase1004, restbase2002, restbase2005 - manually install nodejs [production]
22:45 <jzerebecki@tin> Synchronized php-1.27.0-wmf.9/extensions/Wikidata/extensions/Wikibase/repo: deploying https://gerrit.wikimedia.org/r/#/c/253898/ with dispatchChanges.php still truncated (duration: 00m 33s) [production]
22:40 <mutante> restbase1001 - apt-get install nodejs [production]
22:40 <jzerebecki> dispatchChanges.php killed on terbium [production]
22:38 <jzerebecki@tin> Synchronized php-1.27.0-wmf.9/extensions/Wikidata/extensions/Wikibase/repo/maintenance/dispatchChanges.php: truncating Wikidata dispatchChanges.php to stop dispatchers as preparation for https://gerrit.wikimedia.org/r/#/c/253898/ (duration: 00m 31s) [production]
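The dispatchChanges.php entries above (22:38, 22:40, 22:47) form a stop/deploy/restore sequence: the Wikidata dispatcher script is emptied and synced so that newly launched dispatchers do nothing and exit immediately, the running one is killed on terbium, the code change is deployed, and the full script is then restored and synced again. A sketch of the stop step, assuming the file is truncated in the staging tree before syncing (path shortened, details assumed):

    cd php-1.27.0-wmf.9/extensions/Wikidata/extensions/Wikibase/repo/maintenance
    cp dispatchChanges.php /tmp/dispatchChanges.php.bak   # keep a copy for the restore step
    : > dispatchChanges.php                               # truncate to zero bytes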
22:24 <hashar> Deleting old references on Zuul-merger for mediawiki/core: /usr/share/python/zuul/bin/python /home/hashar/zuul-clear-refs.py --until 15 /srv/ssd/zuul/git/mediawiki/core [releng]
22:21 <hashar> gallium in /srv/ssd/zuul/git/mediawiki/core$ git gc --prune=all && git remote update --prune [releng]
22:21 <hashar> scandium in /srv/ssd/zuul/git/mediawiki/core$ git gc --prune=all && git remote update --prune [releng]
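The gallium and scandium entries run the same maintenance on the zuul-merger working copy: git gc --prune=all repacks the repository and drops all unreachable objects regardless of age, and git remote update --prune refetches the remotes and removes remote-tracking refs that no longer exist upstream. As run on one host:

    cd /srv/ssd/zuul/git/mediawiki/core
    git gc --prune=all            # repack; prune unreachable objects immediately, not just old ones
    git remote update --prune     # fetch all remotes and drop stale remote-tracking branches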
22:19 <valhallasw`cloud> reset maxujobs 0->128, job_load_adjustments none->np_load_avg=0.50, load_ad... -> 0:7:30 [tools]
22:12 <YuviPanda> restarted gridengine master again [tools]
22:07 <valhallasw`cloud> set job_load_adjustments from np_load_avg=0.50 to none and load_adjustment_decay_time to 0:0:0 [tools]
22:05 <valhallasw`cloud> set maxujobs back to 0, but doesn't help [tools]
21:57 <valhallasw`cloud> reset to 7:30 [tools]
21:57 <valhallasw`cloud> that cleared the measure, but jobs still not starting. Ugh! [tools]
21:55 <valhallasw`cloud> set job_load_adjustments_decay_time = 0:0:0 [tools]
21:45 <YuviPanda> restarted gridengine master [tools]
21:43 <valhallasw`cloud> qstat -j <jobid> shows all queues overloaded; seems to have started just after a load test for the new maxujobs setting [tools]
21:42 <valhallasw`cloud> resetting to 0:7:30, as it's not having the intended effect [tools]
21:41 <valhallasw`cloud> currently 353 jobs in qw state [tools]
21:40 <valhallasw`cloud> that's load_adjustment_decay_time [tools]
21:40 <valhallasw`cloud> temporarily sudo qconf -msconf to 0:0:1 [tools]
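The valhallasw`cloud entries above all adjust the same gridengine scheduler configuration. A sketch of how those parameters are inspected and changed, assuming standard SGE tooling (qconf -ssconf prints the scheduler config, qconf -msconf opens it in $EDITOR):

    qconf -ssconf | egrep 'maxujobs|job_load_adjustments|load_adjustment_decay_time'   # current values
    sudo qconf -msconf    # edit the scheduler config, e.g. set load_adjustment_decay_time back to 0:7:30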
21:19 <papaul> pc200[4-6] - signing puppet certs, salt-key, initial run [production]
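A sketch of the provisioning steps logged for pc200[4-6], assuming the puppet/salt tooling of the time; the fully qualified hostname is illustrative, not taken from the log:

    puppet cert sign pc2004.codfw.wmnet    # on the puppetmaster: sign the new host's certificate request
    salt-key -a pc2004.codfw.wmnet         # on the salt master: accept the new minion key
    puppet agent --test                    # on the new host: initial puppet run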
21:13 <subbu> finished deploying parsoid sha 07494cf2 [production]
21:06 <papaul> installing OS on pc200[4-6] [production]
21:06 <subbu> synced new code; restarted parsoid on wtp1003 as a canary [production]
21:02 <subbu> starting parsoid deploy [production]
19:59 <YuviPanda> Set maxujobs (max concurrent jobs per user) on gridengine to 128 [tools]
18:52 <RobH> rt.w.o cert expired; its replacement will be installed later today (RT is an internal ops-only tool) [production]
18:36 <RobH> tendril cert updated and neon returned to normal service [production]
18:30 <ori> Restarting HHVM on all job runners, to vacate memory now that the cause of the leak appears to have subsided (T122069) [production]
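A sketch of a fleet-wide restart like the one ori logs above, assuming salt is used to reach the job runners; the grain, batch size, and service name are illustrative, not taken from the log:

    salt -b 10% -G 'cluster:jobrunner' cmd.run 'service hhvm restart'   # rolling restart, 10% of hosts at a time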
18:24 <RobH> tendril: updating SSL cert on neon; HTTPS may flap for a second (this is on neon, so the Icinga HTTPS portal may also flap) [production]