2015-08-31
20:44 |
<chasemp> |
ferm for elastic100[4-7] and adjust ferm to include wikitech source |
[production] |
20:29 |
<valhallasw`cloud> |
Sorting the list (| sort) does not spread the load much across hosts, because a lot of jobs were started on lighttpd-1409 and -1410 around the same time. |
[tools] |
20:25 |
<valhallasw`cloud> |
ca 500 jobs @ 5s/job = approx 40 minutes |
[tools] |
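The estimate above is jobs × delay: 500 × 5 s = 2500 s, about 42 minutes. The full set of 521 webgrid jobs from T110861 would take about 43. Here is a minimal sketch of the arithmetic. The function name `reschedule_duration_minutes` is hypothetical and only for illustration.

```python
def reschedule_duration_minutes(n_jobs: int, delay_s: float) -> float:
    """Wall-clock minutes needed to reschedule n_jobs with a fixed delay per job."""
    return n_jobs * delay_s / 60


# "ca 500 jobs @ 5s/job"
print(round(reschedule_duration_minutes(500, 5), 1))  # 41.7
# the 521 webgrid jobs mentioned in T110861
print(round(reschedule_duration_minutes(521, 5), 1))  # 43.4
```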
20:23 |
<valhallasw`cloud> |
doh. accidentally used the wrong file, causing restarts for another few uwsgi hosts. Three more jobs dead *sigh* |
[tools] |
20:21 |
<valhallasw`cloud> |
now doing more rescheduling, with 5 sec intervals, on a sorted list to spread load between queues |
[tools] |
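The log notes that a plain sorted list still clustered many jobs on lighttpd-1409 and -1410. One way to spread consecutive restarts across hosts is to interleave jobs round-robin by host. This sketch is not what was run here. It assumes each job is known as a `(job_id, host)` pair. The function `interleave_by_host` and the host `lighttpd-1411` in the usage example are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave_by_host(jobs):
    """Order (job_id, host) pairs so that consecutive entries hit different hosts where possible.

    Jobs are grouped per host, keeping their original order within each host,
    and then taken one at a time from each host in turn (round-robin).
    """
    by_host = defaultdict(list)
    for job_id, host in jobs:
        by_host[host].append(job_id)
    hosts = sorted(by_host)  # deterministic host order
    ordered = []
    # zip_longest pads exhausted hosts with None; skip those slots
    for batch in zip_longest(*(by_host[h] for h in hosts)):
        for host, job_id in zip(hosts, batch):
            if job_id is not None:
                ordered.append((job_id, host))
    return ordered


jobs = [(1, "lighttpd-1409"), (2, "lighttpd-1409"), (3, "lighttpd-1410"),
        (4, "lighttpd-1409"), (5, "lighttpd-1410"), (6, "lighttpd-1411")]
print([job_id for job_id, _ in interleave_by_host(jobs)])  # [1, 3, 6, 2, 5, 4]
```

Sorting by host would instead run jobs 1, 2 and 4 back to back on lighttpd-1409.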
20:21 |
<subbu> |
deployed parsoid version c3e4df5e |
[production] |
19:36 |
<valhallasw`cloud> |
last restarted job is 1423661; the rest of them are still in /home/valhallaw/webgrid_jobs |
[tools] |
19:35 |
<valhallasw`cloud> |
one per second still seems to make SGE unhappy; there's a whole set of jobs dying, mostly uwsgi? |
[tools] |
19:31 |
<valhallasw`cloud> |
https://phabricator.wikimedia.org/T110861 : rescheduling 521 webgrid jobs, at a rate of one per second, while watching the accounting log for issues |
[tools] |
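Rescheduling "at a rate of one per second" (and later every 5 seconds) is a rate-limited loop over job IDs. Here is a minimal sketch. It assumes that Grid Engine's `qmod -rj <job_id>` reschedules a single job; check the local `qmod` man page before relying on that. The function names are hypothetical. The reschedule and sleep steps are injectable so the loop can be dry-run. Watching the accounting log is not covered.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def qmod_reschedule(job_id: int) -> None:
    # Assumption: `qmod -rj <job_id>` asks Grid Engine to reschedule this job.
    subprocess.run(["qmod", "-rj", str(job_id)], check=True)


def reschedule_slowly(job_ids: Iterable[int], delay_s: float = 1.0,
                      reschedule: Callable[[int], None] = qmod_reschedule,
                      sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Reschedule jobs one at a time, pausing delay_s seconds between consecutive jobs.

    Returns the job IDs in the order they were rescheduled.
    """
    done = []
    for i, job_id in enumerate(job_ids):
        if i:
            sleep(delay_s)  # rate limit: at most one reschedule per delay_s
        reschedule(job_id)
        print(f"rescheduled {job_id}", flush=True)  # progress, in case the run is interrupted
        done.append(job_id)
    return done


# Dry run with placeholder job IDs: record calls instead of running qmod or sleeping.
calls, pauses = [], []
reschedule_slowly([101, 102, 103], delay_s=5, reschedule=calls.append, sleep=pauses.append)
print(calls, pauses)  # [101, 102, 103] [5, 5]
```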
16:22 |
<godog> |
depool mw1125 + mw1142 from api, nutcracker client connections exceeded |
[production] |
16:06 |
<thcipriani@tin> |
Finished scap: SWAT: Ask the user to log in if the session is lost [[gerrit:234228]] (duration: 27m 07s) |
[production] |
15:59 |
<jynus> |
restarting hhvm on mw2187 |
[production] |
15:39 |
<thcipriani@tin> |
Started scap: SWAT: Ask the user to log in if the session is lost [[gerrit:234228]] |
[production] |
15:33 |
<mutante> |
terbium - Could not find dependent Service[nscd] for File[/etc/ldap/ldap.conf] |
[production] |
15:28 |
<thcipriani@tin> |
Synchronized closed-labs.dblist: SWAT: Creating closed-labs.dblist and closing es.wikipedia.beta.wmflabs.org [[gerrit:234594]] (duration: 00m 13s) |
[production] |
15:25 |
<thcipriani@tin> |
Synchronized wmf-config/CirrusSearch-common.php: SWAT: Remove files from Commons from search results on wikimediafoundation.org [[gerrit:234040]] (duration: 00m 11s) |
[production] |
15:25 |
<ottomata> |
starting varnishkafka instances on frontend caches to produce eventlogging client side events to kafka |
[production] |
15:21 |
<thcipriani@tin> |
Synchronized php-1.26wmf20/extensions/Wikidata: SWAT: Update Wikidata - Fix formatting of client edit summaries [[gerrit:234991]] (duration: 00m 21s) |
[production] |
15:16 |
<thcipriani@tin> |
Synchronized php-1.26wmf20/extensions/UploadWizard/resources/controller/uw.controller.Step.js: SWAT: Keep the uploads sorted in the order they were created in initially [[gerrit:234553]] (duration: 00m 12s) |
[production] |
15:13 |
<jzerebecki> |
did https://phabricator.wikimedia.org/T109007#1537572 |
[releng] |
14:43 |
<ebernhardson> |
elasticsearch cluster.routing.allocation.disk.watermark.high set to 75% to force elastic1022 to reduce its disk usage |
[production] |
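The high disk watermark is a cluster-level setting. It can be changed at runtime through Elasticsearch's cluster update settings API (`PUT /_cluster/settings`). The log does not say whether a transient or persistent setting was used, so this sketch defaults to transient as an assumption. The function name `watermark_request` and the URL `http://localhost:9200` are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def watermark_request(es_url: str, high: str, persistent: bool = False) -> urllib.request.Request:
    """Build a PUT /_cluster/settings request that sets the high disk watermark."""
    scope = "persistent" if persistent else "transient"
    body = {scope: {"cluster.routing.allocation.disk.watermark.high": high}}
    return urllib.request.Request(
        url=es_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = watermark_request("http://localhost:9200", "75%")
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/_cluster/settings
# urllib.request.urlopen(req) would send it to a live cluster.
```

Lowering the high watermark below a node's current usage makes the cluster try to relocate shards away from that node. That is the effect intended for elastic1022 above.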
14:41 |
<urandom> |
bouncing Cassandra on restbase1001 to apply temporary GC setting |
[production] |
14:06 |
<akosiaris> |
rebooted krypton; it was reporting 100% CPU steal time |
[production] |
13:40 |
<paravoid> |
running puppet on newly-installed mc2001 |
[production] |
13:40 |
<paravoid> |
restarting hhvm on mw1065 |
[production] |
11:10 |
<moritzm> |
restart salt-master on palladium |
[production] |
10:45 |
<paravoid> |
reenabling asw2-a5-eqiad:xe-0/0/36 (T107635) |
[production] |
10:36 |
<godog> |
repool ms-fe1004 |
[production] |
10:32 |
<godog> |
repool ms-fe1003 and depool ms-fe1004 for firewall changes |
[production] |
10:19 |
<godog> |
update graphite retention policy on files with previous retention and older than 30d T96662 |
[production] |
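Selecting Graphite whisper files "older than 30d" (or 60d) can be sketched as a walk over `.wsp` files filtered by modification time. This assumes "older than" means "not written to for that many days". The log does not say how the files were selected or how "files with previous retention" were detected, so that check is omitted. The retention change itself would typically be applied per file with whisper's `whisper-resize.py`, which is not shown. The function name `stale_whisper_files` is hypothetical.

```python
import os
import time
from typing import Iterator, Optional


def stale_whisper_files(root: str, min_age_days: float,
                        now: Optional[float] = None) -> Iterator[str]:
    """Yield paths of .wsp files under root whose mtime is at least min_age_days old."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".wsp"):
                path = os.path.join(dirpath, name)
                if os.path.getmtime(path) <= cutoff:
                    yield path
```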
10:18 |
<godog> |
repool ms-fe1002 and depool ms-fe1003 for firewall changes |
[production] |
10:05 |
<godog> |
depool ms-fe1002 to apply firewall changes |
[production] |
09:55 |
<jynus> |
cloning es1007 mysql data into es1013 (ETA: 5h30m) |
[production] |
09:51 |
<godog> |
repool ms-fe1001 |
[production] |
09:35 |
<godog> |
depool ms-fe1001 in preparation for ferm changes |
[production] |
09:27 |
<godog> |
update graphite retention policy on files with previous retention and older than 60d T96662 |
[production] |
09:25 |
<jynus@tin> |
Synchronized wmf-config/db-eqiad.php: Depool es1007 for maintenance (duration: 00m 13s) |
[production] |
08:33 |
<jynus@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1028, return ES servers back from maintenance (duration: 00m 12s) |
[production] |
07:31 |
<valhallasw`cloud> |
removed paniclog on tools-submit; probably related to the NFS outage yesterday (although I'm not sure why that would give OOMs) |
[tools] |
04:34 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Mon Aug 31 04:34:14 UTC 2015 (duration 34m 13s) |
[production] |
04:05 |
<bblack> |
disabled ipv6 autoconf on neon, flushed old dynamic addr |
[production] |
02:32 |
<l10nupdate@tin> |
LocalisationUpdate completed (1.26wmf20) at 2015-08-31 02:32:25+00:00 |
[production] |
02:29 |
<l10nupdate@tin> |
Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 42s) |
[production] |