2015-08-31
23:25 <ebernhardson@tin> Synchronized wmf-config/InitialiseSettings.php: revert update for cirrussearch experimental suggestions api (duration: 00m 12s) [production]
23:21 <ebernhardson@tin> Synchronized wmf-config/InitialiseSettings.php: update config of cirrussearch experimental suggestions api (duration: 00m 12s) [production]
22:45 <chasemp> disabled puppet on elastic hosts temporarily to safely roll out fw change. elastic seems to have not taken it well and I'm holding for green cluster state. [production]
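    (A minimal sketch of the kind of "hold for green" check implied above, polling the standard Elasticsearch _cluster/health API; the host name and poll interval are assumptions, not taken from the log.)

        # Poll Elasticsearch cluster health until it reports "green".
        import time
        import requests

        def wait_for_green(host="http://elastic1001:9200", poll=10):
            while True:
                health = requests.get(host + "/_cluster/health").json()
                print(health["status"],
                      "relocating:", health["relocating_shards"],
                      "unassigned:", health["unassigned_shards"])
                if health["status"] == "green":
                    return
                time.sleep(poll)

        wait_for_green()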
21:21 <valhallasw`cloud> webservice: error: argument server: invalid choice: 'generic' (choose from 'lighttpd', 'tomcat', 'uwsgi-python', 'nodejs', 'uwsgi-plain') (for tools.javatest) [tools]
21:20 <mutante> installing package upgrades on argon [production]
21:20 <valhallasw`cloud> restarted webservicemonitor [tools]
21:19 <valhallasw`cloud> seems to have some errors in restarting: subprocess.CalledProcessError: Command '['/usr/bin/sudo', '-i', '-u', 'tools.javatest', '/usr/local/bin/webservice', '--release', 'trusty', 'generic', 'restart']' returned non-zero exit status 2 [tools]
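    (For context, a hypothetical reconstruction of how a monitor would hit the traceback above: subprocess.check_call raises CalledProcessError whenever the restart command exits non-zero, as it did here because 'generic' is not a valid server type. The function below is illustrative, not the actual webservicemonitor code.)

        # Illustrative only: restart a tool's webservice via sudo and surface failures.
        import subprocess

        def restart_webservice(tool, release="trusty", server="lighttpd"):
            cmd = ["/usr/bin/sudo", "-i", "-u", "tools." + tool,
                   "/usr/local/bin/webservice", "--release", release, server, "restart"]
            # check_call raises CalledProcessError if webservice exits non-zero,
            # e.g. exit status 2 when the server type is not a valid choice.
            subprocess.check_call(cmd)

        restart_webservice("javatest")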
21:18 <valhallasw`cloud> running puppet agent -tv on tools-services-02 to make sure webservicemonitor is running [tools]
21:15 <valhallasw`cloud> several webservices seem to actually have not gotten back online?! what on earth is going on. [tools]
21:10 <valhallasw`cloud> some jobs still died (including tools.admin). I'm assuming service.manifest will make sure they start again [tools]
20:58 <ori> imported pybal_1.08_amd64.changes to jessie-wikimedia [production]
20:44 <chasemp> ferm for elastic100[4-7] and adjust ferm to include wikitech source [production]
20:29 <valhallasw`cloud> the sorted list is not so spread out in terms of affected hosts, because a lot of jobs were started on lighttpd-1409 and -1410 around the same time. [tools]
20:25 <valhallasw`cloud> ca 500 jobs @ 5s/job = approx 40 minutes [tools]
20:23 <valhallasw`cloud> doh. accidentally used the wrong file, causing restarts for another few uwsgi hosts. Three more jobs dead *sigh* [tools]
20:21 <valhallasw`cloud> now doing more rescheduling, with 5 sec intervals, on a sorted list to spread load between queues [tools]
20:21 <subbu> deployed parsoid version c3e4df5e [production]
19:36 <valhallasw`cloud> last restarted job is 1423661, rest of them are still in /home/valhallaw/webgrid_jobs [tools]
19:35 <valhallasw`cloud> one per second still seems to make SGE unhappy; there's a whole set of jobs dying, mostly uwsgi? [tools]
19:31 <valhallasw`cloud> https://phabricator.wikimedia.org/T110861 : rescheduling 521 webgrid jobs, at a rate of one per second, while watching the accounting log for issues [tools]
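    (A rough sketch of such a throttled reschedule loop, assuming the job IDs were collected into the file mentioned above and that "qmod -r" is the gridengine reschedule command; the actual script and its reschedule invocation may have differed.)

        # Reschedule webgrid jobs one at a time with a delay, watching for failures.
        import subprocess
        import time

        INTERVAL = 1  # started at one per second; later raised to 5s when SGE struggled

        with open("/home/valhallaw/webgrid_jobs") as f:
            job_ids = sorted(line.strip() for line in f if line.strip())

        for job_id in job_ids:
            # Ask gridengine to reschedule the job (assumption: qmod -r).
            subprocess.call(["qmod", "-r", job_id])
            time.sleep(INTERVAL)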
16:22 <godog> depool mw1125 + mw1142 from api, nutcracker client connections exceeded [production]
16:06 <thcipriani@tin> Finished scap: SWAT: Ask the user to log in if the session is lost [[gerrit:234228]] (duration: 27m 07s) [production]
15:59 <jynus> restarting hhvm on mw2187 [production]
15:39 <thcipriani@tin> Started scap: SWAT: Ask the user to log in if the session is lost [[gerrit:234228]] [production]
15:33 <mutante> terbium - Could not find dependent Service[nscd] for File[/etc/ldap/ldap.conf] [production]
15:28 <thcipriani@tin> Synchronized closed-labs.dblist: SWAT: Creating closed-labs.dblist and closing es.wikipedia.beta.wmflabs.org [[gerrit:234594]] (duration: 00m 13s) [production]
15:25 <thcipriani@tin> Synchronized wmf-config/CirrusSearch-common.php: SWAT: Remove files from Commons from search results on wikimediafoundation.org [[gerrit:234040]] (duration: 00m 11s) [production]
15:25 <ottomata> starting varnishkafka instances on frontend caches to produce eventlogging client side events to kafka [production]
15:21 <thcipriani@tin> Synchronized php-1.26wmf20/extensions/Wikidata: SWAT: Update Wikidata - Fix formatting of client edit summaries [[gerrit:234991]] (duration: 00m 21s) [production]
15:16 <thcipriani@tin> Synchronized php-1.26wmf20/extensions/UploadWizard/resources/controller/uw.controller.Step.js: SWAT: Keep the uploads sorted in the order they were created in initially [[gerrit:234553]] (duration: 00m 12s) [production]
15:13 <jzerebecki> did https://phabricator.wikimedia.org/T109007#1537572 [releng]
14:43 <ebernhardson> elasticsearch cluster.routing.allocation.disk.watermark.high set to 75% to force elastic1022 to reduce its disk usage [production]
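    (The setting above can be applied at runtime through the standard Elasticsearch cluster settings API; a minimal sketch, with the host and the choice of a transient rather than persistent setting assumed.)

        # Lower the high disk watermark so nodes over 75% disk usage start shedding shards.
        import requests

        settings = {
            "transient": {
                "cluster.routing.allocation.disk.watermark.high": "75%"
            }
        }
        r = requests.put("http://elastic1001:9200/_cluster/settings", json=settings)
        print(r.json())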
14:41 <urandom> bouncing Cassandra on restbase1001 to apply temporary GC setting [production]
14:06 <akosiaris> rebooted krypton. was reporting 100% cpu steal time [production]
13:40 <paravoid> running puppet on newly-installed mc2001 [production]
13:40 <paravoid> restarting hhvm on mw1065 [production]
11:10 <moritzm> restart salt-master on palladium [production]
10:45 <paravoid> reenabling asw2-a5-eqiad:xe-0/0/36 (T107635) [production]
10:36 <godog> repool ms-fe1004 [production]
10:32 <godog> repool ms-fe1003 and depool ms-fe1004 for firewall changes [production]
10:19 <godog> update graphite retention policy on files with previous retention and older than 30d T96662 [production]
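    (A hedged sketch of what this and the earlier 60d pass could look like using the standard whisper-resize.py tool; the whisper data directory, the target retentions, and the mtime-based "older than 30d" test are all assumptions here, not what T96662 actually prescribed.)

        # Resize whisper files older than a cutoff to a new retention policy.
        import os
        import subprocess
        import time

        WHISPER_ROOT = "/var/lib/carbon/whisper"       # assumed graphite data directory
        NEW_RETENTION = ["1m:7d", "5m:30d", "15m:1y"]  # assumed target retentions
        CUTOFF = time.time() - 30 * 86400              # "older than 30d" (age test assumed)

        for dirpath, _, filenames in os.walk(WHISPER_ROOT):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if name.endswith(".wsp") and os.path.getmtime(path) < CUTOFF:
                    subprocess.call(["whisper-resize.py", path] + NEW_RETENTION)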
10:18 <godog> repool ms-fe1002 and depool ms-fe1003 for firewall changes [production]
10:05 <godog> depool ms-fe1002 to apply firewall changes [production]
09:55 <jynus> cloning es1007 mysql data into es1013 (ETA: 5h30m) [production]
09:51 <godog> repool ms-fe1001 [production]
09:35 <godog> depool ms-fe1001 in preparation for ferm changes [production]
09:27 <godog> update graphite retention policy on files with previous retention and older than 60d T96662 [production]
09:25 <jynus@tin> Synchronized wmf-config/db-eqiad.php: Depool es1007 for maintenance (duration: 00m 13s) [production]
08:33 <jynus@tin> Synchronized wmf-config/db-eqiad.php: Depool db1028, return ES servers back from maintenance (duration: 00m 12s) [production]
07:31 <valhallasw`cloud> removed paniclog on tools-submit; probably related to the NFS outage yesterday (although I'm not sure why that would give OOMs) [tools]