2014-10-31
12:31 <manybubbles> restart of elasticsearch nodes got them back to responsive. Cluster isn't fully healed yet but we're better than we were. Still not sure how we got this way [production]
12:26 <manybubbles> restarting all elasticsearch boxes in quick sequence. when I try restarting a frozen box another one freezes up (probably an evil request being retried on it after its buddy went down). [production]
11:46 <manybubbles> heap dumps aren't happening. Even with the config to dump them on oom errors. Restarting Elasticsearch nodes to get us back to stable and going to have to investigate from another direction. [production]
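(A quick way to confirm whether a running Elasticsearch JVM actually picked up the heap-dump-on-OOM options is to inspect its command line. A minimal diagnostic sketch follows; the pgrep pattern and flag names assume a stock HotSpot JVM and are not taken from the log.)

```python
#!/usr/bin/env python3
# Diagnostic sketch: check a running Elasticsearch JVM's command line for the
# heap-dump-on-OOM flags. Assumes a stock HotSpot JVM; illustrative only.
import subprocess

def jvm_cmdline(pid):
    # /proc/<pid>/cmdline is NUL-separated
    with open('/proc/%d/cmdline' % pid) as f:
        return f.read().split('\0')

def heap_dump_flags(pid):
    return [a for a in jvm_cmdline(pid)
            if a == '-XX:+HeapDumpOnOutOfMemoryError'
            or a.startswith('-XX:HeapDumpPath=')]

if __name__ == '__main__':
    out = subprocess.check_output(
        ['pgrep', '-f', 'org.elasticsearch.bootstrap']).decode()
    pid = int(out.split()[0])
    flags = heap_dump_flags(pid)
    if flags:
        print('pid %d has heap dump flags: %s' % (pid, ' '.join(flags)))
    else:
        print('pid %d has NO heap dump flags; no dump will be written on OOM' % pid)
```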
11:30 <manybubbles> restarting gmond on elasticsearch nodes so I can get a clearer picture of them [production]
11:24 <oblivian> Synchronized wmf-config/InitialiseSettings.php: ES is down, long live lsearchd (duration: 00m 09s) [production]
10:52 <godog> restarting elasticsearch on elastic1031, heap exhausted at 30G [production]
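(Per-node heap pressure like the 30G exhaustion above can be read off the standard nodes-stats API; a minimal sketch, assuming the API is reachable on localhost:9200.)

```python
#!/usr/bin/env python3
# Sketch: print per-node JVM heap usage from the stock _nodes/stats/jvm
# endpoint to spot nodes approaching heap exhaustion. localhost:9200 is an
# assumption, not taken from the log.
import json
from urllib.request import urlopen

STATS_URL = 'http://localhost:9200/_nodes/stats/jvm'

def heap_report():
    stats = json.loads(urlopen(STATS_URL).read().decode('utf-8'))
    for node in stats['nodes'].values():
        mem = node['jvm']['mem']
        print('%-20s heap %3d%%  %5.1f GB / %5.1f GB' % (
            node['name'],
            mem['heap_used_percent'],
            mem['heap_used_in_bytes'] / 1024.0 ** 3,
            mem['heap_max_in_bytes'] / 1024.0 ** 3))

if __name__ == '__main__':
    heap_report()
```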
01:14 <springle> db1040 dberror spam is https://gerrit.wikimedia.org/r/#/c/169964/ only jobrunners affected, annoying but not critical [production]
2014-10-30
23:56 <awight> update civicrm from 1f0dc2ce0ab84765c085cc0ee369a7a047c0d005 to f47ed6f7e55946388db1dde787ca458c27a57c5a [production]
23:08 <demon> Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s) [production]
23:08 <demon> Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 05s) [production]
19:02 <cmjohnson> powering off elastic1009-1002 to replace ssds [production]
18:35 <mutante> restarting nginx on toollabs webproxy [production]
18:35 <manybubbles> unbanning elastic1006 now that it is properly configured [production]
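("Banning"/"unbanning" a node here refers to shard allocation filtering. A sketch of the unban, using the standard cluster-settings API; the localhost:9200 endpoint is an assumption, not taken from the log.)

```python
#!/usr/bin/env python3
# Sketch of the "unban": clear the shard-allocation exclusion so the node can
# hold shards again. Standard cluster-settings API; host/port are assumptions.
import json
from urllib.request import Request, urlopen

def set_allocation_exclude(names):
    """names: comma-separated node names to exclude; '' clears the ban."""
    body = json.dumps({
        'transient': {'cluster.routing.allocation.exclude._name': names}
    }).encode('utf-8')
    req = Request('http://localhost:9200/_cluster/settings',
                  data=body, method='PUT')
    req.add_header('Content-Type', 'application/json')
    return json.loads(urlopen(req).read().decode('utf-8'))

if __name__ == '__main__':
    print(set_allocation_exclude(''))  # empty value removes the exclusion
```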
17:54 <_joe_> synchronized downsizing to 5% [production]
17:54 <oblivian> Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 06s) [production]
17:42 <_joe_> rolling restarted hhvm appservers [production]
17:38 <hashar> Zuul seems to be happy. Reverted my lame patch to send Cache-Control headers; since we have a cache breaker, it is not needed. [production]
17:21 <bd808> 10.64.16.29 is db1040 in the s4 pool [production]
17:18 <bd808> "Connection error: Unknown error (10.64.16.29)" 1052 in last 5m; 2877 in last 15m [production]
17:16 <hashar> Upgrading Zuul to have the status page emit a Cache-Control header {{bug|72766}} wmf-deploy-20141030-1..wmf-deploy-20141030-2 [production]
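(For context, the effect of the {{bug|72766}} change is simply to stop intermediaries from caching status.json. A generic WSGI-style sketch of the idea; this is not Zuul's actual code.)

```python
# Generic WSGI illustration: attach a Cache-Control header to status.json
# responses so proxies and browsers don't serve a stale copy. NOT Zuul's code.
def no_cache_status(app):
    def wrapped(environ, start_response):
        def _start_response(status, headers, exc_info=None):
            if environ.get('PATH_INFO', '').endswith('status.json'):
                headers = [h for h in headers
                           if h[0].lower() != 'cache-control']
                headers.append(('Cache-Control', 'no-cache, must-revalidate'))
            return start_response(status, headers, exc_info)
        return app(environ, _start_response)
    return wrapped
```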
17:11 <bd808> Upgraded kibana to v3.1.1 again. Better testing now that logstash is working. [production]
17:01 <bd808> Logs on logstash1003 showed "Failed to flush outgoing items <Errno::EBADF: Bad file descriptor - Bad file descriptor>" on shutdown. Maybe something not quite right about elasticsearch_http plugin? [production]
17:00 <awight> Synchronized php-1.25wmf6/includes/specials/SpecialUpload.php: Parse 'upload_source_url' message on SpecialUpload (duration: 00m 10s) [production]
16:59 <bd808> restarted logstash on logstash1003. No events logged since 00:00Z [production]
16:58 <awight> Synchronized php-1.25wmf5/includes/specials/SpecialUpload.php: Parse 'upload_source_url' message on SpecialUpload (duration: 00m 11s) [production]
16:58 <bd808> restarted logstash on logstash1002. No events logged since 00:00Z [production]
16:58 <bd808> restarted logstash on logstash1001. No events logged since 00:00Z [production]
16:55 <akosiaris> uploaded php5_5.3.10-1ubuntu3.15+wmf1 on apt.wikimedia.org [production]
16:46 <bd808> Reverted kibana to e317bc6 [production]
16:44 <oblivian> Synchronized wmf-config/CommonSettings.php: Serving 15% of anons with HHVM (ludicrous speed!) (duration: 00m 16s) [production]
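(The HHVM ramp-up entries — 5%, 10%, 15% of anons — refer to routing a fraction of anonymous traffic to HHVM app servers. A generic sketch of sticky percentage bucketing; the real selection lived in wmf-config/CommonSettings.php, so this is illustrative only.)

```python
#!/usr/bin/env python3
# Generic sketch of a sticky percentage rollout: hash a stable per-client key
# and compare it to the threshold, so the same client stays in its bucket as
# the percentage is raised (5% -> 10% -> 15%). Illustrative, not wmf-config.
import hashlib

def in_rollout(bucket_key, percent):
    digest = hashlib.md5(bucket_key.encode('utf-8')).hexdigest()
    return int(digest, 16) % 100 < percent

# Example with a hypothetical client IP and rising thresholds.
for pct in (5, 10, 15):
    print(pct, in_rollout('198.51.100.7', pct))
```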
16:38 <bd808> Upgraded kibana to v3.1.1 via Trebuchet [production]
16:38 <hashar> Zuul status page is freezing because the status.json is being cached :-/ [production]
16:31 <awight> Synchronized php-1.25wmf6/extensions/CentralNotice: push CentralNotice updates (duration: 00m 09s) [production]
16:28 <awight> Synchronized php-1.25wmf5/extensions/CentralNotice: push CentralNotice updates (duration: 00m 11s) [production]
16:22 <manybubbles> moving shards off of elastic1003 and elastic1006 so they can be restarted. elastic1003 needs hyperthreading and elastic1006 needs noatime. [production]
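(Draining nodes for maintenance is the inverse of the unban sketch above: set the exclusion to the node names and wait for shards to relocate. A sketch; localhost:9200 is an assumption.)

```python
#!/usr/bin/env python3
# Sketch of draining nodes: exclude them from shard allocation so their shards
# relocate elsewhere before power-off. Same cluster-settings call as the unban
# sketch above; host/port are assumptions.
import json
from urllib.request import Request, urlopen

body = json.dumps({'transient': {
    'cluster.routing.allocation.exclude._name': 'elastic1003,elastic1006'
}}).encode('utf-8')
req = Request('http://localhost:9200/_cluster/settings', data=body, method='PUT')
req.add_header('Content-Type', 'application/json')
print(urlopen(req).read().decode('utf-8'))
```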
16:17 <cmjohnson> powering off elastic1015-16 to replace ssds [production]
16:04 <hashar> restarted Zuul with upgraded version ( wmf-deploy-20140924-1..wmf-deploy-20141030-1 ) [production]
16:03 <hashar> Stopping zuul [production]
16:00 <hoo> Synchronized wmf-config/CommonSettings.php: Fix oauthadmin (duration: 00m 09s) [production]
15:43 <hashar> Going to upgrade Zuul and monitor the result over the next hour. [production]
15:39 <ottomata> starting to reimage mw1032 [production]
15:29 <oblivian> Synchronized wmf-config/CommonSettings.php: Serving 10% of anons with HHVM (duration: 00m 06s) [production]
15:22 <reedy> Synchronized docroot and w: Fix dbtree caching (duration: 00m 15s) [production]
15:13 <akosiaris> upgrading PHP on mw1113 to php5_5.3.10-1ubuntu3.15+wmf1 [production]
15:07 <manybubbles> moving shards off of elastic1015 and elastic1016 so we can replace their hard drives / turn on hyperthreading [production]
15:07 <marktraceur> Synchronized php-1.25wmf6/extensions/Wikidata/: [SWAT] [wmf6] Fix edit link for aliases (duration: 00m 12s) [production]
14:37 <cmjohnson> powering down elastic1003-1006 to replace ssds [production]
14:33 <_joe_> pooling mw1031/2 in the hhvm appservers pool [production]
12:51 <_joe_> rebooting mw1030 and mw1031 to use the updated kernel [production]
12:48 <akosiaris> enabled puppet on uranium [production]