2014-10-31
|
20:46 |
<aaron> |
Synchronized php-1.25wmf5/includes/GlobalFunctions.php: 721435c3a6c8f7c728d3fa8ec34abb0f2ef7543d (duration: 00m 07s) |
[production] |
20:36 |
<aaron> |
Synchronized php-1.25wmf6/includes/GlobalFunctions.php: 04c35b2ca42d7a186278882763eb853552d8441c (duration: 00m 04s) |
[production] |
18:36 |
<ejegg> |
disabled recurring globalcollect |
[production] |
18:03 |
<maxsem> |
Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/170358 (duration: 00m 04s) |
[production] |
15:25 |
<demon> |
Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 00m 04s) |
[production] |
14:59 |
<demon> |
Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s) |
[production] |
14:59 |
<demon> |
Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 04s) |
[production] |
14:56 |
<_joe_> |
rotated logs on ocg1001, restarted both ocg and rsyslog |
[production] |
14:23 |
<akosiaris> |
update DNS/NTP settings, add codfw on nas1001-a,b |
[production] |
13:27 |
<manybubbles> |
reenable was uneventful. good news. |
[production] |
13:25 |
<manybubbles> |
Synchronized wmf-config/InitialiseSettings.php: reenable cirrus everywhere where it has been after the outage has passed (duration: 00m 03s) |
[production] |
12:41 |
<manybubbles> |
reenabled cirrus as betafeature - no spike in error logs |
[production] |
12:41 |
<manybubbles> |
Synchronized wmf-config/InitialiseSettings.php: reenable cirrus as betafeature everywhere (duration: 00m 05s) |
[production] |
12:37 |
<manybubbles> |
cirrus is working on test2wiki - we look to be recovered save for some loss of redundancy |
[production] |
12:36 |
<manybubbles> |
Synchronized wmf-config/InitialiseSettings.php: reenable cirrus on testwiki (duration: 00m 04s) |
[production] |
12:32 |
<manybubbles> |
Synchronized wmf-config/: Disable Cirrus accelerated regexes as we *think* they might be causing outages (duration: 00m 04s) |
[production] |
12:31 |
<manybubbles> |
restart of elasticsearch nodes got them back to responsive. Cluster isn't fully healed yet but we're better than we were. Still not sure how we got this way |
[production] |
12:26 |
<manybubbles> |
restarting all elasticsearch boxes in quick sequence. when I try restarting a frozen box another one freezes up (probably an evil request being retried on it after its buddy went down). |
[production] |
11:46 |
<manybubbles> |
heap dumps aren't happening. Even with the config to dump them on oom errors. Restarting Elasticsearch nodes to get us back to stable and going to have to investigate from another direction. |
[production] |
11:30 |
<manybubbles> |
restarting gmond on elasticsearch nodes so I can get a clearer picture of them |
[production] |
11:24 |
<oblivian> |
Synchronized wmf-config/InitialiseSettings.php: ES is down, long live lsearchd (duration: 00m 09s) |
[production] |
10:52 |
<godog> |
restarting elasticsearch on elastic1031, heap exhausted at 30G |
[production] |
01:14 |
<springle> |
db1040 dberror spam is https://gerrit.wikimedia.org/r/#/c/169964/ only jobrunners affected, annoying but not critical |
[production] |
2014-10-30
|
23:56 |
<awight> |
update civicrm from 1f0dc2ce0ab84765c085cc0ee369a7a047c0d005 to f47ed6f7e55946388db1dde787ca458c27a57c5a |
[production] |
23:08 |
<demon> |
Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s) |
[production] |
23:08 |
<demon> |
Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 05s) |
[production] |
19:02 |
<cmjohnson> |
powering off elastic1009-1002 to replace ssds |
[production] |
18:35 |
<mutante> |
restarting nginx on toollabs webproxy |
[production] |
18:35 |
<manybubbles> |
unbanning elastic1006 now that it is properly configured |
[production] |
17:54 |
<_joe_> |
synchronized downsizing to 5% |
[production] |
17:54 |
<oblivian> |
Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 06s) |
[production] |
17:42 |
<_joe_> |
rolling restarted hhvm appservers |
[production] |
17:38 |
<hashar> |
Zuul seems to be happy. Reverted my lame patch to send Cache-Control headers; since we have a cache breaker, it is not needed. |
[production] |
17:21 |
<bd808> |
10.64.16.29 is db1040 in the s4 pool |
[production] |
17:18 |
<bd808> |
"Connection error: Unknown error (10.64.16.29)" 1052 in last 5m; 2877 in last 15m |
[production] |
17:16 |
<hashar> |
Upgrading Zuul to have the status page emit a Cache-Control header {{bug|72766}} wmf-deploy-20141030-1..wmf-deploy-20141030-2 |
[production] |
17:11 |
<bd808> |
Upgraded kibana to v3.1.1 again. Better testing now that logstash is working. |
[production] |
17:01 |
<bd808> |
Logs on logstash1003 showed "Failed to flush outgoing items <Errno::EBADF: Bad file descriptor - Bad file descriptor>" on shutdown. Maybe something not quite right about elasticsearch_http plugin? |
[production] |
17:00 |
<awight> |
Synchronized php-1.25wmf6/includes/specials/SpecialUpload.php: Parse 'upload_source_url' message on SpecialUpload (duration: 00m 10s) |
[production] |
16:59 |
<bd808> |
restarted logstash on logstash1003. No events logged since 00:00Z |
[production] |
16:58 |
<awight> |
Synchronized php-1.25wmf5/includes/specials/SpecialUpload.php: Parse 'upload_source_url' message on SpecialUpload (duration: 00m 11s) |
[production] |
16:58 |
<bd808> |
restarted logstash on logstash1002. No events logged since 00:00Z |
[production] |
16:58 |
<bd808> |
restarted logstash on logstash1001. No events logged since 00:00Z |
[production] |
16:55 |
<akosiaris> |
uploaded php5_5.3.10-1ubuntu3.15+wmf1 on apt.wikimedia.org |
[production] |
16:46 |
<bd808> |
Reverted kibana to e317bc6 |
[production] |
16:44 |
<oblivian> |
Synchronized wmf-config/CommonSettings.php: Serving 15% of anons with HHVM (ludicrous speed!) (duration: 00m 16s) |
[production] |
16:38 |
<bd808> |
Upgraded kibana to v3.1.1 via Trebuchet |
[production] |