2014-11-03

23:01 <cscott> reconfigured OCG logstash path to use bunyan. The _type field is currently missing (used to be "OfflineContentGenerator"). Will fix tomorrow. [production]
22:32 <cscott> updated OCG to version 5834af97ae80382f3368dc61b9d119cef0fe129b [production]
21:56 <ejegg> enabled recurring globalcollect processor [production]
20:49 <maxsem> Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/#/c/170453/ (duration: 00m 03s) [production]
20:23 <maxsem> Synchronized wmf-config/InitialiseSettings.php: Enable WikiGrok on enwiki (duration: 00m 04s) [production]
19:51 <maxsem> Synchronized wmf-config/InitialiseSettings.php: Enable WikiGrok on test and test2 (duration: 00m 04s) [production]
19:43 <maxsem> Finished scap: Build localization cache for WikiGrok (duration: 35m 09s) [production]
19:08 <maxsem> Started scap: Build localization cache for WikiGrok [production]
18:55 <awight> restarting fredge consumer [production]
18:09 <awight> restarting donations queue consumer [production]
18:09 <awight> update crm from f47ed6f7e55946388db1dde787ca458c27a57c5a to b8a1fa98b5d9252d708090c99b61fd22ebe8d2be [production]
16:57 <akosiaris> repool wtp1024 at regular weight [production]
16:34 <_joe_> rolling-restarting hhvm appservers [production]
16:25 <godog> reboot ms-be2007, disk replaced but no corresponding raid0 LD [production]
16:22 <andrewbogott> added yuvi to 'Ops' ldap group [production]
16:03 <anomie> Synchronized docroot and w: (no message) (duration: 00m 10s) [production]
14:38 <akosiaris> wtp1024 re-installed as trusty [production]
14:38 <akosiaris> repool wtp1024 with a weight of 1 instead of 15 for now [production]
13:18 <akosiaris> depool wtp1024.eqiad.wmnet in preparation for reimaging to trusty [production]
11:26 <akosiaris> disable puppet on labsdb1004, labsdb1005 for postgresql reinitialization [production]
2014-10-31

20:46 <aaron> Synchronized php-1.25wmf5/includes/GlobalFunctions.php: 721435c3a6c8f7c728d3fa8ec34abb0f2ef7543d (duration: 00m 07s) [production]
20:36 <aaron> Synchronized php-1.25wmf6/includes/GlobalFunctions.php: 04c35b2ca42d7a186278882763eb853552d8441c (duration: 00m 04s) [production]
18:36 <ejegg> disabled recurring globalcollect [production]
18:03 <maxsem> Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/170358 (duration: 00m 04s) [production]
15:25 <demon> Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 00m 04s) [production]
14:59 <demon> Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s) [production]
14:59 <demon> Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 04s) [production]
14:56 <_joe_> rotated logs on ocg1001, restarted both ocg and rsyslog [production]
14:23 <akosiaris> update DNS/NTP settings, add codfw on nas1001-a,b [production]
13:27 <manybubbles> reenable was uneventful. Good news. [production]
13:25 <manybubbles> Synchronized wmf-config/InitialiseSettings.php: reenable cirrus everywhere it was enabled before, now that the outage has passed (duration: 00m 03s) [production]
12:41 <manybubbles> reenabled cirrus as betafeature - no spike in error logs [production]
12:41 <manybubbles> Synchronized wmf-config/InitialiseSettings.php: reenable cirrus as betafeature everywhere (duration: 00m 05s) [production]
12:37 <manybubbles> cirrus is working on test2wiki - we look to be recovered, save for some loss of redundancy [production]
12:36 <manybubbles> Synchronized wmf-config/InitialiseSettings.php: reenable cirrus on testwiki (duration: 00m 04s) [production]
12:32 <manybubbles> Synchronized wmf-config/: Disable Cirrus accelerated regexes as we *think* they might be causing outages (duration: 00m 04s) [production]
12:31 <manybubbles> restart of elasticsearch nodes got them back to responsive. The cluster isn't fully healed yet, but we're better than we were. Still not sure how we got this way. [production]
12:26 <manybubbles> restarting all elasticsearch boxes in quick sequence; when I try restarting a frozen box, another one freezes up (probably an evil request being retried on it after its buddy went down) [production]
11:46 <manybubbles> heap dumps aren't happening, even with the config to dump them on OOM errors. Restarting Elasticsearch nodes to get us back to stable; will have to investigate from another direction. [production]
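(For context on the heap-dump entry above: dumps on OutOfMemoryError are normally enabled with standard HotSpot JVM flags. A minimal sketch of that configuration, where the dump path is an illustrative assumption rather than anything taken from this log:)

```shell
# Hedged sketch: the standard HotSpot flags that make a JVM (such as the one
# running Elasticsearch) write a heap dump when it hits an OutOfMemoryError.
# The dump path below is an assumption for illustration only.
ES_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/elasticsearch/heap.hprof"
echo "$ES_JAVA_OPTS"
```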
11:30 <manybubbles> restarting gmond on elasticsearch nodes so I can get a clearer picture of them [production]
11:24 <oblivian> Synchronized wmf-config/InitialiseSettings.php: ES is down, long live lsearchd (duration: 00m 09s) [production]
10:52 <godog> restarting elasticsearch on elastic1031, heap exhausted at 30G [production]
01:14 <springle> db1040 dberror spam is https://gerrit.wikimedia.org/r/#/c/169964/; only jobrunners affected, annoying but not critical [production]