2018-01-18
|
19:11 |
<joal> |
Kill-Restart coord_pageviews_top_bycountry_monthly oozie job from 2015-05 |
[analytics] |
19:10 |
<joal> |
Add fake data to cassandra to silence alarms (Thanks again ema) |
[analytics] |
18:56 |
<joal> |
Truncating table "local_group_default_T_top_bycountry"."data" in cassandra before reload |
[analytics] |
15:21 |
<mforns> |
refinery deployment finished (using scap, then deploying onto hdfs) |
[analytics] |
15:07 |
<mforns> |
starting refinery deployment |
[analytics] |
12:43 |
<elukey> |
piwik on bohrium re-enabled |
[analytics] |
12:40 |
<elukey> |
set piwik in readonly mode and stopped mysql on bohrium (prep step for reboot) |
[analytics] |
09:38 |
<elukey> |
reboot thorium (analytics webserver) for security upgrade - This maintenance will cause temporary unavailability of the Analytics websites |
[analytics] |
09:37 |
<elukey> |
resumed druid hourly index jobs via hue and restored pivot's configuration |
[analytics] |
09:21 |
<elukey> |
reboot druid1001 for kernel upgrades |
[analytics] |
09:00 |
<elukey> |
suspended hourly druid batch index jobs via Hue |
[analytics] |
08:58 |
<elukey> |
temporarily set druid1002 in superset's druid cluster config (via UI) |
[analytics] |
08:53 |
<elukey> |
temporarily point pivot's configuration to druid1002 (druid1001 needs to be rebooted) |
[analytics] |
08:52 |
<elukey> |
disable druid1001's middlemanager as prep step for reboot |
[analytics] |
07:11 |
<elukey> |
re-run webrequest-load-wf-misc-2018-1-18-3 via Hue |
[analytics] |
2018-01-17
|
17:33 |
<elukey> |
killed the banner impression spark job (application_1515441536446_27293) again to force it to respawn (real time indexers not present) |
[analytics] |
17:29 |
<elukey> |
restarted all druid overlords on druid100[123] (weird race condition messages about who was the leader for some task) |
[analytics] |
16:24 |
<elukey> |
re-run all the pageview-druid-hourly failed jobs via Hue |
[analytics] |
14:42 |
<elukey> |
restart druid middlemanager on druid1003 as an attempt to unblock realtime streaming |
[analytics] |
14:21 |
<elukey> |
forced kill of banner impression data streaming job to get it restarted |
[analytics] |
11:44 |
<elukey> |
re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot) |
[analytics] |
11:44 |
<elukey> |
restart druid middlemanager on druid1002 |
[analytics] |
10:38 |
<elukey> |
stopped all crons on hadoop-coordinator-1 |
[analytics] |
10:37 |
<elukey> |
re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot) |
[analytics] |
10:22 |
<elukey> |
reboot druid1002 for kernel upgrades |
[analytics] |
09:53 |
<elukey> |
disable druid middlemanager on druid1002 as prep step for reboot |
[analytics] |
09:46 |
<elukey> |
rebooted analytics1003 |
[analytics] |
09:46 |
<elukey> |
removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) |
[analytics] |
08:53 |
<elukey> |
disabled camus as prep step for analytics1003 reboot |
[analytics] |