2018-01-18
19:11 |
<joal> |
Kill-Restart coord_pageviews_top_bycountry_monthly oozie job from 2015-05 |
[analytics] |
19:10 |
<joal> |
Add fake data to cassandra to silence alarms (Thanks again ema) |
[analytics] |
18:56 |
<joal> |
Truncating table "local_group_default_T_top_bycountry"."data" in cassandra before reload |
[analytics] |
15:21 |
<mforns> |
finished refinery deployment using scap, followed by deployment onto hdfs |
[analytics] |
15:07 |
<mforns> |
starting refinery deployment |
[analytics] |
12:43 |
<elukey> |
piwik on bohrium re-enabled |
[analytics] |
12:40 |
<elukey> |
set piwik in readonly mode and stopped mysql on bohrium (prep step for reboot) |
[analytics] |
09:38 |
<elukey> |
reboot thorium (analytics webserver) for security upgrade - This maintenance will cause temporary unavailability of the Analytics websites |
[analytics] |
09:37 |
<elukey> |
resumed druid hourly index jobs via hue and restored pivot's configuration |
[analytics] |
09:21 |
<elukey> |
reboot druid1001 for kernel upgrades |
[analytics] |
09:00 |
<elukey> |
suspended hourly druid batch index jobs via Hue |
[analytics] |
08:58 |
<elukey> |
temporarily set druid1002 in superset's druid cluster config (via UI) |
[analytics] |
08:53 |
<elukey> |
temporarily point pivot's configuration to druid1002 (druid1001 needs to be rebooted) |
[analytics] |
08:52 |
<elukey> |
disable druid1001's middlemanager as prep step for reboot |
[analytics] |
07:11 |
<elukey> |
re-run webrequest-load-wf-misc-2018-1-18-3 via Hue |
[analytics] |
2018-01-17
17:33 |
<elukey> |
killed the banner impression spark job (application_1515441536446_27293) again to force it to respawn (real time indexers not present) |
[analytics] |
17:29 |
<elukey> |
restarted all druid overlords on druid100[123] (weird race condition messages about who was the leader for some task) |
[analytics] |
16:24 |
<elukey> |
re-run all the pageview-druid-hourly failed jobs via Hue |
[analytics] |
14:42 |
<elukey> |
restart druid middlemanager on druid1003 as an attempt to unblock realtime streaming |
[analytics] |
14:21 |
<elukey> |
forced kill of banner impression data streaming job to get it restarted |
[analytics] |
11:44 |
<elukey> |
re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot) |
[analytics] |
11:44 |
<elukey> |
restart druid middlemanager on druid1002 |
[analytics] |
10:38 |
<elukey> |
stopped all crons on hadoop-coordinator-1 |
[analytics] |
10:37 |
<elukey> |
re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot) |
[analytics] |
10:22 |
<elukey> |
reboot druid1002 for kernel upgrades |
[analytics] |
09:53 |
<elukey> |
disable druid middlemanager on druid1002 as prep step for reboot |
[analytics] |
09:46 |
<elukey> |
rebooted analytics1003 |
[analytics] |
09:46 |
<elukey> |
removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) |
[analytics] |
08:53 |
<elukey> |
disabled camus as prep step for analytics1003 reboot |
[analytics] |
2018-01-11
22:35 |
<ottomata> |
restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/403774 |
[analytics] |
22:04 |
<ottomata> |
restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/#/c/403762/ |
[analytics] |
20:57 |
<ottomata> |
restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/#/c/403753/ |
[analytics] |
17:37 |
<joal> |
Kill manual banner-streaming job to see it restarted by cron |
[analytics] |
17:11 |
<ottomata> |
restart kafka on kafka-jumbo1003 |
[analytics] |
17:08 |
<ottomata> |
restart kafka on kafka-jumbo1001...something is not right with my certpath change yesterday |
[analytics] |
14:46 |
<joal> |
Deploy refinery onto HDFS |
[analytics] |
14:33 |
<joal> |
Deploy refinery with Scap |
[analytics] |
14:07 |
<joal> |
Manually restarting banner streaming job to prevent alerting |
[analytics] |
13:23 |
<joal> |
Killing banner-streaming job to have it auto-restarted from cron |
[analytics] |
11:45 |
<elukey> |
re-run webrequest-load-wf-text-2018-1-11-8 (failed due to reboots) |
[analytics] |