analytics SAL


2018-02-05 §
15:51	<elukey>	live hacked deployment-eventlog02's /srv/deployment/eventlogging/analytics/eventlogging/handlers.py to add poll(0) to the confluent kafka producer - T185291	[analytics]
11:03	<elukey>	restart eventlogging/forwarder legacy-zmq on eventlog1001 due to slow memory leak over time (cached memory down to zero)	[analytics]
2018-02-02 §
17:09	<joal>	Webrequest upload 2018-02-02 hours 9 and 11 dataloss warnings have been checked - they are false positives	[analytics]
09:56	<joal>	Rerun unique_devices-per_project_family-monthly-wf-2018-1 after failure	[analytics]
2018-02-01 §
17:00	<ottomata>	killing stuck JsonRefine eventlogging analytics job application_1515441536446_52892, not sure why this is stuck.	[analytics]
14:06	<joal>	Dataloss alerts for upload 2018-02-01 hours 1, 2, 3 and 5 were false positives	[analytics]
12:17	<joal>	Restart cassandra monthly bundle after January deploy	[analytics]
2018-01-23 §
20:10	<ottomata>	hdfs dfs -chmod 775 /wmf/data/archive/mediacounts/daily/2018 for T185419	[analytics]
09:26	<joal>	Dataloss warning for upload and text 2018-01-23:06 is confirmed to be a false positive	[analytics]
2018-01-22 §
17:36	<joal>	Kill-Restart clickstream oozie job after deploy	[analytics]
17:12	<joal>	deploying refinery onto HDFS	[analytics]
17:12	<joal>	Refinery deployed from scap	[analytics]
2018-01-18 §
19:11	<joal>	Kill-Restart coord_pageviews_top_bycountry_monthly oozie job from 2015-05	[analytics]
19:10	<joal>	Add fake data to cassandra to silence alarms (Thanks again ema)	[analytics]
18:56	<joal>	Truncating table "local_group_default_T_top_bycountry"."data" in cassandra before reload	[analytics]
15:21	<mforns>	refinery deployment using scap and then deploying onto hdfs finished	[analytics]
15:07	<mforns>	starting refinery deployment	[analytics]
12:43	<elukey>	piwik on bohrium re-enabled	[analytics]
12:40	<elukey>	set piwik in readonly mode and stopped mysql on bohrium (prep step for reboot)	[analytics]
09:38	<elukey>	reboot thorium (analytics webserver) for security upgrade - This maintenance will cause temporary unavailability of the Analytics websites	[analytics]
09:37	<elukey>	resumed druid hourly index jobs via hue and restored pivot's configuration	[analytics]
09:21	<elukey>	reboot druid1001 for kernel upgrades	[analytics]
09:00	<elukey>	suspended hourly druid batch index jobs via Hue	[analytics]
08:58	<elukey>	temporarily set druid1002 in superset's druid cluster config (via UI)	[analytics]
08:53	<elukey>	temporarily point pivot's configuration to druid1002 (druid1001 needs to be rebooted)	[analytics]
08:52	<elukey>	disable druid1001's middlemanager as prep step for reboot	[analytics]
07:11	<elukey>	re-run webrequest-load-wf-misc-2018-1-18-3 via Hue	[analytics]
2018-01-17 §
17:33	<elukey>	killed the banner impression spark job (application_1515441536446_27293) again to force it to respawn (real time indexers not present)	[analytics]
17:29	<elukey>	restarted all druid overlords on druid100[123] (weird race condition messages about who was the leader for some task)	[analytics]
16:24	<elukey>	re-run all the pageview-druid-hourly failed jobs via Hue	[analytics]
14:42	<elukey>	restart druid middlemanager on druid1003 as an attempt to unblock realtime streaming	[analytics]
14:21	<elukey>	forced kill of banner impression data streaming job to get it restarted	[analytics]
11:44	<elukey>	re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot)	[analytics]
11:44	<elukey>	restart druid middlemanager on druid1002	[analytics]
10:38	<elukey>	stopped all crons on hadoop-coordinator-1	[analytics]
10:37	<elukey>	re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot)	[analytics]
10:22	<elukey>	reboot druid1002 for kernel upgrades	[analytics]
09:53	<elukey>	disable druid middlemanager on druid1002 as prep step for reboot	[analytics]
09:46	<elukey>	rebooted analytics1003	[analytics]
09:46	<elukey>	removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?)	[analytics]
08:53	<elukey>	disabled camus as prep step for analytics1003 reboot	[analytics]
2018-01-15 §
13:39	<elukey>	stop eventlogging and reboot eventlog1001 for kernel updates	[analytics]
09:58	<elukey>	rolling reboots of aqs hosts (1005->1009) for kernel updates	[analytics]
09:11	<elukey>	reboot aqs1004 for kernel updates	[analytics]
2018-01-12 §
13:03	<joal>	Rerun webrequest-load-wf-text-2018-1-12-9	[analytics]
13:02	<joal>	Rerun webrequest-load-wf-upload-2018-1-12-9	[analytics]
10:33	<elukey>	reboot analytics1066->69 for kernel updates	[analytics]
09:07	<elukey>	reboot analytics1063->65 for kernel updates	[analytics]
2018-01-11 §
22:35	<ottomata>	restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/403774	[analytics]
22:04	<ottomata>	restarting kafka-jumbo brokers to apply https://gerrit.wikimedia.org/r/#/c/403762/	[analytics]