analytics SAL

2020-06-08 §
14:07	<elukey>	move matomo cron archiver to systemd timer archiver (with nagios alarming)	[analytics]
14:02	<elukey>	re-enable timers on an-coord1001	[analytics]
14:01	<elukey>	restart hive/oozie on an-coord1001 for openjdk upgrades	[analytics]
13:42	<elukey>	roll restart kafka jumbo brokers for openjdk upgrades	[analytics]
13:26	<elukey>	stop timers on an-launcher to drain jobs and restart hive/oozie for openjdk upgrades	[analytics]
2020-06-05 §
17:56	<elukey>	roll restart presto server on an-presto* to pick up new openjdk upgrades	[analytics]
16:45	<elukey>	upgrade turnilo to 1.24.0	[analytics]
13:26	<elukey>	reimage druid1006 to debian buster	[analytics]
09:26	<elukey>	roll restart cassandra on AQS to pick up openjdk upgrades	[analytics]
2020-06-04 §
19:12	<elukey>	roll restart of aqs to pick up new druid settings	[analytics]
18:39	<mforns>	deployed wikistats2 2.7.5	[analytics]
13:33	<elukey>	re-enable netflow hive2druid jobs after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/602356/	[analytics]
10:56	<elukey>	depooled and reimage druid1004 to Debian Buster (Druid public cluster)	[analytics]
07:31	<elukey>	stop netflow hive2druid timers to do some experiments	[analytics]
06:13	<elukey>	kill application_1589903254658_75731 (druid indexation for netflow, still running after 12h)	[analytics]
05:36	<elukey>	restart druid middlemanager on druid1002 - strange protobuf warnings, netflow hive2druid indexation job stuck for hours	[analytics]
05:13	<elukey>	reimage druid1003 to Buster	[analytics]
2020-06-03 §
17:10	<elukey>	restart RU jobs after adding memory to an-launcher1001	[analytics]
16:57	<elukey>	reboot an-launcher1001 to get new memory	[analytics]
16:01	<elukey>	stop timers on an-launcher, prep for reboot	[analytics]
09:35	<elukey>	re-run webrequest-druid-hourly-coord 03/06T7 (failed due to druid1002 moving to buster)	[analytics]
08:50	<elukey>	reimage druid1002 to Buster	[analytics]
2020-06-01 §
14:54	<elukey>	stop all timers on an-launcher1001, prep step for reboot	[analytics]
12:54	<elukey>	/user/dedcode/.Trash/* -skipTrash	[analytics]
06:53	<elukey>	re-run virtualpageview-hourly-wf-2020-5-31-19	[analytics]
06:28	<elukey>	temporary stop of all RU jobs on an-launcher1001 to privilege camus and others	[analytics]
06:03	<elukey>	kill all airflow-related processes on an-launcher1001 - host killing tasks due to OOM	[analytics]
2020-05-30 §
08:15	<elukey>	manual reset-failed of monitor_refine_mediawiki_job_events_failure_flags	[analytics]
2020-05-29 §
13:19	<elukey>	re-run druid webrequest hourly 29/05T11 (failed due to a host reimage in progress)	[analytics]
12:19	<elukey>	reimage druid1001 to Debian Buster	[analytics]
10:05	<elukey>	move el2druid config from druid1001 to an-druid1001	[analytics]
2020-05-28 §
18:31	<milimetric>	after deployment, restarted four oozie jobs with new SLAs and fixed datasets definitions	[analytics]
06:40	<elukey>	slowly restarting all RU units on an-launcher1001	[analytics]
06:32	<elukey>	delete old RU pid files with timestamp May 27 19:00 (scap deployment failed to an-launcher due to disk issues) except ./jobs/reportupdater-queries/pingback/.reportupdater.pid that was working fine	[analytics]
2020-05-27 §
19:53	<joal>	Start pageview-complete dump oozie job after deploy	[analytics]
19:24	<joal>	Deploy refinery onto hdfs	[analytics]
19:22	<joal>	restart failed services on an-launcher1001	[analytics]
19:06	<joal>	Deploy refinery using scap to an-launcher1001 only	[analytics]
18:41	<joal>	Deploying refinery with scap	[analytics]
13:42	<ottomata>	increased Kafka topic retention in jumbo-eqiad to 31 days for (eqiad|codfw).mediawiki.revision-create - T253753	[analytics]
07:09	<joal>	Rerun webrequest-druid-hourly-wf-2020-5-26-17	[analytics]
07:04	<elukey>	matomo upgraded to 3.13.5 on matomo1001	[analytics]
06:17	<elukey>	superset upgraded to 0.36	[analytics]
05:52	<elukey>	attempt to upgrade Superset to 0.36 - downtime expected	[analytics]
2020-05-24 §
10:04	<elukey>	re-run virtualpageview-hourly 23/05T15 - failed due to a sporadic kerberos/hive issue	[analytics]
2020-05-22 §
09:11	<elukey>	superset upgrade attempt to 0.36 failed due to a db upgrade error (not seen in staging), rollback to 0.35.2	[analytics]
08:15	<elukey>	superset down for maintenance	[analytics]
07:09	<elukey>	add druid100[7,8] to the LVS druid-public-brokers service (serving AQS's traffic)	[analytics]
2020-05-21 §
17:24	<elukey>	add druid100[7,8] to the druid public cluster (not serving load balancer traffic for the moment, only joining the cluster) - T252771	[analytics]
16:44	<elukey>	roll restart druid historical nodes on druid100[4-6] (public cluster) to pick up new settings - T252771	[analytics]