2020-04-29 §
18:36 <joal> kill restart pageview-druid jobs (hourly, daily, monthly) to add new dimension [analytics]
18:29 <joal> Kill-restart data-quality-stats-hourly bundle [analytics]
17:57 <joal> Deploy refinery on HDFS [analytics]
17:45 <elukey> roll restart Presto workers to pick up the new jvm settings (110G heap size) [analytics]
16:06 <joal> Deploying refinery using scap [analytics]
15:57 <joal> Deploying AQS using scap [analytics]
14:26 <elukey> enable TLS consumer/producers for kafka main -> jumbo mirror maker - T250250 [analytics]
13:48 <joal> Releasing refinery 0.0.123 onto archiva with Jenkins [analytics]
08:47 <elukey> roll restart zookeeper on an-conf* to pick up new openjdk11 updates (affects hadoop) [analytics]
2020-04-27 §
13:02 <elukey> superset 0.36.0 deployed to an-tool1005 [analytics]
2020-04-26 §
18:14 <elukey> restart nodemanager on analytics1054 - failed due to heap pressure [analytics]
18:14 <elukey> re-run webrequest-load-coord-text 26/04/2020T16 via Hue [analytics]
2020-04-23 §
13:57 <elukey> launch again data quality stats bundle with https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/592008/ applied locally [analytics]
2020-04-22 §
06:46 <elukey> kill dataquality hourly bundle again, traffic_by_country keeps failing [analytics]
06:11 <elukey> start data quality bundle hourly with --user=analytics [analytics]
05:45 <elukey> add a separate refinery scap target for the Hadoop test cluster and redeploy to check new settings [analytics]
2020-04-21 §
23:17 <milimetric> restarted webrequest bundle, babysitting that first before going on [analytics]
23:00 <milimetric> forgot a small jar version update, finished deploying now [analytics]
21:38 <milimetric> deployed twice because analytics1030 failed with "OSError {}" but seems ok after the second deploy [analytics]
14:27 <elukey> add motd to notebook100[3,4] to alert about host deprecation (in favor of stat100x) [analytics]
11:51 <elukey> manually add SUCCESS flags under /wmf/data/wmf/banner_activity/daily/year=2020/month=1 and /wmf/data/wmf/banner_activity/daily/year=2019/month=12 to unblock druid banner monthly indexations [analytics]
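The SUCCESS flags mentioned above are empty marker files (conventionally named `_SUCCESS` in Hadoop) that downstream jobs, here the Druid banner monthly indexations, wait on before starting. Creating one manually is a one-liner; the paths are from the log entry, while the `_SUCCESS` filename and the `touchz` invocation are assumptions about how it was done, not taken from the log:

```shell
# Create empty marker files so the Druid monthly indexation jobs see the
# partitions as complete (paths from the log; filename assumed to be _SUCCESS).
hdfs dfs -touchz /wmf/data/wmf/banner_activity/daily/year=2020/month=1/_SUCCESS
hdfs dfs -touchz /wmf/data/wmf/banner_activity/daily/year=2019/month=12/_SUCCESS
```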
2020-04-20 §
14:38 <ottomata> restarting eventlogging-processor with updated python3-ua-parser for parsing KaiOS user agents [analytics]
10:28 <elukey> drop /srv/log/mw-log/archive/api from stat1007 (freeing 1.3TB of space!) [analytics]
2020-04-18 §
21:40 <elukey> force hdfs-balancer in an attempt to redistribute hdfs blocks more evenly across worker nodes (hoping to free the busiest ones) [analytics]
21:32 <elukey> drop /user/analytics-privatedata/.Trash/* from hdfs to free some space (~100G used) [analytics]
21:25 <elukey> drop /var/log/hadoop-yarn/apps/analytics-search/* from hdfs to free space (~8T replicated used) [analytics]
21:21 <elukey> drop /user/{analytics|hdfs}/.Trash/* from hdfs to free space (~100T used) [analytics]
21:12 <elukey> drop /var/log/hadoop-yarn/apps/analytics from hdfs to free space (15.1T replicated) [analytics]
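The 2020-04-18 space-freeing entries above correspond roughly to the following commands. This is a sketch, not the operator's exact invocations: the paths are from the log, but the `-skipTrash` flag and the balancer threshold value are assumptions.

```shell
# Free HDFS space by deleting Trash contents and aggregated YARN logs
# (paths from the log entries above). -skipTrash deletes immediately instead
# of moving data into .Trash, which would defeat the purpose here.
hdfs dfs -rm -r -skipTrash '/user/analytics/.Trash/*' '/user/hdfs/.Trash/*'
hdfs dfs -rm -r -skipTrash '/user/analytics-privatedata/.Trash/*'
hdfs dfs -rm -r -skipTrash '/var/log/hadoop-yarn/apps/analytics'
hdfs dfs -rm -r -skipTrash '/var/log/hadoop-yarn/apps/analytics-search/*'

# Then rebalance blocks across datanodes; the 10% utilization threshold
# is an assumed example value, not taken from the log.
hdfs balancer -threshold 10
```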
2020-04-17 §
13:45 <elukey> lock down /srv/log/mw-log/archive/ on stat1007 to analytics-privatedata-users access only [analytics]
10:26 <elukey> re-created default venv for notebooks on notebook100[3,4] (forgot to git pull before re-creating it last time) [analytics]
2020-04-16 §
05:34 <elukey> restart hadoop-yarn-nodemanager on an-worker108[4,5] - failed after GC OOM events (heavy spark jobs) [analytics]
2020-04-15 §
14:03 <elukey> update Superset Alpha role perms per what is stated in T249923#6058862 [analytics]
09:35 <elukey> restart jupyterhub too as a follow-up [analytics]
09:35 <elukey> execute "create_virtualenv.sh ../venv" on stat1006, notebook1003, notebook1004 to apply new settings to Spark kernels (re-creating them) [analytics]
09:09 <elukey> restart druid brokers on druid100[4-6] - stuck after datasource deletion [analytics]
2020-04-11 §
09:19 <elukey> set hive-security: read-only for the Presto hive connector and roll restart the cluster [analytics]
2020-04-10 §
16:31 <elukey> enable TLS from kafkatee to Kafka on analytics1030 (test instance) [analytics]
15:45 <elukey> migrate data_purge timers from an-coord1001 to an-launcher1001 [analytics]
09:11 <elukey> move druid_load jobs from an-coord1001 to an-launcher1001 [analytics]
08:08 <elukey> move project_namespace_map from an-coord1001 to an-launcher1001 [analytics]
07:38 <elukey> move hdfs-cleaner from an-coord1001 to an-launcher1001 [analytics]
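The 2020-04-10 entries above move several systemd timers (data_purge, druid_load, project_namespace_map, hdfs-cleaner) off an-coord1001. At WMF these units are managed by Puppet, so the actual change is a role/profile move; purely as an illustration, the manual equivalent of relocating one timer looks roughly like this. The hostnames and the `hdfs-cleaner` unit name are from the log; everything else is an assumption:

```shell
# On the old host (an-coord1001): stop and mask the timer so it cannot fire again.
sudo systemctl stop hdfs-cleaner.timer
sudo systemctl mask hdfs-cleaner.timer

# On the new host (an-launcher1001), after the unit files are in place:
sudo systemctl daemon-reload
sudo systemctl enable --now hdfs-cleaner.timer
sudo systemctl list-timers | grep hdfs-cleaner   # verify the next activation
```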
2020-04-09 §
20:54 <elukey> re-run webrequest upload/text hour 15:00 from Hue (stuck due to missing _IMPORTED flag, caused by the an-launcher1001 migration; Andrew fixed it by manually re-running the Camus checker) [analytics]
16:00 <elukey> move camus timers from an-coord1001 to an-launcher1001 [analytics]
15:20 <elukey> absent spark refine timers on an-coord1001 and move them to an-launcher1001 [analytics]
2020-04-07 §
09:17 <elukey> enable refine for TwoColConflictExit (EL schema) [analytics]
2020-04-06 §
13:23 <elukey> upgraded stat1008 to AMD ROCm 3.3 (enables tensorflow 2.x) [analytics]
12:33 <joal> Bump AQS druid backend to 2020-03 [analytics]
11:50 <elukey> deploy new druid datasource in Druid public [analytics]
06:29 <elukey> allow all analytics-privatedata-users to use the GPUs on stat1005/8 [analytics]
2020-04-04 §
06:52 <elukey> restart refinery-import-page-history-dumps [analytics]