analytics SAL

2021-01-20 §
21:48	<milimetric>	refinery deployed, synced to hdfs, ready to restart 53 oozie jobs, will do so slowly over the next few hours	[analytics]
18:11	<joal>	Release refinery-source v0.0.144 to archiva with Jenkins	[analytics]
2021-01-15 §
09:21	<elukey>	roll restart druid brokers on druid public - stuck after datasource drop	[analytics]
2021-01-11 §
07:26	<elukey>	execute 'sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/mediawiki' on launcher to fix dir perms	[analytics]
2021-01-09 §
15:11	<elukey>	restart timers 'analytics-*' on labstore100[6,7] to apply new permission settings	[analytics]
08:31	<elukey>	restart the failed hdfs rsync timers on labstore100[6,7] to kick off the remaining jobs	[analytics]
08:30	<elukey>	execute hdfs chmod o+x of /wmf/data/archive/projectview /wmf/data/archive/projectview/legacy /wmf/data/archive/pageview/legacy to unblock hdfs rsyncs	[analytics]
08:24	<elukey>	execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/pageview" to unblock labstore hdfs rsyncs	[analytics]
08:21	<elukey>	execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/geoeditors" to unblock labstore hdfs rsync	[analytics]
2021-01-08 §
18:54	<joal>	Restart jobs for permissions-fix (clickstream, mediacounts-archive, geoeditors-public_monthly, geoeditors-yearly, mobile_app-uniques-[daily|monthly], pageview-daily_dump, pageview-hourly, projectview-geo, unique_devices-[per_domain|per_project_family]-[daily|monthly])	[analytics]
18:14	<joal>	Restart projectview-hourly job (permissions test)	[analytics]
18:03	<joal>	Deploy refinery onto HDFS	[analytics]
17:50	<joal>	deploy refinery with scap	[analytics]
10:01	<elukey>	restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to be recovering well	[analytics]
08:46	<elukey>	force restart of check_webrequest_partitions.service on an-launcher1002	[analytics]
08:44	<elukey>	force restart of monitor_refine_eventlogging_legacy_failure_flags.service	[analytics]
08:18	<elukey>	raise default max executor heap size for Spark refine to 4G	[analytics]
2021-01-07 §
18:22	<elukey>	chown -R /tmp/analytics analytics:analytics-privatedata-users (tmp dir for data quality stats tables)	[analytics]
18:21	<elukey>	"sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics:analytics-privatedata-users /wmf/data/wmf/data_quality_stats"	[analytics]
18:10	<elukey>	temporarily disable hdfs-cleaner.timer to prevent /tmp/DataFrameToDruid from being dropped	[analytics]
18:08	<elukey>	chown -R /tmp/DataFrameToDruid analytics:druid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs	[analytics]
16:31	<elukey>	remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes	[analytics]
15:40	<elukey>	deprecate the 'researchers' posix group for good	[analytics]
11:24	<elukey>	execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well	[analytics]
10:36	<elukey>	execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629	[analytics]
09:37	<elukey>	forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts	[analytics]
08:26	<joal>	Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2)	[analytics]
08:14	<elukey>	re-enable puppet on an-launcher1002 to apply new refine memory settings	[analytics]
07:59	<elukey>	re-enabling all oozie jobs previously suspended	[analytics]
07:54	<elukey>	restart oozie on an-coord1001	[analytics]
2021-01-06 §
20:42	<ottomata>	starting remaining refine systemd timers	[analytics]
20:19	<ottomata>	restarted eventlogging_to_druid timers	[analytics]
20:19	<ottomata>	restarted drop systemd timers	[analytics]
20:18	<ottomata>	restarted reportupdater timers	[analytics]
20:14	<ottomata>	re-starting camus systemd timers	[analytics]
16:45	<razzi>	restart yarn nodemanagers	[analytics]
16:08	<razzi>	manually failover hdfs haadmin from an-master1002 to an-master1001	[analytics]
15:53	<ottomata>	stopping analytics systemd timers on an-launcher1002	[analytics]
2021-01-05 §
21:32	<ottomata>	bumped mediawiki history snapshot version in AQS	[analytics]
20:45	<ottomata>	Refine changes: event tables now have is_wmf_domain, canary events are removed, and corrupt records will result in a better monitoring email	[analytics]
20:43	<razzi>	deploy aqs as part of train	[analytics]
19:17	<razzi>	deploying refinery for weekly train	[analytics]
09:29	<joal>	Manually reload unique-devices monthly in cassandra to fix T271170	[analytics]
2021-01-04 §
22:22	<razzi>	reboot an-test-coord1001 to upgrade kernel	[analytics]
14:24	<elukey>	deprecate the analytics-users group	[analytics]
2021-01-03 §
14:11	<milimetric>	reset-failed refinery-sqoop-whole-mediawiki.service	[analytics]
14:10	<milimetric>	manual sqoop finished, logs on an-launcher1002 at /var/log/refinery/sqoop-mediawiki.log and /var/log/refinery/sqoop-mediawiki-production.log	[analytics]
2021-01-01 §
14:54	<milimetric>	deployed refinery hotfix for sqoop problem, after testing on three small wikis	[analytics]
2020-12-29 §
09:18	<elukey>	restart hue to pick up analytics-hive endpoint settings	[analytics]
2020-12-23 §
15:53	<ottomata>	point analytics-hive.eqiad.wmnet back at an-coord1001 - T268028 T270768	[analytics]