2021-01-09
08:31 <elukey> restart the failed hdfs rsync timers on labstore100[6,7] to kick off the remaining jobs [analytics]
08:30 <elukey> execute hdfs chmod o+x of /wmf/data/archive/projectview /wmf/data/archive/projectview/legacy /wmf/data/archive/pageview/legacy to unblock hdfs rsyncs [analytics]
08:24 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/pageview" to unblock labstore hdfs rsyncs [analytics]
08:21 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/geoeditors" to unblock labstore hdfs rsync [analytics]
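The chmod entries above all use the same pattern: an HDFS command run as the hdfs superuser through the kerberos-run-command wrapper, which authenticates against the hdfs service keytab first. A sketch with the paths from the entries (combining several paths into one invocation is illustrative, not what was logged):

    # Open read+traverse for "other" on the archive leaves.
    sudo -u hdfs kerberos-run-command hdfs \
        hdfs dfs -chmod o+rx /wmf/data/archive/pageview /wmf/data/archive/geoeditors

    # o+x alone lets the rsync user traverse the parent directories
    # without making their listings world-readable.
    sudo -u hdfs kerberos-run-command hdfs \
        hdfs dfs -chmod o+x /wmf/data/archive/projectview /wmf/data/archive/projectview/legacy /wmf/data/archive/pageview/legacy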
2021-01-08
18:54 <joal> Restart jobs for permissions-fix (clickstream, mediacounts-archive, geoeditors-public_monthly, geoeditors-yearly, mobile_app-uniques-[daily|monthly], pageview-daily_dump, pageview-hourly, projectview-geo, unique_devices-[per_domain|per_project_family]-[daily|monthly]) [analytics]
18:14 <joal> Restart projectview-hourly job (permissions test) [analytics]
18:03 <joal> Deploy refinery onto HDFS [analytics]
17:50 <joal> deploy refinery with scap [analytics]
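The two entries above are the usual two-step refinery deploy: scap pushes the repository to its targets, then a helper copies the artifacts onto HDFS. A sketch, assuming the helper is refinery's bin/refinery-deploy-to-hdfs (the script name and flag are assumptions, not taken from this log):

    # Step 1: push the refinery repo to the deployment targets.
    cd /srv/deployment/analytics/refinery
    scap deploy "Analytics weekly train"

    # Step 2: copy the deployed artifacts onto HDFS as the hdfs superuser
    # (helper script and --verbose flag are illustrative).
    sudo -u hdfs kerberos-run-command hdfs \
        /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose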
10:01 <elukey> restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to recover well [analytics]
08:46 <elukey> force restart of check_webrequest_partitions.service on an-launcher1002 [analytics]
08:44 <elukey> force restart of monitor_refine_eventlogging_legacy_failure_flags.service [analytics]
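Both "force restart" entries re-run oneshot systemd units: restarting a oneshot service simply executes it again, which is how these checks are re-triggered after a fix. A minimal sketch with the unit names from the entries:

    # Re-run the oneshot units, then confirm they exited cleanly.
    sudo systemctl restart monitor_refine_eventlogging_legacy_failure_flags.service
    sudo systemctl restart check_webrequest_partitions.service
    systemctl status check_webrequest_partitions.service --no-pager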
08:18 <elukey> raise default max executor heap size for Spark refine to 4G [analytics]
2021-01-07
18:22 <elukey> chown -R analytics:analytics-privatedata-users /tmp/analytics (tmp dir for data quality stats tables) [analytics]
18:21 <elukey> "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics:analytics-privatedata-users /wmf/data/wmf/data_quality_stats" [analytics]
18:10 <elukey> temporarily disable hdfs-cleaner.timer to prevent /tmp/DataFrameToDruid from being dropped [analytics]
18:08 <elukey> chown -R analytics:druid /tmp/DataFrameToDruid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs [analytics]
16:31 <elukey> remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes [analytics]
15:40 <elukey> deprecate the 'researchers' posix group for good [analytics]
11:24 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well [analytics]
10:36 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629 [analytics]
09:37 <elukey> forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts [analytics]
08:26 <joal> Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2) [analytics]
08:14 <elukey> re-enable puppet on an-launcher1002 to apply new refine memory settings [analytics]
07:59 <elukey> re-enabling all oozie jobs previously suspended [analytics]
07:54 <elukey> restart oozie on an-coord1001 [analytics]
2021-01-06
20:42 <ottomata> starting remaining refine systemd timers [analytics]
20:19 <ottomata> restarted eventlogging_to_druid timers [analytics]
20:19 <ottomata> restarted drop systemd timers [analytics]
20:18 <ottomata> restarted reportupdater timers [analytics]
20:14 <ottomata> re-starting camus systemd timers [analytics]
16:45 <razzi> restart yarn nodemanagers [analytics]
16:08 <razzi> manually fail over the active HDFS NameNode (via hdfs haadmin) from an-master1002 to an-master1001 [analytics]
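A manual NameNode failover goes through hdfs haadmin. A sketch; the serviceIds below are placeholders (the real ones are defined in hdfs-site.xml):

    # Fail over from the NameNode on an-master1002 to an-master1001.
    sudo -u hdfs kerberos-run-command hdfs \
        hdfs haadmin -failover an-master1002-serviceid an-master1001-serviceid

    # Verify the new active NameNode.
    sudo -u hdfs kerberos-run-command hdfs \
        hdfs haadmin -getServiceState an-master1001-serviceid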
15:53 <ottomata> stopping analytics systemd timers on an-launcher1002 [analytics]
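Stopping whole families of timers before maintenance and starting them again afterwards (as in this day's entries) can be done with unit globs. Timer names below are placeholders; note that systemctl globs only match units currently loaded in memory, so explicit names are safer when starting units back up:

    # Stop a family of timers before maintenance...
    sudo systemctl stop 'camus-*.timer'
    # ...check that nothing is still scheduled...
    systemctl list-timers 'camus-*' --all
    # ...and start them again when maintenance is done.
    sudo systemctl start 'camus-*.timer'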
2021-01-05
21:32 <ottomata> bumped mediawiki history snapshot version in AQS [analytics]
20:45 <ottomata> Refine changes: event tables now have is_wmf_domain, canary events are removed, and corrupt records will result in a better monitoring email [analytics]
20:43 <razzi> deploy aqs as part of train [analytics]
19:17 <razzi> deploying refinery for weekly train [analytics]
09:29 <joal> Manually reload unique-devices monthly in cassandra to fix T271170 [analytics]
2021-01-04
22:22 <razzi> reboot an-test-coord1001 to upgrade kernel [analytics]
14:24 <elukey> deprecate the analytics-users group [analytics]
2021-01-03
14:11 <milimetric> reset-failed refinery-sqoop-whole-mediawiki.service [analytics]
14:10 <milimetric> manual sqoop finished, logs on an-launcher1002 at /var/log/refinery/sqoop-mediawiki.log and /var/log/refinery/sqoop-mediawiki-production.log [analytics]
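systemctl reset-failed clears a unit's failed state so monitoring stops flagging it once the work has been redone by hand, which is the sequence of the two entries above:

    # Check the manual run's output, then clear the failed unit state.
    tail -n 50 /var/log/refinery/sqoop-mediawiki.log
    sudo systemctl reset-failed refinery-sqoop-whole-mediawiki.service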
2021-01-01
14:54 <milimetric> deployed refinery hotfix for sqoop problem, after testing on three small wikis [analytics]
2020-12-29
09:18 <elukey> restart hue to pick up analytics-hive endpoint settings [analytics]
2020-12-23
15:53 <ottomata> point analytics-hive.eqiad.wmnet back at an-coord1001 - T268028 T270768 [analytics]
2020-12-22
19:35 <elukey> restart hive daemons on an-coord1001 to pick up new settings [analytics]
18:13 <elukey> failover analytics-hive.eqiad.wmnet to an-coord1002 (to allow maintenance on an-coord1001) [analytics]
18:07 <elukey> restart hive server on an-coord1002 (current standby - no traffic) to pick up the new config (use the local metastore instead of the one that analytics-hive points to) [analytics]
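The failover entries above rely on analytics-hive.eqiad.wmnet being an alias that can be repointed between an-coord1001 and an-coord1002 (see also the 2020-12-23 entry switching it back). Assuming the alias is a DNS CNAME, a quick check after repointing:

    # Confirm which coordinator the alias currently resolves to.
    dig +short analytics-hive.eqiad.wmnet CNAME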
17:00 <mforns> Deployed refinery as part of weekly train (v0.0.142) [analytics]
16:42 <mforns> Deployed refinery-source v0.0.142 [analytics]