analytics SAL

2021-01-25 §
20:41	<razzi>	rebalance kafka partitions for codfw.mediawiki.page-properties-change	[analytics]
18:58	<razzi>	rebalance kafka partitions for eventlogging_ExternalGuidance	[analytics]
18:53	<razzi>	rebalance kafka partitions for eqiad.mediawiki.job.ChangeDeletionNotification	[analytics]
17:13	<joal>	Copy /user to backup cluster (92Tb) - T272846	[analytics]
16:22	<elukey>	drain+restart cassandra on aqs1004 to pick up the new openjdk (canary)	[analytics]
16:21	<elukey>	restart yarn and hdfs daemon on analytics1058 (canary node for new openjdk)	[analytics]
12:25	<joal>	Copy /wmf/data/archive to backup cluster (32Tb) - T272846	[analytics]
10:20	<elukey>	restart memcached on an-tool1010 to flush superset's cache	[analytics]
10:18	<elukey>	restart superset to remove druid datasources support - T263972	[analytics]
09:57	<joal>	Changing ownership of archive WMF files to analytics:analytics-privatedata-users after update of oozie jobs	[analytics]
2021-01-22 §
17:38	<mforns>	finished refinery deploy to HDFS	[analytics]
17:28	<mforns>	restarted refine_event and refine_eventlogging_legacy in an-launcher1002	[analytics]
17:11	<mforns>	starting refinery deploy using scap	[analytics]
17:09	<mforns>	bumped up refinery-source jar version to 0.0.145 in puppet for Refine and DruidLoad jobs	[analytics]
16:44	<mforns>	Deployed refinery-source v0.0.145 using jenkins	[analytics]
09:48	<joal>	Raise druid-public default replication-factor from 2 to 3	[analytics]
2021-01-21 §
18:54	<razzi>	rebooting nodes for druid public cluster via cookbook	[analytics]
16:49	<ottomata>	installed libsnappy-dev and python3-snappy on webperf1001	[analytics]
15:17	<joal>	Kill mediawiki-wikitext-history-wf-2020-12 as it was stuck and failed	[analytics]
11:19	<elukey>	block UA with 'python-requests.*' hitting AQS via Varnish	[analytics]
2021-01-20 §
21:48	<milimetric>	refinery deployed, synced to hdfs, ready to restart 53 oozie jobs, will do so slowly over the next few hours	[analytics]
18:11	<joal>	Release refinery-source v0.0.144 to archiva with Jenkins	[analytics]
2021-01-15 §
09:21	<elukey>	roll restart druid brokers on druid public - stuck after datasource drop	[analytics]
2021-01-11 §
07:26	<elukey>	execute 'sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/mediawiki' on launcher to fix dir perms	[analytics]
2021-01-09 §
15:11	<elukey>	restart timers 'analytics-*' on labstore100[6,7] to apply new permission settings	[analytics]
08:31	<elukey>	restart the failed hdfs rsync timers on labstore100[6,7] to kick off the remaining jobs	[analytics]
08:30	<elukey>	execute hdfs chmod o+x of /wmf/data/archive/projectview /wmf/data/archive/projectview/legacy /wmf/data/archive/pageview/legacy to unblock hdfs rsyncs	[analytics]
08:24	<elukey>	execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/pageview" to unblock labstore hdfs rsyncs	[analytics]
08:21	<elukey>	execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/geoeditors" to unblock labstore hdfs rsync	[analytics]
2021-01-08 §
18:54	<joal>	Restart jobs for permissions-fix (clickstream, mediacounts-archive, geoeditors-public_monthly, geoeditors-yearly, mobile_app-uniques-[daily|monthly], pageview-daily_dump, pageview-hourly, projectview-geo, unique_devices-[per_domain|per_project_family]-[daily|monthly])	[analytics]
18:14	<joal>	Restart projectview-hourly job (permissions test)	[analytics]
18:03	<joal>	Deploy refinery onto HDFS	[analytics]
17:50	<joal>	deploy refinery with scap	[analytics]
10:01	<elukey>	restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to be recovering well	[analytics]
08:46	<elukey>	force restart of check_webrequest_partitions.service on an-launcher1002	[analytics]
08:44	<elukey>	force restart of monitor_refine_eventlogging_legacy_failure_flags.service	[analytics]
08:18	<elukey>	raise default max executor heap size for Spark refine to 4G	[analytics]
2021-01-07 §
18:22	<elukey>	chown -R analytics:analytics-privatedata-users /tmp/analytics (tmp dir for data quality stats tables)	[analytics]
18:21	<elukey>	"sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics:analytics-privatedata-users /wmf/data/wmf/data_quality_stats"	[analytics]
18:10	<elukey>	temporarily disable hdfs-cleaner.timer to prevent /tmp/DataFrameToDruid from being dropped	[analytics]
18:08	<elukey>	chown -R analytics:druid /tmp/DataFrameToDruid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs	[analytics]
16:31	<elukey>	remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes	[analytics]
15:40	<elukey>	deprecate the 'researchers' posix group for good	[analytics]
11:24	<elukey>	execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well	[analytics]
10:36	<elukey>	execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629	[analytics]
09:37	<elukey>	forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts	[analytics]
08:26	<joal>	Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2)	[analytics]
08:14	<elukey>	re-enable puppet on an-launcher1002 to apply new refine memory settings	[analytics]
07:59	<elukey>	re-enabling all oozie jobs previously suspended	[analytics]
07:54	<elukey>	restart oozie on an-coord1001	[analytics]