2021-01-25 §
20:41 <razzi> rebalance kafka partitions for codfw.mediawiki.page-properties-change [analytics]
18:58 <razzi> rebalance kafka partitions for eventlogging_ExternalGuidance [analytics]
18:53 <razzi> rebalance kafka partitions for eqiad.mediawiki.job.ChangeDeletionNotification [analytics]
17:13 <joal> Copy /user to backup cluster (92 TB) - T272846 [analytics]
16:22 <elukey> drain+restart cassandra on aqs1004 to pick up the new openjdk (canary) [analytics]
16:21 <elukey> restart yarn and hdfs daemon on analytics1058 (canary node for new openjdk) [analytics]
12:25 <joal> Copy /wmf/data/archive to backup cluster (32 TB) - T272846 [analytics]
10:20 <elukey> restart memcached on an-tool1010 to flush superset's cache [analytics]
10:18 <elukey> restart superset to remove druid datasources support - T263972 [analytics]
09:57 <joal> Changing ownership of archive WMF files to analytics:analytics-privatedata-users after update of oozie jobs [analytics]
2021-01-22 §
17:38 <mforns> finished refinery deploy to HDFS [analytics]
17:28 <mforns> restarted refine_event and refine_eventlogging_legacy in an-launcher1002 [analytics]
17:11 <mforns> starting refinery deploy using scap [analytics]
17:09 <mforns> bumped up refinery-source jar version to 0.0.145 in puppet for Refine and DruidLoad jobs [analytics]
16:44 <mforns> Deployed refinery-source v0.0.145 using jenkins [analytics]
09:48 <joal> Raise druid-public default replication-factor from 2 to 3 [analytics]
2021-01-21 §
18:54 <razzi> rebooting nodes for druid public cluster via cookbook [analytics]
16:49 <ottomata> installed libsnappy-dev and python3-snappy on webperf1001 [analytics]
15:17 <joal> Kill mediawiki-wikitext-history-wf-2020-12 as it was stuck and failed [analytics]
11:19 <elukey> block UA with 'python-requests.*' hitting AQS via Varnish [analytics]
2021-01-20 §
21:48 <milimetric> refinery deployed, synced to hdfs, ready to restart 53 oozie jobs, will do so slowly over the next few hours [analytics]
18:11 <joal> Release refinery-source v0.0.144 to archiva with Jenkins [analytics]
2021-01-15 §
09:21 <elukey> roll restart druid brokers on druid public - stuck after datasource drop [analytics]
2021-01-11 §
07:26 <elukey> execute 'sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/mediawiki' on launcher to fix dir perms [analytics]
2021-01-09 §
15:11 <elukey> restart timers 'analytics-*' on labstore100[6,7] to apply new permission settings [analytics]
08:31 <elukey> restart the failed hdfs rsync timers on labstore100[6,7] to kick off the remaining jobs [analytics]
08:30 <elukey> execute hdfs chmod o+x of /wmf/data/archive/projectview /wmf/data/archive/projectview/legacy /wmf/data/archive/pageview/legacy to unblock hdfs rsyncs [analytics]
08:24 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/pageview" to unblock labstore hdfs rsyncs [analytics]
08:21 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod o+rx /wmf/data/archive/geoeditors" to unblock labstore hdfs rsync [analytics]
2021-01-08 §
18:54 <joal> Restart jobs for permissions-fix (clickstream, mediacounts-archive, geoeditors-public_monthly, geoeditors-yearly, mobile_app-uniques-[daily|monthly], pageview-daily_dump, pageview-hourly, projectview-geo, unique_devices-[per_domain|per_project_family]-[daily|monthly]) [analytics]
18:14 <joal> Restart projectview-hourly job (permissions test) [analytics]
18:03 <joal> Deploy refinery onto HDFS [analytics]
17:50 <joal> deploy refinery with scap [analytics]
10:01 <elukey> restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to be recovering well [analytics]
08:46 <elukey> force restart of check_webrequest_partitions.service on an-launcher1002 [analytics]
08:44 <elukey> force restart of monitor_refine_eventlogging_legacy_failure_flags.service [analytics]
08:18 <elukey> raise default max executor heap size for Spark refine to 4G [analytics]
2021-01-07 §
18:22 <elukey> chown -R /tmp/analytics analytics:analytics-privatedata-users (tmp dir for data quality stats tables) [analytics]
18:21 <elukey> "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics:analytics-privatedata-users /wmf/data/wmf/data_quality_stats" [analytics]
18:10 <elukey> temporarily disable hdfs-cleaner.timer to prevent /tmp/DataFrameToDruid from being dropped [analytics]
18:08 <elukey> chown -R /tmp/DataFrameToDruid analytics:druid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs [analytics]
16:31 <elukey> remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes [analytics]
15:40 <elukey> deprecate the 'researchers' posix group for good [analytics]
11:24 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well [analytics]
10:36 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629 [analytics]
09:37 <elukey> forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts [analytics]
08:26 <joal> Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2) [analytics]
08:14 <elukey> re-enable puppet on an-launcher1002 to apply new refine memory settings [analytics]
07:59 <elukey> re-enabling all oozie jobs previously suspended [analytics]
07:54 <elukey> restart oozie on an-coord1001 [analytics]
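
Several entries above adjust the "other" permission bits on HDFS paths: `chmod -R o-rwx` after timers ran without the new umask, and `chmod o+rx` to unblock the labstore rsyncs. The sketch below only illustrates the bit arithmetic behind those commands. The umask values 022 and 027 are assumptions for illustration (the log does not state the actual old or new umask), and the helper names are hypothetical.

```python
import stat


def apply_umask(requested: int, umask: int) -> int:
    """Mode a new directory ends up with when created as `requested` under `umask`."""
    return requested & ~umask


def chmod_o_minus_rwx(mode: int) -> int:
    """Same effect as `chmod o-rwx`: clear read/write/execute for 'other'."""
    return mode & ~stat.S_IRWXO


def chmod_o_plus_rx(mode: int) -> int:
    """Same effect as `chmod o+rx`: grant read and execute to 'other'."""
    return mode | stat.S_IROTH | stat.S_IXOTH


# Assumed values, not taken from the log.
OLD_UMASK = 0o022  # hypothetical umask before the change
NEW_UMASK = 0o027  # hypothetical stricter umask after the change

created_without_new_umask = apply_umask(0o777, OLD_UMASK)
created_with_new_umask = apply_umask(0o777, NEW_UMASK)

# Retroactive fix for directories created under the old umask.
fixed = chmod_o_minus_rwx(created_without_new_umask)

# Re-opening a path that must stay world-readable (e.g. public archive data).
reopened = chmod_o_plus_rx(fixed)

print(oct(created_without_new_umask), oct(created_with_new_umask), oct(fixed), oct(reopened))
```

Under these assumed values, a directory created with the old umask keeps its "other" bits until `o-rwx` clears them, which matches the mode a directory gets when created under the stricter umask; `o+rx` then restores read and traverse access only where it is explicitly wanted.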