2021-01-25
|
20:42 |
<razzi> |
rebalance kafka partitions for eqiad.mediawiki.page-properties-change.json |
[analytics] |
20:41 |
<razzi> |
rebalance kafka partitions for codfw.mediawiki.page-properties-change |
[analytics] |
18:58 |
<razzi> |
rebalance kafka partitions for eventlogging_ExternalGuidance |
[analytics] |
18:53 |
<razzi> |
rebalance kafka partitions for eqiad.mediawiki.job.ChangeDeletionNotification |
[analytics] |
17:13 |
<joal> |
Copy /user to backup cluster (92TB) - T272846 |
[analytics] |
16:22 |
<elukey> |
drain+restart cassandra on aqs1004 to pick up the new openjdk (canary) |
[analytics] |
16:21 |
<elukey> |
restart yarn and hdfs daemon on analytics1058 (canary node for new openjdk) |
[analytics] |
12:25 |
<joal> |
Copy /wmf/data/archive to backup cluster (32TB) - T272846 |
[analytics] |
10:20 |
<elukey> |
restart memcached on an-tool1010 to flush superset's cache |
[analytics] |
10:18 |
<elukey> |
restart superset to remove druid datasources support - T263972 |
[analytics] |
09:57 |
<joal> |
Changing ownership of archive WMF files to analytics:analytics-privatedata-users after update of oozie jobs |
[analytics] |
2021-01-08
|
18:54 |
<joal> |
Restart jobs for permissions-fix (clickstream, mediacounts-archive, geoeditors-public_monthly, geoeditors-yearly, mobile_app-uniques-[daily|monthly], pageview-daily_dump, pageview-hourly, projectview-geo, unique_devices-[per_domain|per_project_family]-[daily|monthly]) |
[analytics] |
18:14 |
<joal> |
Restart projectview-hourly job (permissions test) |
[analytics] |
18:03 |
<joal> |
Deploy refinery onto HDFS |
[analytics] |
17:50 |
<joal> |
deploy refinery with scap |
[analytics] |
10:01 |
<elukey> |
restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to be recovering well |
[analytics] |
08:46 |
<elukey> |
force restart of check_webrequest_partitions.service on an-launcher1002 |
[analytics] |
08:44 |
<elukey> |
force restart of monitor_refine_eventlogging_legacy_failure_flags.service |
[analytics] |
08:18 |
<elukey> |
raise default max executor heap size for Spark refine to 4G |
[analytics] |
2021-01-07
|
18:22 |
<elukey> |
chown -R analytics:analytics-privatedata-users /tmp/analytics (tmp dir for data quality stats tables) |
[analytics] |
18:21 |
<elukey> |
"sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics:analytics-privatedata-users /wmf/data/wmf/data_quality_stats" |
[analytics] |
18:10 |
<elukey> |
temporarily disable hdfs-cleaner.timer to prevent /tmp/DataFrameToDruid from being dropped |
[analytics] |
18:08 |
<elukey> |
chown -R analytics:druid /tmp/DataFrameToDruid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs |
[analytics] |
16:31 |
<elukey> |
remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes |
[analytics] |
15:40 |
<elukey> |
deprecate the 'researchers' posix group for good |
[analytics] |
11:24 |
<elukey> |
execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well |
[analytics] |
10:36 |
<elukey> |
execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629 |
[analytics] |
09:37 |
<elukey> |
forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts |
[analytics] |
08:26 |
<joal> |
Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2) |
[analytics] |