| 2021-01-25 |
    
  | 20:41 | <razzi> | rebalance kafka partitions for codfw.mediawiki.page-properties-change | [analytics] | 
            
  | 18:58 | <razzi> | rebalance kafka partitions for eventlogging_ExternalGuidance | [analytics] | 
            
  | 18:53 | <razzi> | rebalance kafka partitions for eqiad.mediawiki.job.ChangeDeletionNotification | [analytics] | 
            
  | 17:13 | <joal> | Copy /user to backup cluster (92TB) - T272846 | [analytics] | 
            
  | 16:22 | <elukey> | drain+restart cassandra on aqs1004 to pick up the new openjdk (canary) | [analytics] | 
            
  | 16:21 | <elukey> | restart yarn and hdfs daemon on analytics1058 (canary node for new openjdk) | [analytics] | 
            
  | 12:25 | <joal> | Copy /wmf/data/archive to backup cluster (32TB) - T272846 | [analytics] | 
            
  | 10:20 | <elukey> | restart memcached on an-tool1010 to flush superset's cache | [analytics] | 
            
  | 10:18 | <elukey> | restart superset to remove druid datasources support - T263972 | [analytics] | 
            
  | 09:57 | <joal> | Changing ownership of archive WMF files to analytics:analytics-privatedata-users after update of oozie jobs | [analytics] | 
            
  
    | 2021-01-08 |
    
  | 18:54 | <joal> | Restart jobs for permissions-fix (clickstream, mediacounts-archive, geoeditors-public_monthly, geoeditors-yearly, mobile_app-uniques-[daily|monthly], pageview-daily_dump, pageview-hourly, projectview-geo, unique_devices-[per_domain|per_project_family]-[daily|monthly]) | [analytics] | 
            
  | 18:14 | <joal> | Restart projectview-hourly job (permissions test) | [analytics] | 
            
  | 18:03 | <joal> | Deploy refinery onto HDFS | [analytics] | 
            
  | 17:50 | <joal> | deploy refinery with scap | [analytics] | 
            
  | 10:01 | <elukey> | restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to recover well | [analytics] | 
            
  | 08:46 | <elukey> | force restart of check_webrequest_partitions.service on an-launcher1002 | [analytics] | 
            
  | 08:44 | <elukey> | force restart of monitor_refine_eventlogging_legacy_failure_flags.service | [analytics] | 
            
  | 08:18 | <elukey> | raise default max executor heap size for Spark refine to 4G | [analytics] | 
            
  
    | 2021-01-07 |
    
  | 18:22 | <elukey> | chown -R analytics:analytics-privatedata-users /tmp/analytics (tmp dir for data quality stats tables) | [analytics] | 
            
  | 18:21 | <elukey> | "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics:analytics-privatedata-users /wmf/data/wmf/data_quality_stats" | [analytics] | 
            
  | 18:10 | <elukey> | temporarily disable hdfs-cleaner.timer to prevent /tmp/DataFrameToDruid from being dropped | [analytics] | 
            
  | 18:08 | <elukey> | chown -R analytics:druid /tmp/DataFrameToDruid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs | [analytics] | 
            
  | 16:31 | <elukey> | remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes | [analytics] | 
            
  | 15:40 | <elukey> | deprecate the 'researchers' posix group for good | [analytics] | 
            
  | 11:24 | <elukey> | execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well | [analytics] | 
            
  | 10:36 | <elukey> | execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629 | [analytics] | 
            
  | 09:37 | <elukey> | forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts | [analytics] | 
            
  | 08:26 | <joal> | Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2) | [analytics] | 
            
  | 08:14 | <elukey> | re-enable puppet on an-launcher1002 to apply new refine memory settings | [analytics] | 
            
  | 07:59 | <elukey> | re-enabling all oozie jobs previously suspended | [analytics] | 
            
  | 07:54 | <elukey> | restart oozie on an-coord1001 | [analytics] |