| 
      
        2022-09-14
      
     | 
  
    
  | 17:11 | 
  <aqu> | 
  Sep 14 15:23:34 UTC sudo systemctl start check_webrequest_partitions.service | 
  [analytics] | 
            
  | 12:56 | 
  <aqu> | 
  ~1h ago: sudo systemctl start refinery-sqoop-mediawiki-production-daily.service ; sudo systemctl start refinery-import-siteinfo-dumps.service ; sudo systemctl start refinery-import-page-current-dumps.service ; sudo systemctl start refinery-import-page-history-dumps.service | 
  [analytics] | 
            
  | 11:34 | 
  <btullis> | 
  remounted all remaining /mnt/hdfs mount points, except stat1005 which is busy | 
  [analytics] | 
            
  | 11:12 | 
  <btullis> | 
  remounted /mnt/hdfs on an-coord100[1-2] | 
  [analytics] | 
            
  | 11:09 | 
  <btullis> | 
  remounted /mnt/hdfs on an-airflow1001 | 
  [analytics] | 
            
  | 09:14 | 
  <joal> | 
  Restart oozie virtualpageview job | 
  [analytics] | 
            
  | 09:10 | 
  <btullis> | 
  re-mounted /mnt/hdfs on an-launcher1002. | 
  [analytics] | 
            
  | 07:11 | 
  <joal> | 
  restart webrequest oozie bundle | 
  [analytics] | 
            
  
    | 
      
        2022-09-13
      
     | 
  
    
  | 17:22 | 
  <joal> | 
  rerun refine_eventlogging_legacy | 
  [analytics] | 
            
  | 17:14 | 
  <joal> | 
  rerun refine_event | 
  [analytics] | 
            
  | 17:14 | 
  <joal> | 
  rerun refine_netflow | 
  [analytics] | 
            
  | 16:53 | 
  <joal> | 
  Rerun refine_eventlogging_analytics | 
  [analytics] | 
            
  | 16:45 | 
  <joal> | 
  Kill-rerun suspended oozie jobs (virtual-pageview and predictions-actor) | 
  [analytics] | 
            
  | 16:34 | 
  <joal> | 
  rerun failed webrequest oozie jobs | 
  [analytics] | 
            
  | 16:30 | 
  <btullis> | 
  restarting hive-server2 and hive-metastore on an-coord1001 (currently standby) | 
  [analytics] | 
            
  | 16:29 | 
  <btullis> | 
  restarting oozie on an-coord1001 | 
  [analytics] | 
            
  | 16:10 | 
  <joal> | 
  Rerun failed oozie webrequest jobs | 
  [analytics] | 
            
  | 15:57 | 
  <btullis> | 
  rolling out updated hadoop packages to an-airflow1003 | 
  [analytics] | 
            
  | 15:55 | 
  <btullis> | 
  rolling out upgraded hadoop client packages to stat servers. | 
  [analytics] | 
            
  | 15:51 | 
  <btullis> | 
  restarting eventlogging_to_druid_network_flows_internal_hourly.service eventlogging_to_druid_prefupdate_hourly.service refine_event_sanitized_analytics_immediate.service refine_event_sanitized_main_immediate.service | 
  [analytics] | 
            
  | 15:49 | 
  <btullis> | 
  restarting eventlogging_to_druid_navigationtiming_hourly.service on an-launcher1002 | 
  [analytics] | 
            
  | 15:46 | 
  <btullis> | 
  restarting eventlogging_to_druid_editattemptstep_hourly.service on an-launcher1002 | 
  [analytics] | 
            
  | 15:44 | 
  <btullis> | 
  cancel that last message. Upgrading hadoop packages on an-launcher instead. They were inadvertently omitted last time. | 
  [analytics] | 
            
  | 15:39 | 
  <btullis> | 
  Going to downgrade hadoop on all hadoop-worker nodes to 2.10.1 | 
  [analytics] | 
            
  | 15:21 | 
  <btullis> | 
  failed over hive to an-coord1002 via DNS https://gerrit.wikimedia.org/r/c/operations/dns/+/831906 | 
  [analytics] | 
            
  | 15:20 | 
  <btullis> | 
  restarted yarn service on an-master1002 to make the active host an-master1001 again. | 
  [analytics] | 
            
  | 15:11 | 
  <btullis> | 
  restart hive-server2 and hive-metastore services on an-coord1002 to pick up new version of hadoop | 
  [analytics] |