2022-06-13 §
12:26 <btullis> restarting hive-server2 and hive-metastore on an-coord1002 [analytics]
09:54 <joal> rerun failed refine for network_flows_internal [analytics]
09:54 <joal> Rerun failed refine for mediawiki_talk_page_edit events [analytics]
09:51 <joal> Manually rerun webrequest_text load for hour 2022-06-13T03:00 [analytics]
07:18 <joal> Manually rerun webrequest_text load for hour 2022-06-12T08:00 [analytics]
2022-06-10 §
17:00 <ottomata> applied change to airflow instances to bump scheduler parsing_processes = # of cpu processors [analytics]
08:58 <btullis> cookbook sre.hadoop.roll-restart-workers analytics [analytics]
2022-06-09 §
17:17 <joal> Rerun refine for failed datasets [analytics]
14:15 <btullis> manually failing back HDFS namenode from an-master1002 to an-master1001 [analytics]
13:15 <btullis> roll-restarting the hadoop masters to pick up new JRE [analytics]
2022-06-08 §
18:06 <joal> Restart airflow after deploy for dag reprocessing [analytics]
18:02 <joal> deploying Airflow dags [analytics]
13:45 <btullis> deploying refinery [analytics]
2022-06-07 §
13:45 <btullis> deploying updated eventgate images to all remaining deployments. [analytics]
11:33 <btullis> deployed an updated version of eventgate to eventgate-analytics-external to address the timing mis-calculation. [analytics]
10:51 <btullis> restart the eventlogging_to_druid_netflow-sanitization_daily service on an-launcher1002 [analytics]
2022-06-06 §
13:45 <btullis> restarting archiva service for new JRE [analytics]
06:31 <elukey> restart memcached on an-tool1005 to pick up puppet settings and clear an alert in icinga [analytics]
2022-06-05 §
03:14 <milimetric> rerunning mw history since the last failure just looked like a fluke [analytics]
2022-06-04 §
11:41 <joal> Manually launch refinery-sqoop-mediawiki-production after manual fix of refinery-sqoop-mediawiki [analytics]
11:39 <joal> Manually sqoop enwiki:user and commonswiki:user and add _SUCCESS flag for following job to kick off [analytics]
2022-06-02 §
15:50 <mforns> deployed wikistats 2.9.5 [analytics]
14:02 <joal> Start browser_general_daily on airflow [analytics]
13:19 <joal> Drop and recreate wmf_raw.mediawiki_page table (field removal) [analytics]
12:44 <joal> Remove wrongly formatted interlanguage data [analytics]
12:36 <joal> Kill interlanguage-daily oozie job after successful move to airflow [analytics]
12:15 <joal> Deploy interlanguage fix to airflow [analytics]
09:56 <joal> Relaunch sqoop after having deployed a corrective patch [analytics]
09:46 <joal> Manually mark interlanguage historical tasks failed in airflow [analytics]
08:54 <joal> Deploy airflow with spark3 jobs [analytics]
08:47 <joal> Merging 2 airflow spark3 jobs now that their refinery counterpart is deployed [analytics]
08:07 <joal> Deploy refinery onto HDFS [analytics]
07:26 <joal> Deploy refinery using scap [analytics]
2022-06-01 §
21:04 <milimetric> trying to rerun sqoop from a screen on an-launcher [analytics]
20:09 <SandraEbele> Successfully deployed refinery using scap, then deployed onto hdfs. [analytics]
18:51 <SandraEbele> About to deploy analytics/refinery (regular weekly train) [analytics]
08:39 <elukey> powercycle an-worker1094 - OEM event registered in `racadm getsel`, host frozen [analytics]
2022-05-31 §
18:48 <ottomata> sudo -u hdfs hdfs dfsadmin -safemode leave on an-master1001 [analytics]
18:12 <ottomata> sudo service hadoop-hdfs-namenode start on an-master1002 [analytics]
18:10 <ottomata> sudo -u hdfs hdfs dfsadmin -safemode enter [analytics]
17:47 <btullis> starting namenode services on an-master1001 [analytics]
17:44 <btullis> restarting the datanodes on all five of the affected hadoop workers. [analytics]
17:43 <btullis> restarting journalnode service on each of the five hadoop workers with journals. [analytics]
17:41 <btullis> resizing each journalnode with resize2fs [analytics]
17:38 <btullis> sudo lvresize -L+20G analytics1069-vg/journalnode [analytics]
17:38 <btullis> increasing each of the hadoop journalnodes by 20 GB [analytics]
17:33 <ottomata> stop journalnodes and datanodes on 5 hadoop journalnode hosts [analytics]
17:30 <btullis> stopped the hdfs-namenode service on an-master100[1-2] [analytics]
15:36 <milimetric> dropped razzi databases and deleted HDFS directories (in trash) [analytics]
06:26 <elukey> `elukey@an-master1001:~$ sudo systemctl reset-failed hadoop-clean-fairscheduler-event-logs.service` [analytics]
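
The 2022-05-31 entries from 17:30 to 18:48 describe a journalnode volume resize. Because the log is newest-first, the order of operations is easy to misread, so the following is a rough dry-run sketch of the same steps in chronological order. It only prints the commands and executes nothing. The `lvresize`, `resize2fs`, `dfsadmin -safemode` and `service hadoop-hdfs-namenode start` invocations are taken from the entries above. The `hadoop-hdfs-journalnode` and `hadoop-hdfs-datanode` unit names, the use of `systemctl`, and the `/dev/<vg>/journalnode` device path are assumptions, not something the log confirms.

```shell
#!/bin/sh
# Dry-run sketch of the 2022-05-31 journalnode resize, oldest step first.
# run() only prints; no command here is actually executed.

# Volume group named in the 17:38 entry. The log mentions five journalnode
# hosts; the other volume group names are not recorded, so only this one is shown.
VG="analytics1069-vg"

run() { printf '+ %s\n' "$*"; }

resize_plan() {
    # 17:30 stop the namenode on an-master1001 and an-master1002
    run sudo systemctl stop hadoop-hdfs-namenode
    # 17:33 stop journalnode and datanode on each journalnode host
    # (unit names are assumed, modelled on hadoop-hdfs-namenode)
    run sudo systemctl stop hadoop-hdfs-journalnode hadoop-hdfs-datanode
    # 17:38 grow the journalnode logical volume by 20 GB
    run sudo lvresize -L+20G "$VG/journalnode"
    # 17:41 grow the filesystem to fill the volume (device path is assumed)
    run sudo resize2fs "/dev/$VG/journalnode"
    # 17:43-17:44 bring journalnodes and datanodes back
    run sudo systemctl start hadoop-hdfs-journalnode hadoop-hdfs-datanode
    # 17:47 start the namenode on the first master
    run sudo systemctl start hadoop-hdfs-namenode
    # 18:10 enter safe mode
    run sudo -u hdfs hdfs dfsadmin -safemode enter
    # 18:12 start the namenode on an-master1002
    run sudo service hadoop-hdfs-namenode start
    # 18:48 leave safe mode, run on an-master1001
    run sudo -u hdfs hdfs dfsadmin -safemode leave
}

resize_plan
```

Each printed line is prefixed with `+`, so the output can be reviewed as a checklist before anyone runs the real commands by hand on the relevant hosts.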