2023-07-04 §
09:29 <btullis> restarting the yarn restart with `sudo cumin -b 5 -p 80 -s 30 A:hadoop-worker 'systemctl restart hadoop-yarn-nodemanager'` [analytics]
08:57 <btullis> executing `cookbook sre.hadoop.roll-restart-workers analytics` [analytics]
2023-07-03 §
12:52 <btullis> restarting the aqs service to pick up mediawiki history snapshot for June [analytics]
2023-06-29 §
13:44 <btullis> upgrading airflow on an-launcher1002 to version 2.6.1 [analytics]
2023-06-28 §
13:25 <btullis> upgrading an-test-worker1003 to bullseye, after upgrading firmware [analytics]
13:08 <btullis> upgrading idrac firmware of an-test-worker1003 via the cookbook for T329363 [analytics]
2023-06-27 §
14:53 <mforns> deployed airflow analytics to unbreak DataHub's Druid ingestion [analytics]
13:32 <joal> Rerun druid_load_pageviews_hourly_aggregated_daily after deploy [analytics]
13:32 <joal> Rerun druid_load_pageviews_hourly_aggregated_daily [analytics]
13:25 <joal> Deploy Airflow [analytics]
11:10 <joal> Deploy refinery onto HDFS [analytics]
11:01 <stevemunene> upgrading an-test-worker1003 to bullseye, keeping `/srv/hadoop` intact [analytics]
10:55 <joal> Deploy refinery using scap [analytics]
09:42 <stevemunene> run puppet on hadoop-masters; this does a refresh of the hdfs nodes [analytics]
09:38 <stevemunene> Exclude analytics1061_1069 from HDFS and YARN [analytics]
09:21 <btullis> upgrading an-test-worker1002 to bullseye, keeping `/srv/hadoop` intact [analytics]
08:38 <elukey> revoked puppet cert for 'varnishkafka' and cleaned up its cergen's files in puppet private [analytics]
07:14 <elukey> `sudo kill `pgrep -u paramd`` on stat1005 to unblock puppet [analytics]
2023-06-26 §
23:22 <btullis> shutting down an-worker1092 in preparation for RAID controller battery replacement [analytics]
14:06 <elukey> move varnishkafka instances in esams to pki - T337825 [analytics]
11:39 <stevemunene> running hdfs dfsadmin -refreshNodes to pick up analytics106[1-3] from hosts.exclude [analytics]
11:35 <stevemunene> disable puppet on an-master1001.eqiad.wmnet [analytics]
09:40 <joal> Rerun failed druid-loading airflow jobs [analytics]
09:38 <btullis> deploying presto version 0.281 to production [analytics]
06:28 <stevemunene> run puppet on hadoop-masters [analytics]
06:27 <stevemunene> Excluding analytics106[4-6] from HDFS and YARN as we decommission them [analytics]
2023-06-23 §
12:40 <elukey> move varnishkafka drmrs instances to pki - T337825 [analytics]
10:20 <btullis> reboot an-worker1110 after initializing a second replacement drive for T336929 [analytics]
10:16 <elukey> restart turnilo to pick up config changes - T340097 [analytics]
2023-06-22 §
15:57 <btullis> adding new bigtop-1.5 packages to apt.wikimedia.org for bullseye [analytics]
15:50 <elukey> update the webrequest_sampled_live druid kafka supervisor to add the https field - T340097 [analytics]
15:18 <btullis> cleared status for aqs_hourly.wait_for_webrequest run 13:00 and the downstream task on an-test-client1001. [analytics]
15:07 <btullis> clearing task for refine_webrequest_hourly_test_text hour 13:00 [analytics]
14:36 <btullis> restarted airflow-webserver and airflow-scheduler on an-test-client1001 with version 2.6.1. [analytics]
14:11 <btullis> redeploying datahub to staging to try to get upgrade to 0.10.0 working. [analytics]
14:02 <stevemunene> running sre.hadoop.roll-restart-masters to restart the Namenodes and completely remove any reference to analytics106[1-3] - T317861 [analytics]
13:47 <stevemunene> run puppet on hadoop-masters [analytics]
13:43 <stevemunene> Remove analytics106[1-3] from the HDFS topology [analytics]
13:16 <elukey> move varnishkafka instances in eqiad to PKI - T337825 [analytics]
13:14 <btullis> deploying the new eventgate-wikimedia container to eventgate-main [analytics]
08:57 <btullis> cleared airflow task for `projectview_geo.move_data_to_archive` [analytics]
2023-06-21 §
16:46 <joal> Rerun cassandra-load tasks for pageview-per-project daily and hourly for 2023-06-20 hour 4 [analytics]
16:46 <joal> rerun browser_general_daily for 2023-06-20 [analytics]
16:40 <joal> Rerun projectview-hourly DAG for hour: 2023-06-20T04:00 [analytics]
15:44 <mforns> deployed airflow analytics to remove deprecated dag for mobile_apps [analytics]
12:51 <elukey> move varnishkafka instances in codfw to PKI - T337825 [analytics]
2023-06-20 §
21:28 <aqu> Manual edit of `/srv/airflow-analytics/connections.yml` following changes in https://gerrit.wikimedia.org/r/c/operations/puppet/+/931690 to avoid alerts Airflow analytics aqs_hourly [analytics]
20:59 <aqu> Manually marked as success `wikidata_dump_to_hive_weekly` iteration `2023-02-13` in Airflow analytics [analytics]
19:55 <btullis> clearing the first failed emit_lineage_to_datahub_for_hive_wmf_aqs_hourly task https://usercontent.irccloud-cdn.com/file/vW6YdEof/image.png [analytics]
19:51 <btullis> merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/931683 to fix the aqs_hourly datahub lineage failure [analytics]