2023-04-11 §
13:46 <elukey> powercycle analytics1069, down for some days now, host stuck from the mgmt/serial console [analytics]
08:14 <aqu> About to deploy analytics/refinery (To migrate webrequest load from Oozie to Airflow) [analytics]
2023-04-10 §
19:20 <mforns> deployed airflow analytics to fix mediawiki wikitext history [analytics]
2023-04-07 §
10:34 <aqu> About to deploy analytics/refinery in test cluster [analytics]
2023-04-05 §
20:17 <mforns> deployed airflow to fix aqs pageview ranks [analytics]
20:08 <mforns> finished second refinery deployment to fix aqs rankings [analytics]
19:54 <mforns> starting second refinery deployment to fix aqs rankings [analytics]
19:35 <mforns> finished refinery deployment to fix aqs rankings [analytics]
19:18 <mforns> starting refinery deployment to fix aqs rankings [analytics]
16:24 <elukey> kafka test cluster migrated to bullseye [analytics]
14:00 <elukey> powercycle an-worker1132 [analytics]
2023-04-04 §
13:39 <steve_munene> leave hdfs safemode T331882 [analytics]
12:57 <steve_munene> putting hdfs into safe mode as part of T331882 [analytics]
11:42 <elukey> stop puppet on an-launcher1002 and manually stop .timer units [analytics]
07:34 <aqu> Rerun refine_event with "sudo -u analytics kerberos-run-command analytics /usr/local/bin/refine_event --ignore_failure_flag=true --table_include_regex='mediawiki_visual_editor_feature_use|mediawiki_edit_attempt|mediawiki_web_ui_interactions' --since='2023-04-02T18:00:00.000Z' --until='2023-04-03T19:00:00.000Z'" [analytics]
2023-04-03 §
08:01 <elukey> fix old envoyproxy monitor for an-test-ui1001 [analytics]
2023-03-31 §
12:23 <btullis> deploying datahub to staging T333580 [analytics]
08:44 <btullis> Shutting down an-worker1091 for RAID battery replacement T332883 [analytics]
2023-03-30 §
18:32 <SandraEbele> started Airflow mediawiki wikitext dags after killing oozie jobs as part of Migration task. [analytics]
18:31 <SandraEbele> Killed Oozie mediawiki-wikitext-history-coord and mediawiki-wikitext-current-coord [analytics]
18:28 <SandraEbele> deployed hotfix for airflow mediawiki_wikitext_current and mediawiki_wikitext_history dags. [analytics]
17:30 <SandraEbele> deployed airflow analytics - mediawiki_wikitext dags [analytics]
17:20 <SandraEbele> killed Oozie mediawiki-history-check_denormalize job and started Airflow mediawiki_history_check_denormalize dag. [analytics]
12:32 <joal> Deploy airflow hotfix for referer_daily [analytics]
12:11 <joal> Kill virtualpageview oozie job - migrated to airflow [analytics]
11:56 <joal> Kill oozie referer_daily job - migrated to airflow [analytics]
09:56 <btullis> re-running refine_event [analytics]
09:48 <joal> Deploy airflow analytics [analytics]
09:38 <joal> Deploying refinery onto HDFS [analytics]
09:27 <joal> Deploying refinery using scap [analytics]
2023-03-28 §
15:58 <btullis> deploying refinery to HDFS [analytics]
14:35 <btullis> re-enabling gobblin timers: https://gerrit.wikimedia.org/r/c/operations/puppet/+/903668 T330165 [analytics]
14:31 <btullis> re-enabling YARN queues: https://gerrit.wikimedia.org/r/c/operations/puppet/+/903565 T330165 [analytics]
14:25 <btullis> proceeding to take HDFS out of safe mode. [analytics]
14:25 <btullis> restarting hive-server2 and hive-metastore services on an-coord1001 [analytics]
13:54 <btullis> entering safe mode for analytics-hadoop cluster: T330165 [analytics]
13:37 <btullis> refreshed YARN queues with: `sudo kerberos-run-command yarn /usr/bin/yarn rmadmin -refreshQueues` on both an-master100[1-2] - T330165 [analytics]
13:31 <btullis> setting all four YARN queues to STOPPED https://gerrit.wikimedia.org/r/c/operations/puppet/+/903627 T330165 [analytics]
12:50 <btullis> merging the change to disable ingestion to HDFS https://gerrit.wikimedia.org/r/c/operations/puppet/+/903610 [analytics]
10:46 <btullis> failing over hive services to an-coord1002 prior to switch upgrade. [analytics]
2023-03-27 §
17:19 <milimetric> added 2023-03-14T11 and 2023-03-14T12 partitions for codfw on event.mediawiki_page_move with alter table mediawiki_page_move add partition (datacenter='codfw',year=2023,month=3,day=14,hour=[11,12]); [analytics]
2023-03-24 §
14:43 <topranks> merged alertmanager rules for eventlogging checks being migrated from Icinga T309007 [analytics]
2023-03-23 §
13:48 <joal> Restart virtualpageview-hourly-coord with pageview_allowlist fix - starting 2023-03-21T08:00 [analytics]
13:47 <joal> Kill oozie virtualpageview-hourly-coord job [analytics]
13:29 <joal> Hotfix deploy refinery [analytics]
11:37 <btullis> we changed the retention policy on an-test-druid to `{"period":"P1M","includeFuture":true,"tieredReplicants":{"_default_tier":1},"type":"loadByPeriod"},{"type":"dropForever"}` [analytics]
11:36 <btullis> reimaging an-test-druid1001 in place to upgrade to bullseye [analytics]
08:28 <joal> Rerun failed virtualpageview-druid-daily-wf-2023-3-22 [analytics]
2023-03-21 §
17:48 <joal> rerun failed airflow tasks [analytics]
17:39 <joal> Deploy airflow, hopefully fixing HDFSArchiver jobs [analytics]