2024-01-30 §
18:48 <xcollazo> ran the following commands to create a production test dump folder: [analytics]
18:46 <xcollazo> deployed latest DAG changes to analytics Airflow instance [analytics]
10:17 <btullis> upgrading an-airflow1005 (search) to bullseye for T335261 [analytics]
09:59 <gmodena> starting a scap deployment of analytics airflow dags [analytics]
09:31 <brouberol> yarn.wikimedia.org is back [analytics]
08:59 <brouberol> reimaging an-tool1008, causing unavailability of the yarn.wikimedia.org UI for the duration of the op - T349399 [analytics]
2024-01-29 §
13:06 <brouberol> I'm starting the reimaging process of an-tool1009.eqiad.wmnet, which will cause unavailability of hue.wikimedia.org while it runs - T349400 [analytics]
10:46 <btullis> upgrading an-airflow1007 to bullseye for T335261 [analytics]
2024-01-24 §
15:21 <aqu> Refinery weekly deployment train - end (scap, then deployed onto hdfs) (test cluster deploy still broken T354703) [analytics]
14:31 <aqu> Refinery weekly deployment train - begin [analytics]
2024-01-16 §
16:36 <gmodena> starting refinery deployment using scap [analytics]
16:35 <gmodena> Deployed refinery-source v0.2.28 using jenkins. Jars are on archiva. [analytics]
15:46 <gmodena> releasing and deploying refinery source v0.2.28 [analytics]
2024-01-15 §
17:02 <btullis> roll-restarting public druid cluster [analytics]
17:01 <btullis> roll-restarting analytics druid cluster [analytics]
16:55 <joal> Clearing analytics failed airflow tasks after fix [analytics]
16:47 <btullis> restarted the hive-server2 and hive-metastore services on an-coord100[3-4] which had been accidentally omitted earlier for T332573 [analytics]
12:00 <btullis> removing all downtime for hadoop-all for T332573 [analytics]
11:57 <btullis> un-pausing all previously paused DAGs on all airflow instances for T332573 [analytics]
11:55 <btullis> re-enabling gobblin jobs [analytics]
11:38 <brouberol> redeploying the Spark History Server to pick up the new HDFS namenodes - T332573 [analytics]
11:29 <btullis> puppet runs cleanly on an-master1003 and it is the active namenode - running puppet on an-master1004. [analytics]
11:20 <btullis> running puppet on an-master1003 to set it to active for T332573 [analytics]
11:16 <btullis> running puppet on journal nodes first for T332573 [analytics]
11:03 <btullis> stopping all hadoop services [analytics]
10:59 <btullis> disabling puppet on all hadoop nodes [analytics]
10:54 <btullis> putting HDFS into safe mode for T332573 [analytics]
2024-01-10 §
12:47 <stevemunene> roll restarting hadoop test workers to pick up new JRE [analytics]
12:22 <stevemunene> decommission druid1006.eqiad.wmnet T354743 [analytics]
12:05 <stevemunene> decommission druid1005.eqiad.wmnet T354742 [analytics]
11:39 <stevemunene> decommission druid1004.eqiad.wmnet T354741 [analytics]
2024-01-09 §
21:28 <aqu> airflow-dags/analytics(_test) are both deployed [analytics]
21:18 <aqu> analytics/refinery not deployed fully on test cluster. Ticket for the bug here: https://phabricator.wikimedia.org/T354703 [analytics]
21:07 <aqu> Deployed refinery using scap, then deployed onto hdfs [analytics]
20:48 <aqu> about to deploy analytics/refinery - weekly train [analytics]
12:57 <stevemunene> roll restart analytics hadoop masters to pick up new net_topology script and new JRE T254480 [analytics]
11:48 <stevemunene> roll restarting hadoop test masters to pick up new net_topology script and new JRE [analytics]
11:36 <stevemunene> disable puppet on hadoop masters both test and production to test/implement new net_topology script [analytics]
10:39 <btullis> roll-restarting kafka-jumbo to pick up new JRE [analytics]
2024-01-08 §
17:22 <btullis> migrated s1-analytics-replica to dbstore1008 for T351921 [analytics]
17:19 <btullis> migrated s5-analytics-replica to dbstore1008 for T351921 [analytics]
15:56 <btullis> migrating s7-analytics-replica to dbstore1008 for T351921 [analytics]
2024-01-03 §
10:32 <btullis> restarted the monitor_refine_event.service on an-launcher1002 to clear alert [analytics]
2024-01-02 §
15:36 <btullis> migrating analytics-hive.eqiad.wmnet to an-coord1003 for T336045 [analytics]
10:56 <brouberol> configuring [eqiad,codfw].mediawiki.cirrussearch.page_rerender.v1 as compacted topics on jumbo-eqiad - T353715 [analytics]
09:24 <btullis> adding three days' downtime to dbstore1008, prior to switching its role to `mariadb::analytics_replica` for T351921 [analytics]
2024-01-01 §
17:11 <joal> Deploying airflow to fix pageview daily aggregated monthly job [analytics]
2023-12-22 §
21:38 <mforns> re-ran the Airflow DAG cassandra_load_unique_devices_daily for 2023-12-14 [analytics]
21:37 <mforns> re-ran the Airflow DAG druid_load_unique_devices_per_domain_daily for 2023-12-14 [analytics]
21:37 <mforns> re-ran the Airflow DAG druid_load_unique_devices_per_project_family_daily for 2023-12-14 [analytics]