analytics SAL

2024-01-31 §
11:56	<btullis>	rebooting dbstore1008 for new kernel version (T356239)	[analytics]
10:57	<btullis>	deploying https://gerrit.wikimedia.org/r/c/analytics/superset/deploy/+/994213 to superset-next to test nested display of presto columns	[analytics]
2024-01-30 §
18:48	<xcollazo>	ran the following commands to create a production test dump folder:	[analytics]
18:46	<xcollazo>	deployed latest DAG changes to analytics Airflow instance	[analytics]
10:17	<btullis>	upgrading an-airflow1005 (search) to bullseye for T335261	[analytics]
09:59	<gmodena>	starting a scap deployment of analytics airflow dags	[analytics]
09:31	<brouberol>	yarn.wikimedia.org is back	[analytics]
08:59	<brouberol>	reimaging an-tool1008, causing unavailability of the yarn.wikimedia.org UI for the duration of the op - T349399	[analytics]
2024-01-29 §
13:06	<brouberol>	I'm starting the reimaging process of an-tool1009.eqiad.wmnet, which will cause unavailability of hue.wikimedia.org while it runs - T349400	[analytics]
10:46	<btullis>	upgrading an-airflow1007 to bullseye for T335261	[analytics]
2024-01-24 §
15:21	<aqu>	Refinery weekly deployment train - end (scap, then deployed onto hdfs) (test cluster deploy still broken T354703)	[analytics]
14:31	<aqu>	Refinery weekly deployment train - begin	[analytics]
2024-01-16 §
16:36	<gmodena>	starting refinery deployment using scap	[analytics]
16:35	<gmodena>	Deployed refinery-source v0.2.28 using jenkins. Jars are on archiva.	[analytics]
15:46	<gmodena>	releasing and deploying refinery source v0.2.28	[analytics]
2024-01-15 §
17:02	<btullis>	roll-restarting public druid cluster	[analytics]
17:01	<btullis>	roll-restarting analytics druid cluster	[analytics]
16:55	<joal>	Clearing analytics failed airflow tasks after fix	[analytics]
16:47	<btullis>	restarted the hive-server2 and hive-metastore services on an-coord100[3-4] which had been accidentally omitted earlier for T332573	[analytics]
12:00	<btullis>	removing all downtime for hadoop-all for T332573	[analytics]
11:57	<btullis>	un-pausing all previously paused DAGs on all airflow instances for T332573	[analytics]
11:55	<btullis>	re-enabling gobblin jobs	[analytics]
11:38	<brouberol>	redeploying the Spark History Server to pick up the new HDFS namenodes - T332573	[analytics]
11:29	<btullis>	puppet runs cleanly on an-master1003 and it is the active namenode - running puppet on an-master1004.	[analytics]
11:20	<btullis>	running puppet on an-master1003 to set it to active for T332573	[analytics]
11:16	<btullis>	running puppet on journal nodes first for T332573	[analytics]
11:03	<btullis>	stopping all hadoop services	[analytics]
10:59	<btullis>	disabling puppet on all hadoop nodes	[analytics]
10:54	<btullis>	putting HDFS into safe mode for T332573	[analytics]
2024-01-10 §
12:47	<stevemunene>	roll restarting hadoop test workers to pick up new JRE	[analytics]
12:22	<stevemunene>	decommission druid1006.eqiad.wmnet T354743	[analytics]
12:05	<stevemunene>	decommission druid1005.eqiad.wmnet T354742	[analytics]
11:39	<stevemunene>	decommission druid1004.eqiad.wmnet T354741	[analytics]
2024-01-09 §
21:28	<aqu>	airflow-dags/analytics(_test) are both deployed	[analytics]
21:18	<aqu>	analytics/refinery not deployed fully on test cluster. Ticket for the bug here: https://phabricator.wikimedia.org/T354703	[analytics]
21:07	<aqu>	Deployed refinery using scap, then deployed onto hdfs	[analytics]
20:48	<aqu>	about to deploy analytics/refinery - weekly train	[analytics]
12:57	<stevemunene>	roll restart analytics hadoop masters to pick up new net_topology script and new JRE T254480	[analytics]
11:48	<stevemunene>	roll restarting hadoop test masters to pick up new net_topology script and new JRE	[analytics]
11:36	<stevemunene>	disable puppet on hadoop masters both test and production to test/implement new net_topology script	[analytics]
10:39	<btullis>	roll-restarting kafka-jumbo to pick up new JRE	[analytics]
2024-01-08 §
17:22	<btullis>	migrated s1-analytics-replica to dbstore1008 for T351921	[analytics]
17:19	<btullis>	migrated s5-analytics-replica to dbstore1008 for T351921	[analytics]
15:56	<btullis>	migrating s7-analytics-replica to dbstore1008 for T351921	[analytics]
2024-01-03 §
10:32	<btullis>	restarted the monitor_refine_event.service on an-launcher1002 to clear alert	[analytics]
2024-01-02 §
15:36	<btullis>	migrating analytics-hive.eqiad.wmnet to an-coord1003 for T336045	[analytics]
10:56	<brouberol>	configuring [eqiad,codfw].mediawiki.cirrussearch.page_rerender.v1 as compacted topics on jumbo-eqiad - T353715	[analytics]
09:24	<btullis>	adding three days' downtime to dbstore1008, prior to switching its role to `mariadb::analytics_replica` for T351921	[analytics]
2024-01-01 §
17:11	<joal>	Deploying airflow to fix pageview daily aggregated monthly job	[analytics]
2023-12-22 §
21:38	<mforns>	re-ran the Airflow DAG cassandra_load_unique_devices_daily for 2023-12-14	[analytics]