analytics SAL

2024-01-31 §
17:00	<phuedx>	phuedx@deploy2002 Started deploy [analytics/refinery@2c00cad] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2c00cad1]	[analytics]
16:57	<phuedx>	phuedx@deploy2002 Finished deploy [analytics/refinery@2c00cad] (thin): Regular analytics weekly train THIN [analytics/refinery@2c00cad1] (duration: 00m 06s)	[analytics]
16:57	<phuedx>	phuedx@deploy2002 Started deploy [analytics/refinery@2c00cad] (thin): Regular analytics weekly train THIN [analytics/refinery@2c00cad1]	[analytics]
16:53	<phuedx>	phuedx@deploy2002 Finished deploy [analytics/refinery@2c00cad]: Regular analytics weekly train [analytics/refinery@2c00cad1] (duration: 09m 52s)	[analytics]
16:52	<phuedx>	Regular analytics weekly train [analytics/refinery@$(git rev-parse --short HEAD)]	[analytics]
12:12	<btullis>	rebooting dbstore1009 for new kernel version (T356239)	[analytics]
11:56	<btullis>	rebooting dbstore1008 for new kernel version (T356239)	[analytics]
10:57	<btullis>	deploying https://gerrit.wikimedia.org/r/c/analytics/superset/deploy/+/994213 to superset-next to test nested display of presto columns	[analytics]
2024-01-30 §
18:48	<xcollazo>	ran the following commands to create a production test dump folder:	[analytics]
18:46	<xcollazo>	deployed latest DAG changes to analytics Airflow instance	[analytics]
10:17	<btullis>	upgrading an-airflow1005 (search) to bullseye for T335261	[analytics]
09:59	<gmodena>	starting a scap deployment of analytics airflow dags	[analytics]
09:31	<brouberol>	yarn.wikimedia.org is back	[analytics]
08:59	<brouberol>	reimaging an-tool1008, causing unavailability of the yarn.wikimedia.org UI for the duration of the op - T349399	[analytics]
2024-01-29 §
13:06	<brouberol>	I'm starting the reimaging process of an-tool1009.eqiad.wmnet, which will cause unavailability of hue.wikimedia.org while it runs - T349400	[analytics]
10:46	<btullis>	upgrading an-airflow1007 to bullseye for T335261	[analytics]
2024-01-24 §
15:21	<aqu>	Refinery weekly deployment train - end (scap, then deployed onto hdfs) (test cluster deploy still broken T354703)	[analytics]
14:31	<aqu>	Refinery weekly deployment train - begin	[analytics]
2024-01-16 §
16:36	<gmodena>	starting refinery deployment using scap	[analytics]
16:35	<gmodena>	Deployed refinery-source v0.2.28 using jenkins. Jars are on archiva.	[analytics]
15:46	<gmodena>	releasing and deploying refinery source v0.2.28	[analytics]
2024-01-15 §
17:02	<btullis>	roll-restarting public druid cluster	[analytics]
17:01	<btullis>	roll-restarting analytics druid cluster	[analytics]
16:55	<joal>	Clearing analytics failed airflow tasks after fix	[analytics]
16:47	<btullis>	restarted the hive-server2 and hive-metastore services on an-coord100[3-4] which had been accidentally omitted earlier for T332573	[analytics]
12:00	<btullis>	removing all downtime for hadoop-all for T332573	[analytics]
11:57	<btullis>	un-pausing all previously paused DAGS on all airflow instances for T332573	[analytics]
11:55	<btullis>	re-enabling gobblin jobs	[analytics]
11:38	<brouberol>	redeploying the Spark History Server to pick up the new HDFS namenodes - T332573	[analytics]
11:29	<btullis>	puppet runs cleanly on an-master1003 and it is the active namenode - running puppet on an-master1004.	[analytics]
11:20	<btullis>	running puppet on an-master1003 to set it to active for T332573	[analytics]
11:16	<btullis>	running puppet on journal nodes first for T332573	[analytics]
11:03	<btullis>	stopping all hadoop services	[analytics]
10:59	<btullis>	disabling puppet on all hadoop nodes	[analytics]
10:54	<btullis>	putting HDFS into safe mode for T332573	[analytics]
2024-01-10 §
12:47	<stevemunene>	roll restarting hadoop test workers to pick up new JRE	[analytics]
12:22	<stevemunene>	decommission druid1006.eqiad.wmnet T354743	[analytics]
12:05	<stevemunene>	decommission druid1005.eqiad.wmnet T354742	[analytics]
11:39	<stevemunene>	decommission druid1004.eqiad.wmnet T354741	[analytics]
2024-01-09 §
21:28	<aqu>	airflow-dags/analytics(_test) are both deployed	[analytics]
21:18	<aqu>	analytics/refinery not deployed fully on test cluster. Ticket for the bug here: https://phabricator.wikimedia.org/T354703	[analytics]
21:07	<aqu>	Deployed refinery using scap, then deployed onto hdfs	[analytics]
20:48	<aqu>	about to deploy analytics/refinery - weekly train	[analytics]
12:57	<stevemunene>	roll restarting analytics hadoop masters to pick up new net_topology script and new JRE T254480	[analytics]
11:48	<stevemunene>	roll restarting hadoop test masters to pick up new net_topology script and new JRE	[analytics]
11:36	<stevemunene>	disable puppet on hadoop masters both test and production to test/implement new net_topology script	[analytics]
10:39	<btullis>	roll-restarting kafka-jumbo to pick up new JRE	[analytics]
2024-01-08 §
17:22	<btullis>	migrated s1-analytics-replica to dbstore1008 for T351921	[analytics]
17:19	<btullis>	migrated s5-analytics-replica to dbstore1008 for T351921	[analytics]
15:56	<btullis>	migrating s7-analytics-replica to dbstore1008 for T351921	[analytics]