analytics SAL

2022-06-13 §
12:26	<btullis>	restarting hive-server2 and hive-metastore on an-coord1002	[analytics]
09:54	<joal>	rerun failed refine for network_flows_internal	[analytics]
09:54	<joal>	Rerun failed refine for mediawiki_talk_page_edit events	[analytics]
09:51	<joal>	Manually rerun webrequest_text load for hour 2022-06-13T03:00	[analytics]
07:18	<joal>	Manually rerun webrequest_text load for hour 2022-06-12T08:00	[analytics]
2022-06-10 §
17:00	<ottomata>	applied change to airflow instances to bump scheduler parsing_processes = # of cpu processors	[analytics]
08:58	<btullis>	cookbook sre.hadoop.roll-restart-workers analytics	[analytics]
2022-06-09 §
17:17	<joal>	Rerun refine for failed datasets	[analytics]
14:15	<btullis>	manually failing back HDFS namenode from an-master1002 to an-master1001	[analytics]
13:15	<btullis>	roll-restarting the hadoop masters to pick up new JRE	[analytics]
2022-06-08 §
18:06	<joal>	Restart airflow after deploy for dag reprocessing	[analytics]
18:02	<joal>	deploying Airflow dags	[analytics]
13:45	<btullis>	deploying refinery	[analytics]
2022-06-07 §
13:45	<btullis>	deploying updated eventgate images to all remaining deployments.	[analytics]
11:33	<btullis>	deployed an updated version of eventgate to eventgate-analytics-external to address the timing mis-calculation.	[analytics]
10:51	<btullis>	restart the eventlogging_to_druid_netflow-sanitization_daily service on an-launcher1002	[analytics]
2022-06-06 §
13:45	<btullis>	restarting archiva service for new JRE	[analytics]
06:31	<elukey>	restart memcached on an-tool1005 to pick up puppet settings and clear an alert in icinga	[analytics]
2022-06-05 §
03:14	<milimetric>	rerunning mw history since the last failure just looked like a fluke	[analytics]
2022-06-04 §
11:41	<joal>	Manually launch refinery-sqoop-mediawiki-production after manual fix of refinery-sqoop-mediawiki	[analytics]
11:39	<joal>	Manually sqoop enwiki:user and commonswiki:user and add _SUCCESS flag for following job to kick off	[analytics]
2022-06-02 §
15:50	<mforns>	deployed wikistats 2.9.5	[analytics]
14:02	<joal>	Start browser_general_daily on airflow	[analytics]
13:19	<joal>	Drop and recreate wmf_raw.mediawiki_page table (field removal)	[analytics]
12:44	<joal>	Remove wrongly formatted interlanguage data	[analytics]
12:36	<joal>	Kill interlanguage-daily oozie job after successful move to airflow	[analytics]
12:15	<joal>	Deploy interlanguage fix to airflow	[analytics]
09:56	<joal>	Relaunch sqoop after having deployed a corrective patch	[analytics]
09:46	<joal>	Manually mark interlanguage historical tasks failed in airflow	[analytics]
08:54	<joal>	Deploy airflow with spark3 jobs	[analytics]
08:47	<joal>	Merging 2 airflow spark3 jobs now that their refinery counterpart is deployed	[analytics]
08:07	<joal>	Deploy refinery onto HDFS	[analytics]
07:26	<joal>	Deploy refinery using scap	[analytics]
2022-06-01 §
21:04	<milimetric>	trying to rerun sqoop from a screen on an-launcher	[analytics]
20:09	<SandraEbele>	Successfully deployed refinery using scap, then deployed onto hdfs.	[analytics]
18:51	<SandraEbele>	About to deploy analytics/refinery (regular weekly train)	[analytics]
08:39	<elukey>	powercycle an-worker1094 - OEM event registered in `racadm getsel`, host frozen	[analytics]
2022-05-31 §
18:48	<ottomata>	sudo -u hdfs hdfs dfsadmin -safemode leave on an-master1001	[analytics]
18:12	<ottomata>	sudo service hadoop-hdfs-namenode start on an-master1002	[analytics]
18:10	<ottomata>	sudo -u hdfs hdfs dfsadmin -safemode enter	[analytics]
17:47	<btullis>	starting namenode services on an-master1001	[analytics]
17:44	<btullis>	restarting the datanodes on all five of the affected hadoop workers.	[analytics]
17:43	<btullis>	restarting journalnode service on each of the five hadoop workers with journals.	[analytics]
17:41	<btullis>	resizing each journalnode with resize2fs	[analytics]
17:38	<btullis>	sudo lvresize -L+20G analytics1069-vg/journalnode	[analytics]
17:38	<btullis>	increasing each of the hadoop journalnodes by 20 GB	[analytics]
17:33	<ottomata>	stop journalnodes and datanodes on 5 hadoop journalnode hosts	[analytics]
17:30	<btullis>	stopped the hdfs-namenode service on an-master100[1-2]	[analytics]
15:36	<milimetric>	dropped razzi databases and deleted HDFS directories (in trash)	[analytics]
06:26	<elukey>	`elukey@an-master1001:~$ sudo systemctl reset-failed hadoop-clean-fairscheduler-event-logs.service`	[analytics]