analytics SAL


2022-12-19 §
13:45	<btullis>	restart presto-server on an-coord1001 to increase heap from 4 GB to 16 GB T325331	[analytics]
12:11	<aqu>	systemctl start hadoop-namenode-backup-hdfs.service on an-master1002 at 11am UTC	[analytics]
09:36	<aqu>	Deployed analytics/refinery using scap, then deployed onto HDFS.	[analytics]
09:17	<aqu>	About to deploy analytics/refinery (bug fix in HDFS usage pipeline)	[analytics]
2022-12-16 §
15:36	<xcollazo>	deploying 'Fix subtle bug on image_suggestions when resolving varprop.' on platform_eng Airflow instance.	[analytics]
2022-12-15 §
22:28	<btullis>	run `sudo apt clean` on an-coord1001	[analytics]
19:08	<xcollazo>	Deploying Spark3 upgrade of image_suggestions job to the platform_eng Airflow instance.	[analytics]
10:03	<joal>	Restart failed airflow tasks	[analytics]
2022-12-13 §
21:35	<aqu>	Deploying analytics/refinery (HDFS FSImage conversion to XML script)	[analytics]
2022-12-09 §
08:38	<joal>	Kill refine_eventlogging_legacy stuck job (application_1663082229270_510052)	[analytics]
2022-12-08 §
13:55	<joal>	rerun webrequest failed jobs for hour 2022-12-08-T11:00Z with updated workflow (no dataloss checks)	[analytics]
12:23	<joal>	rerun webrequest failed jobs for hour 2022-12-08-T11:00Z	[analytics]
2022-12-07 §
17:57	<aqu>	Adding raw hdfs fsimage dir in HDFS (an-launcher1002)	[analytics]
17:47	<aqu>	Adding hdfs/usage folder dataset in HDFS	[analytics]
16:24	<aqu>	Deploying analytics/refinery (HDFS usage scripts)	[analytics]
15:13	<btullis>	roll-restarting AQS to pick up new mediawiki_history_reduce snapshot	[analytics]
14:06	<btullis>	rebuilding an-tool1005 as bullseye to test superset 1.5.2 upgrade	[analytics]
09:10	<btullis>	reboot an-worker1108 as it was spinning with soft CPU lockups	[analytics]
2022-12-06 §
12:47	<btullis>	sudo systemctl restart wmf_auto_restart_prometheus-mysqld-exporter.service on matomo1002	[analytics]
11:53	<btullis>	attempting to unmount and remount `/mnt/hdfs` on stat1004	[analytics]
2022-12-05 §
11:45	<steve_munene>	restarting presto-server.service on an-presto1007 T323783	[analytics]
2022-11-30 §
16:44	<btullis>	roll-restarting presto workers again for T321960 and T321231	[analytics]
16:20	<btullis>	roll-restarting presto workers for T321960 and T321231	[analytics]
16:19	<btullis>	restarting presto-server on an-coord1001 for T321960 and T321231	[analytics]
13:39	<btullis>	pushing out conda-analytics to all remaining servers `btullis@cumin1001:~$ sudo debdeploy deploy -u 2022-11-30-conda-analytics.yaml -Q P:analytics::conda_analytics`	[analytics]
13:02	<btullis>	deploying conda-analytics 0.0.12 to stat boxes for T321088	[analytics]
12:29	<btullis>	repooling eqiad for eventstreams for T324074	[analytics]
11:59	<btullis>	depooling eqiad for eventstreams for T324074	[analytics]
11:34	<btullis>	repooling codfw for eventstreams for T324074	[analytics]
11:32	<btullis>	destroying the eventstreams deployment in codfw and reapplying for T324074	[analytics]
11:11	<btullis>	depooling codfw for eventstreams for T324074	[analytics]
2022-11-29 §
17:12	<ottomata>	deploying refinery, then restarting druid webrequest daily and hourly loading oozie jobs	[analytics]
17:08	<btullis>	booted all of the an-worker nodes that had been switched off.	[analytics]
15:04	<btullis>	shutting down an-worker1093	[analytics]
15:03	<btullis>	shutting down an-worker1089	[analytics]
15:02	<btullis>	shutting down an-worker1085	[analytics]
15:00	<btullis>	shutting down an-worker1083	[analytics]
14:58	<btullis>	shutting down an-worker1079	[analytics]
14:55	<btullis>	shutting down an-worker1090	[analytics]
2022-11-28 §
12:00	<btullis>	restarted presto-server on an-coord1001 to test T321960	[analytics]
2022-11-25 §
15:29	<btullis>	reset the bmc on an-coord1002	[analytics]
11:24	<elukey>	restart turnilo on an-tool1007 to pick up new settings for webrequest_sampled_live	[analytics]
10:07	<elukey>	refresh the webrequest-sampled-live druid supervisor after https://gerrit.wikimedia.org/r/c/analytics/refinery/+/859463	[analytics]
2022-11-24 §
16:21	<SandraEbele>	restarted webrequest-druid-daily-coord as part of weekly deployment train.	[analytics]
16:15	<SandraEbele>	killed webrequest-druid-daily-coord for restart as part of weekly deployment train.	[analytics]
16:13	<SandraEbele>	successfully restarted webrequest-druid-hourly-coord as part of weekly deployment train.	[analytics]
16:11	<SandraEbele>	killed webrequest-druid-hourly-coord for restart as part of weekly deployment train.	[analytics]
15:30	<SandraEbele>	Started deployment of refinery as part of weekly deployment train	[analytics]
2022-11-23 §
15:38	<btullis>	roll-restarting kafka-jumbo brokers to pick up new certificates. T323697	[analytics]
2022-11-18 §
18:56	<mforns>	re-ran refine_event_sanitized_analytics_immediate from 2022-11-17T13 to 2022-11-18T18 to fix the issues caused by a bug (allow-list typo) deployed yesterday.	[analytics]