2023-06-20 §
18:13 <mforns> deployed airflow analytics to fix webrequest job [analytics]
17:52 <joal> deploy Refinery to unbreak webrequest [analytics]
2023-06-19 §
14:04 <elukey> move varnishkafka instances in eqsin to PKI - T337825 [analytics]
11:28 <stevemunene> decommission host analytics1060.eqiad.wmnet -t T338409 [analytics]
10:47 <stevemunene> decommission host analytics1059.eqiad.wmnet -t T338408 [analytics]
09:13 <stevemunene> Decommissioning analytics1058.eqiad.wmnet -t T338227 [analytics]
2023-06-16 §
12:18 <btullis> restarting the remaining monitor_refine_event_sanitized_analytics_immediate.service monitor_refine_event_sanitized_main_delayed.service monitor_refine_event_sanitized_main_immediate.service services on an-launcher1002 [analytics]
12:11 <btullis> restarting refine_event_sanitized_main_delayed.service on an-launcher1002 [analytics]
12:03 <btullis> restarting refine_event_sanitized_analytics_delayed.service on an-launcher1002 [analytics]
11:14 <btullis> rebooting an-test-worker1002 for T335358 and stuck gobblin [analytics]
10:13 <joal> rerun druid_load_edit_hourly to reload full snapshot [analytics]
2023-06-15 §
19:27 <btullis> restarting aqs service on A:aqs in batches of 2, 10 seconds apart [analytics]
17:02 <joal> Deploying airflow (again) to fix memory issues [analytics]
15:58 <joal> Rerun druid indexation for mediawiki_history_reduced [analytics]
15:56 <joal> Deploy airflow to fix druid loading jobs using snapshot [analytics]
15:53 <milimetric> refinery-source 0.2.17 deployed, refinery updated and synced to hdfs [analytics]
12:18 <stevemunene> running sre.hadoop.roll-restart-masters to completely remove any references to analytics1058-1060 for T317861 [analytics]
12:34 <joal> Deploy analytics-airflow to patch mediawiki_history_reduced druid loading [analytics]
09:05 <elukey> move varnishkafka instances in ulsfo to PKI [analytics]
2023-06-14 §
20:18 <milimetric> reran mediawiki_history_reduced druid load task after deploying Joseph's fix [analytics]
13:15 <stevemunene> running puppet on an-master100[1-2] to apply "Remove analytics58_60 from the HDFS topology" T317861 [analytics]
2023-06-13 §
19:27 <btullis> restarting the hive-server2 and hive-metastore services on an-coord1001 [analytics]
19:03 <btullis> freeing up space in /srv on an-launcher1002 with `btullis@an-launcher1002:/srv/airflow-analytics/logs/scheduler$ find -maxdepth 1 -type d -mtime +15 -print0 | xargs -0 sudo rm -rf` for T339002 [analytics]
16:41 <ottomata> deploying refinery for weekly train [analytics]
15:45 <SandraEbele> Deployed refinery-source using jenkins [analytics]
15:19 <ottomata> drop event.mediawiki_page_outlink_topic_prediction_change table and data - T337395 [analytics]
15:13 <SandraEbele> deploying refinery source [analytics]
15:05 <ottomata> dropping hive table event.mediawiki_page_change_v1 to pick up backwards incompatible schema change - T337395 [analytics]
15:03 <btullis> failing over the analytics-hive cname to an-coord1002 [analytics]
13:45 <elukey> fixed broken graphs in the varnishkafka's dashboard [analytics]
13:37 <btullis> restarting hive-server2 and hive-metastore on an-coord1002 prior to failover. [analytics]
13:00 <btullis> rolled out conda-analytics 0.0.18 to analytics-airflow and hadoop-coordinator [analytics]
12:25 <btullis> beginning rollout of conda-analytics 0.0.18 to hadoop-workers [analytics]
07:10 <elukey> move varnishkafka instances on cp4037 to PKI TLS certs [analytics]
2023-06-12 §
12:39 <btullis> ran apt clean on an-testui1001 to get some free disk space. [analytics]
11:30 <btullis> resuming deployment of eventgate-main [analytics]
09:58 <btullis> deploying eventgate-main [analytics]
08:52 <btullis> restart monitor_refine_netflow service on an-launcher1002 after successful job re-run. [analytics]
08:36 <btullis> re-running the refine_netflow task [analytics]
2023-06-09 §
20:40 <btullis> restarting the aqs service more quickly with: `sudo cumin -b 2 -s 10 A:aqs 'systemctl restart aqs'` [analytics]
20:23 <btullis> btullis@cumin1001:~$ sudo cookbook sre.aqs.roll-restart-reboot --alias aqs restart_daemons --reason aqs_rollback_btullis [analytics]
20:22 <btullis> merged and deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/928927 to revert aqs mediawiki snapshot change [analytics]
2023-06-08 §
17:12 <btullis> running the sre.hadoop.roll-restart-masters cookbook for the analytics cluster, to pick up the new journalnode for T338336 [analytics]
17:01 <btullis> running puppet on an-worker1142 to start the new journalnode [analytics]
06:42 <stevemunene> stop hadoop-hdfs-journalnode on analytics1069 in order to swap the journal node with an-worker1142 T338336 [analytics]
06:10 <elukey> kill remaining processes for `andyrussg` on stat100x nodes to unblock puppet [analytics]
2023-06-07 §
15:38 <btullis> installing presto 0.281 to the test cluster [analytics]
15:23 <elukey> all varnishkafka instances on caching nodes are getting restarted due to https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087 - T337825 [analytics]
14:13 <btullis> running `sudo cumin A:wikireplicas-web 'maintain-views --all-databases --table abuse_filter_history --replace-all'` on A:wikireplicas-web [analytics]
14:04 <btullis> running `maintain-views --all-databases --table abuse_filter_history --replace-all` on A:wikireplicas-analytics [analytics]