analytics SAL

2024-03-28 §
15:00	<elukey>	remove GPU labels in Hadoop Yarn for an-worker[1096-1099] (the hosts don't have a GPU anymore) - T361225	[analytics]
2024-03-27 §
15:14	<brouberol>	decommissioning an-tool1009 now that hue is fully offline - T341895	[analytics]
15:02	<brouberol>	dropping the hue.wikimedia.org CNAME - T341895	[analytics]
2024-03-25 §
15:02	<btullis>	updating the ssl_provider for eventstreams schema servers to cfssl for T360412	[analytics]
2024-03-22 §
13:17	<elukey>	`elukey@cumin1002:~$ sudo cumin 'stat100[4,5,8,9]*' 'kill `pgrep -u kcv-wikimf`'` to unblock puppet on various stat nodes	[analytics]
10:44	<btullis>	shut down an-worker1168 to investigate disk controller failure for T360594	[analytics]
2024-03-20 §
10:50	<brouberol>	superset.wikimedia.org is now migrated to the DSE k8s cluster, CAS errors have receded	[analytics]
10:20	<brouberol>	migrating superset to Kubernetes. Some CAS errors are expected for ~15 minutes	[analytics]
2024-03-07 §
14:01	<btullis>	deploying updated mediawiki_history_reduced snapshots to AQS 2.0	[analytics]
2024-03-04 §
12:22	<btullis>	restarting hive-server2 and hive-metastore service on an-coord1003	[analytics]
12:00	<btullis>	migrating analytics-hive from an-coord1003 to an-coord1004 with https://gerrit.wikimedia.org/r/c/operations/dns/+/1008414	[analytics]
10:32	<btullis>	restart hive-server2 and hive-metastore service on an-coord1004	[analytics]
2024-02-29 §
14:06	<btullis>	sudo systemctl reset-failed refinery-sqoop-whole-mediawiki.service	[analytics]
09:59	<joal>	Deploying refinery with scap (fix sqoop for tomorrow)	[analytics]
09:25	<brouberol>	decommissioning an-tool1005 now that superset-next is migrated to k8s - T358706	[analytics]
2024-02-28 §
11:08	<btullis>	reimaging dbstore1007 to bookworm for T356961	[analytics]
09:48	<joal>	Deploying refinery onto HDFS	[analytics]
09:28	<joal>	Deploying Refinery for T357859	[analytics]
2024-02-27 §
18:14	<tchin>	deploying eventstreams	[analytics]
2024-02-22 §
11:52	<brouberol>	redeploying the spark-history server with expanded egress rules for hadoop workers - T358206	[analytics]
2024-02-21 §
21:21	<joal>	Update airflow variable for pageview_actor-hourly leading to 64 written files instead of 32 - this should ease the job resource consumption and prevent failures	[analytics]
19:51	<joal>	Rerun pageview_actor_hourly for hour 2024-02-20T07:00	[analytics]
2024-02-20 §
22:52	<sfaci>	Deployed refinery using scap, then deployed onto hdfs	[analytics]
22:18	<sfaci>	Starting refinery deployment	[analytics]
15:57	<xcollazo>	deployed latest Airflow DAG updates for the analytics instance	[analytics]
2024-02-19 §
11:14	<sfaci>	rerunning the compute_pageview_actor_hourly task in the pageview_actor_hourly DAG 2024-02-17 08:00:00 UTC	[analytics]
2024-02-13 §
09:03	<brouberol>	attempting a reimage of apifeatureusage1001 to bookworm - T346053	[analytics]
2024-02-09 §
14:01	<brouberol>	superset was successfully deployed once the MySQL password was updated - T347710	[analytics]
13:47	<brouberol>	deploying superset/superset-next services in dse-k8s-eqiad - T347710	[analytics]
2024-02-08 §
09:50	<stevemunene>	failover hadoop namenode back to an-master1003 T353776	[analytics]
2024-02-07 §
20:17	<joal>	Relaunch session_length_daily failed task	[analytics]
20:09	<joal>	Relaunch druid_load_unique_devices_per_domain_daily_aggregated_monthly after deploy	[analytics]
19:49	<joal>	deploying Refinery onto HDFS	[analytics]
19:49	<joal>	Deployed refinery using scap	[analytics]
19:49	<joal>	Release refinery-source v0.2.32	[analytics]
17:26	<btullis>	roll-restarting kafka-jumbo for T356382	[analytics]
15:35	<btullis>	rolling out a change of the discovery-uri to presto workers and clients https://gerrit.wikimedia.org/r/c/operations/puppet/+/998425	[analytics]
13:01	<stevemunene>	failover hadoop namenode back to an-master1003 after the jvm service restart to pick up new JDK and T353776	[analytics]
12:48	<stevemunene>	restart jvm services on an-master1003 for T353776 and to pick up new JDK	[analytics]
12:36	<stevemunene>	failover hadoop namenode to an-master1004 for jvm service restart to pick up new JDK and T353776	[analytics]
12:24	<stevemunene>	restart jvm services on an-master1004 for T353776 and to pick up new JDK	[analytics]
2024-02-06 §
19:57	<joal>	Deploy refinery onto HDFS	[analytics]
19:34	<joal>	Deploying refinery using scap	[analytics]
19:34	<joal>	Refinery-source v0.2.31 released to archiva	[analytics]
14:57	<btullis>	roll-restarting the presto workers for T356382	[analytics]
14:04	<joal>	Rerun mediawiki-history-reduced druid indexation after airflow variable update	[analytics]
13:39	<brouberol>	add new TLS SANs to the superset/superset-next certificates in dse-k8s-eqiad - T356481	[analytics]
13:29	<stevemunene>	roll restart hadoop masters to pick up the right rack assignment for new hosts T353776	[analytics]
11:45	<stevemunene>	add new an-workers to analytics_cluster hadoop worker role analytics_cluster::hadoop::worker T353776	[analytics]
11:03	<btullis>	reimaging an-web1001 to bullseye for T349398	[analytics]