analytics SAL

2022-03-05 §
10:03	<elukey>	restart hadoop-yarn-nodemanager on an-worker1132 (unhealthy node, reason Linux Container Executor reached unrecoverable exception)	[analytics]
2022-03-04 §
17:46	<mforns>	deployed Airflow to analytics instance to fix skein logs problem	[analytics]
15:50	<mforns>	deployed airflow in an-test-client1001 to test skein log fix	[analytics]
05:19	<milimetric>	rerunning monthly edit hourly druid oozie coordinator	[analytics]
2022-03-03 §
17:48	<ottomata>	roll restart aqs to pick up new MW history snapshot	[analytics]
2022-03-01 §
18:38	<SandraEbele>	sandra testing	[analytics]
18:34	<razzi>	demo irc logging to data eng team members	[analytics]
10:19	<btullis>	btullis@an-coord1002:/srv$ sudo rm -rf an-coord1001-backup/ (#T302777)	[analytics]
09:48	<elukey>	elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)	[analytics]
2022-02-28 §
16:00	<milimetric>	refinery done deploying and syncing, new sqoop list is up	[analytics]
15:01	<milimetric>	deploying new wikis to sqoop list ahead of sqoop job starting in a few hours	[analytics]
2022-02-25 §
17:00	<milimetric>	rerunning webrequest-load-wf-text-2022-2-25-15 after confirming all false positive loss	[analytics]
2022-02-23 §
23:00	<razzi>	sudo maintain-views --table flaggedrevs --databases fiwiki on clouddb1014.eqiad.wmnet and clouddb1018.eqiad.wmnet for T302233	[analytics]
2022-02-22 §
10:37	<btullis>	re-enabled puppet on an-launcher1002, having absented the network_internal druid load job	[analytics]
09:30	<aqu>	Deploying analytics/refinery on hadoop-test only.	[analytics]
07:38	<elukey>	systemctl reset-failed mediawiki-history-drop-snapshot on an-launcher1002 (opened since a week ago)	[analytics]
07:30	<elukey>	kill remaining processes of rhuang-ctr on stat1004 and an-test-client1001 (user offboarded, but still holding jupyter notebooks etc.). Puppet was broken trying to remove the user.	[analytics]
2022-02-21 §
17:55	<elukey>	kill remaining processes of rhuang-ctr on various stat nodes (user offboarded, but still holding jupyter notebooks etc.). Puppet was broken trying to remove the user.	[analytics]
16:58	<mforns>	Deployed refinery using scap, then deployed onto hdfs (aqs hourly airflow queries)	[analytics]
2022-02-19 §
12:21	<elukey>	stop puppet on an-launcher1002, stop timers for eventlogging_to_druid_network_flows_internal_{hourly,daily} since no data is coming to the Kafka topic (expected due to some work for the Marseille DC) and it keeps alarming	[analytics]
2022-02-17 §
16:18	<mforns>	deployed wikistats2	[analytics]
2022-02-16 §
14:13	<mforns>	deployed airflow-dags to analytics instance	[analytics]
2022-02-15 §
17:20	<ottomata>	split anaconda-wmf into 2 packages: anaconda-wmf-base and anaconda-wmf. anaconda-wmf-base is installed on workers, anaconda-wmf on clients. The size of the package on workers is now much smaller. Installing throughout the cluster. relevant: T292699	[analytics]
2022-02-14 §
17:38	<razzi>	razzi@an-test-client1001:~$ sudo systemctl reset-failed airflow-scheduler@analytics-test.service	[analytics]
16:08	<razzi>	sudo cookbook sre.ganeti.makevm --vcpus 4 --memory 8 --disk 50 eqiad_B datahubsearch1002 for T301383	[analytics]
2022-02-12 §
08:50	<elukey>	truncate /var/log/auth.log to 1g on krb1001 to free space on root partition (original log saved under /srv)	[analytics]
2022-02-11 §
15:06	<ottomata>	set hive.warehouse.subdir.inherit.perms = false - T291664	[analytics]
2022-02-10 §
18:54	<ottomata>	setting up research airflow-dags scap deployment, recreating airflow database and starting from scratch (fab okayed this) - T295380	[analytics]
16:48	<ottomata>	deploying airflow analytics with lots of recent changes to airflow-dags repository	[analytics]
2022-02-09 §
17:41	<joal>	Deploy refinery onto HDFS	[analytics]
17:05	<joal>	Deploying refinery with scap	[analytics]
16:39	<joal>	Release refinery-source v0.1.25 to archiva	[analytics]
2022-02-08 §
07:27	<elukey>	restart hadoop-yarn-nodemanager on an-worker1115 (container executor reached unrecoverable exception, doesn't talk with the Yarn RM anymore)	[analytics]
2022-02-07 §
18:43	<ottomata>	manually installing airflow_2.1.4-py3.7-2_amd64.deb on an-test-client1001	[analytics]
14:38	<ottomata>	merged Set spark maxPartitionBytes to hadoop dfs block size - T300299	[analytics]
12:17	<btullis>	depooled aqs1009	[analytics]
11:59	<btullis>	depooled aqs1008	[analytics]
11:41	<btullis>	depooled aqs1007	[analytics]
11:03	<btullis>	depooled aqs1006	[analytics]
10:22	<btullis>	depooling aqs1005	[analytics]
2022-02-04 §
16:05	<elukey>	unmask prometheus-mysqld-exporter.service and clean up the old @analytics + wmf_auto_restart units (service+timer) not used anymore on an-coord100[12]	[analytics]
12:55	<joal>	Rerun cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2022-2-3	[analytics]
07:12	<elukey>	`GRANT PROCESS, REPLICATION CLIENT ON *.* TO `prometheus`@`localhost` IDENTIFIED VIA unix_socket WITH MAX_USER_CONNECTIONS 5` on an-test-coord1001 to allow the prometheus exporter to gather metrics	[analytics]
07:09	<elukey>	cleanup wmf_auto_restart_prometheus-mysqld-exporter@analytics-meta on an-test-coord1001 and unmasked wmf_auto_restart_prometheus-mysqld-exporter (now used)	[analytics]
07:03	<elukey>	clean up wmf_auto_restart_prometheus-mysqld-exporter@matomo on matomo1002 (not used anymore, listed as failed)	[analytics]
2022-02-03 §
19:35	<joal>	Rerun virtualpageview-druid-monthly-wf-2022-1	[analytics]
19:32	<btullis>	re-running the failed refine_event job as per email.	[analytics]
19:27	<joal>	Rerun virtualpageview-druid-daily-wf-2022-1-16	[analytics]
19:12	<joal>	Kill druid indexation stuck task on Druid (from 2022-01-17T02:31)	[analytics]
19:09	<joal>	Kill druid-loading stuck yarn applications (3 HiveToDruid, 2 oozie launchers)	[analytics]