analytics SAL


2023-06-09
20:40	<btullis>	restarting the aqs service more quickly with: `sudo cumin -b 2 -s 10 A:aqs 'systemctl restart aqs'`	[analytics]
20:23	<btullis>	btullis@cumin1001:~$ sudo cookbook sre.aqs.roll-restart-reboot --alias aqs restart_daemons --reason aqs_rollback_btullis	[analytics]
20:22	<btullis>	merged and deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/928927 to revert aqs mediawiki snapshot change	[analytics]
2023-06-08
17:12	<btullis>	running the sre.hadoop.roll-restart-masters cookbook for the analytics cluster, to pick up the new journalnode for T338336	[analytics]
17:01	<btullis>	running puppet on an-worker1142 to start the new journalnode	[analytics]
06:42	<stevemunene>	stop hadoop-hdfs-journalnode on analytics1069 in order to swap the journal node with an-worker1142 T338336	[analytics]
06:10	<elukey>	kill remaining processes for `andyrussg` on stat100x nodes to unblock puppet	[analytics]
2023-06-07
15:38	<btullis>	installing presto 0.281 to the test cluster	[analytics]
15:23	<elukey>	all varnishkafka instances on caching nodes are being restarted due to https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087 - T337825	[analytics]
14:13	<btullis>	running `sudo cumin A:wikireplicas-web 'maintain-views --all-databases --table abuse_filter_history --replace-all'` on A:wikireplicas-web	[analytics]
14:04	<btullis>	running `maintain-views --all-databases --table abuse_filter_history --replace-all` on A:wikireplicas-analytics	[analytics]
11:52	<btullis>	running `sudo maintain-views --all-databases --table abuse_filter_history --replace-all` on clouddbd1021 for T315426	[analytics]
08:02	<elukey>	set "loadByPeriod(P15D+future), dropForever" for webrequest_sampled_live in druid-analytics - T337460	[analytics]
2023-06-06
15:52	<elukey>	restart yarn resourcemanager on an-master1002 to restore the YARN UI (which only works when the active YARN RM is on an-master1001)	[analytics]
15:07	<mforns>	deployed airflow analytics to try and fix the edit_hourly DAG again	[analytics]
13:09	<ottomata>	EventStreamConfig - temporarily disable canary events and hadoop ingestion for development.network.probe stream - T332024	[analytics]
11:29	<stevemunene>	service hadoop-yarn-resourcemanager restart for T317861	[analytics]
11:13	<btullis>	restart airflow-scheduler service on an-test-client1001 for analytics_test instance	[analytics]
11:12	<btullis>	restart airflow-scheduler service on an-airflow1006 for product_analytics instance	[analytics]
11:12	<btullis>	restart airflow-scheduler service on an-airflow1005 for search instance	[analytics]
11:08	<btullis>	restart airflow-scheduler service on an-airflow1002 for research instance	[analytics]
11:07	<btullis>	(correction) the 11:06 entry should have read an-airflow1004 for the platform_eng instance	[analytics]
11:06	<btullis>	restart airflow-scheduler service on an-launcher1004 for postgresql restart	[analytics]
11:05	<btullis>	restart airflow-scheduler service on an-launcher1002 for postgresql restart	[analytics]
05:41	<stevemunene>	hadoop-yarn-resourcemanager restart for T317861	[analytics]
2023-06-05
18:20	<btullis>	restarted haproxy service on dbproxy1018 for T338172	[analytics]
16:21	<btullis>	depooling service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet	[analytics]
16:20	<btullis>	pooling service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet to allow us to depool the analytics wikireplica servers	[analytics]
15:19	<mforns>	deployed airflow analytics to fix edit_hourly DAG	[analytics]
11:43	<btullis>	sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet	[analytics]
09:52	<btullis>	powered up an-worker1125	[analytics]
2023-06-01
19:09	<mforns>	deployed airflow analytics to bump up cassandra load monthly for top articles	[analytics]
17:50	<mforns>	deployed airflow analytics to unbreak monthly cassandra loading DAGs	[analytics]
14:13	<mforns>	deployed airflow analytics to fix anomaly detection OOMs	[analytics]
2023-05-31
20:41	<mforns>	finished refinery deployment	[analytics]
20:20	<mforns>	starting refinery deployment	[analytics]
07:29	<elukey>	set "loadByPeriod(P8D+future), dropForever" for webrequest_sampled_live in druid-analytics - T337460	[analytics]
2023-05-30
15:52	<xcollazo>	created HDFS folder `/wmf/data/wmf_traffic` (T335305 and T337562)	[analytics]
2023-05-26
06:42	<elukey>	`apt-get clean` on stat1008 to free up some space in the root partition	[analytics]
06:36	<elukey>	`truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to prevent the root partition from filling up	[analytics]
2023-05-25
13:42	<joal>	rerun webrequest-refine job for 2023-05-20T00 - we're missing data	[analytics]
12:31	<elukey>	set "loadByPeriod(P3D+future), dropForever" for webrequest_sampled_live in druid-analytics - T337460	[analytics]
08:37	<joal>	rerun druid_load_webrequest_sampled_128_daily 2023-05-20 to reload missing hour (T337088)	[analytics]
08:37	<joal>	rerun druid_load_webrequest_sampled_128_daily	[analytics]
2023-05-24
16:19	<aqu>	Deployed refinery using scap, then deployed onto HDFS	[analytics]
16:05	<elukey>	move kafka mirror on kafka main brokers to PKI - T337248	[analytics]
15:56	<elukey>	move kafka mirror on kafka jumbo brokers to PKI - T337248	[analytics]
15:48	<elukey>	run `kafka acls --add --allow-principal User:CN=kafka_mirror_maker --producer --topic '*'` on kafka test - T337248	[analytics]
15:18	<aqu>	analytics-refinery, about to deploy	[analytics]
12:21	<joal>	rerun failed druid_load_pageviews_hourly_aggregated_daily 2023-05-17	[analytics]