analytics SAL

2022-08-10 §
11:47	<btullis>	btullis@an-coord1001:~$ sudo systemctl restart hive-server2.service hive-metastore.service	[analytics]
2022-08-08 §
11:43	<btullis>	rebooting an-worker1102 due to kernel soft lockups	[analytics]
2022-08-05 §
16:05	<milimetric>	force scap deploying refinery	[analytics]
16:01	<ottomata>	removing airflow logs older than 7 days on an-launcher1002	[analytics]
2022-08-04 §
18:31	<ottomata>	dropping mediawiki_web_ui_interactions hive tables and data - T314151	[analytics]
18:19	<milimetric>	scap deploying refinery host by host after Ben cleaned up the repos with "git checkout master"	[analytics]
18:11	<btullis>	btullis@deploy1002:/srv/deployment/analytics/refinery$ scap deploy -l stat1008.eqiad.wmnet "Regular analytics weekly train [analytics/refinery@$(git rev-parse --short HEAD)]"	[analytics]
18:05	<btullis>	we are re-deploying refinery to an-launcher1002 with the command above	[analytics]
18:04	<btullis>	btullis@deploy1002:/srv/deployment/analytics/refinery$ scap deploy -l an-launcher1002.eqiad.wmnet "Regular analytics weekly train [analytics/refinery@$(git rev-parse --short HEAD)]"	[analytics]
18:02	<btullis>	analytics-deploy@an-launcher1002:/srv/deployment/analytics/refinery$ git checkout master	[analytics]
15:59	<SandraEbele>	Deploying analytics refinery using scap.	[analytics]
2022-08-02 §
12:54	<btullis>	sudo systemctl reset-failed on stat1008 to remove failed debmonitor alerts	[analytics]
2022-07-28 §
20:05	<SandraEbele>	killing Oozie projectview-hourly and projectview-geo jobs to deploy corresponding jobs on airflow.	[analytics]
2022-07-24 §
21:10	<btullis>	swapping disks on archiva1002	[analytics]
20:36	<btullis>	rebooting archiva1002 to pick up new disk	[analytics]
15:36	<btullis>	btullis@ganeti1027:~$ sudo gnt-instance modify --disk add:size=200g archiva1002.wikimedia.org	[analytics]
2022-07-22 §
21:19	<ottomata>	restarted airflow-scheduler@platform_eng on an-airflow1003 for marco and cormac	[analytics]
2022-07-19 §
10:05	<elukey>	reboot an-worker1127 - hdfs datanode caused CPU stalls	[analytics]
2022-07-13 §
14:19	<aqu>	Deployed refinery using scap, then deployed onto hdfs (prod + test)	[analytics]
06:16	<aqu>	analytics/refinery deployment	[analytics]
2022-07-07 §
13:38	<btullis>	restart refine_eventlogging_legacy_test.service on an-test-coord1001	[analytics]
09:56	<btullis>	restarted oozie on an-test-coord1001	[analytics]
09:23	<btullis>	rebooted dbstore1007	[analytics]
09:21	<btullis>	rebooted dbstore1005	[analytics]
09:02	<btullis>	restarting dbstore1003 as per announced maintenance window	[analytics]
2022-07-06 §
18:09	<ottomata>	enabling iceberg hive catalog connector on analytics_cluster presto	[analytics]
17:57	<ottomata>	upgrading presto to 0.273.3 in analytics cluster - T311525	[analytics]
09:50	<btullis>	roll-restarting hadoop workers on the test cluster.	[analytics]
09:46	<btullis>	restarting refinery-drop-webrequest-raw-partitions.service on an-test-coord1001	[analytics]
09:44	<btullis>	restarting refinery-drop-webrequest-refined-partitions.service on an-test-coord1001	[analytics]
09:42	<btullis>	restarted drop_event.service on an-test-coord1001	[analytics]
09:35	<btullis>	restarting hive-server2 and hive-metastore on an-test-coord1001	[analytics]
2022-07-05 §
11:01	<btullis>	sudo cookbook sre.hadoop.roll-restart-masters test	[analytics]
2022-07-04 §
16:14	<btullis>	systemctl restart airflow-scheduler@research.service (on an-airflow1002)	[analytics]
08:04	<elukey>	kill leftover processes of user `mewoph` on stat100x to allow puppet runs	[analytics]
2022-06-29 §
17:27	<mforns>	killed mediawiki-history-load bundle in Hue, and started corresponding mediawiki_history_load DAG in Airflow	[analytics]
13:12	<mforns>	re-deployed refinery with scap and refinery-deploy-to-hdfs	[analytics]
11:51	<btullis>	btullis@an-master1001:~$ sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet	[analytics]
2022-06-28 §
20:57	<mforns>	refinery deploy failed and I rolled back successfully, will try and repeat tomorrow when other people are present :]	[analytics]
20:19	<mforns>	starting refinery deployment for refinery-source v0.2.2	[analytics]
20:19	<mforns>	starting refinery deployment	[analytics]
17:25	<ottomata>	installing presto 0.273.3 on an-test-coord1001 and an-test-presto1001	[analytics]
12:48	<milimetric>	deploying airflow-dags/analytics to work on the metadata ingestion jobs	[analytics]
2022-06-27 §
20:33	<btullis>	systemctl reset-failed jupyter-aarora-singleuser and jupyter-seddon-singleuser on stat1005	[analytics]
20:16	<btullis>	checking and restarting prometheus-mysqld-exporter on an-coord1001	[analytics]
15:25	<btullis>	upgraded conda-base-env on an-test-client1001 from 0.0.1 to 0.0.4	[analytics]
2022-06-24 §
15:14	<ottomata>	backfilled eventlogging data lost during failed gobblin job - T311263	[analytics]
2022-06-23 §
13:48	<btullis>	started the namenode service on an-master1001 after failback failure	[analytics]
13:41	<btullis>	The failback didn't work again.	[analytics]
13:39	<btullis>	attempting failback of namenode service from an-master1002 to an-master1001	[analytics]