analytics SAL

2023-07-19 §
10:15	<btullis>	restarting oozie service on an-coord1001 for T329716	[analytics]
10:14	<btullis>	restarting presto-service on an-coord1001 for T329716	[analytics]
10:06	<btullis>	restarting java services on an-test-coord1001 for JVM update	[analytics]
09:13	<btullis>	correction: to an-test-client1002	[analytics]
09:13	<btullis>	deploying airflow-dags for analytics_test to an-test-client1001	[analytics]
2023-07-18 §
13:20	<stevemunene>	deploy airflow-dags to an-test-client1002 T341700	[analytics]
2023-07-17 §
13:34	<elukey>	`kill `pgrep -u appledora`` and `kill `pgrep -u akhatun`` on stat1008 to unblock puppet (offboarded users deletion)	[analytics]
13:32	<btullis>	proceeding to reimage analytics1072 (journalnode, in addition to datanode)	[analytics]
09:31	<btullis>	restarted airflow services on an-test-client1002 in order to pick up new versions	[analytics]
09:19	<btullis>	upgrading airflow on an-test-client1002 to version 2.6.3	[analytics]
2023-07-13 §
20:38	<xcollazo>	deployed Airflow DAGs for analytics instance to pick up T335860	[analytics]
2023-07-12 §
16:26	<btullis>	`sudo cumin A:wikireplicas-all 'maintain-views --replace-all --all-databases --table revision'` for T339037	[analytics]
14:11	<btullis>	roll-restarting zookeeper on druid-public for new JVM version	[analytics]
2023-07-11 §
11:00	<btullis>	Proceeding to upgrade datahub in production	[analytics]
08:59	<btullis>	rebooting kafkamon1003	[analytics]
08:54	<btullis>	`systemctl start burrow-jumbo-eqiad.service` on kafkamon1003 for T341551	[analytics]
2023-07-10 §
14:04	<btullis>	powered on an-worker1145	[analytics]
14:02	<btullis>	powered off an-worker1145 for T341481	[analytics]
10:55	<btullis>	`sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet` on an-master1001	[analytics]
2023-07-07 §
09:56	<btullis>	`sudo systemctl start hadoop-hdfs-namenode.service` on an-master1001	[analytics]
09:28	<stevemunene>	running sre.hadoop.roll-restart-masters to restart the masters and completely remove any reference to analytics[1058-1069] T317861	[analytics]
09:15	<stevemunene>	run puppet on hadoop masters to pick up changes from recently decommissioned hosts	[analytics]
08:12	<elukey>	wipe kafka-test cluster (data + zookeeper config) to start clean after the issue that happened yesterday	[analytics]
2023-07-06 §
14:51	<elukey>	upgraded zookeeper-test1002 to bookworm, but its metadata got re-initialized as well (my bad for this)	[analytics]
14:30	<stevemunene>	decommission analytics1069.eqiad.wmnet T341209	[analytics]
14:19	<stevemunene>	decommission analytics1068.eqiad.wmnet T341208	[analytics]
14:06	<stevemunene>	decommission analytics1067.eqiad.wmnet T341207	[analytics]
13:13	<stevemunene>	decommission analytics1066.eqiad.wmnet T341206	[analytics]
13:02	<stevemunene>	decommission analytics1065.eqiad.wmnet T341205	[analytics]
12:35	<stevemunene>	decommission analytics1064.eqiad.wmnet T341204	[analytics]
11:18	<stevemunene>	decommission analytics1063.eqiad.wmnet T339201	[analytics]
10:40	<stevemunene>	decommission analytics1062.eqiad.wmnet T339200	[analytics]
09:57	<stevemunene>	decommission analytics1061.eqiad.wmnet T339199	[analytics]
07:23	<stevemunene>	run puppet agent on hadoop masters	[analytics]
07:21	<stevemunene>	Remove analytics1064_1069 from hdfs net_topology	[analytics]
07:17	<stevemunene>	stop hadoop-hdfs-datanode service on analytics[1061-1069] Preparing to decommission the hosts - T317861	[analytics]
07:11	<stevemunene>	disable-puppet on analytics[1061-1069] Preparing to decommission the hosts - T317861	[analytics]
2023-07-05 §
14:36	<stevemunene>	enable puppet on analytics1069 to get the host back into puppetdb and hence allow the decommission cookbook to run later	[analytics]
11:47	<btullis>	restarted archiva for T329716	[analytics]
11:45	<btullis>	restarted hive-server2 and hive-metastore services on an-coord1002	[analytics]
11:40	<btullis>	roll-restarting kafka-jumbo brokers for T329716	[analytics]
11:01	<btullis>	roll-restarting the presto workers for T329716	[analytics]
10:20	<btullis>	deploying updated spark3 defaults to disable the `spark.shuffle.useOldFetchProtocol` option for T332765	[analytics]
09:45	<btullis>	failing back namenode to an-master1001 with `sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet` on an-master1001	[analytics]
09:38	<btullis>	re-enabled gobblin jobs on an-launcher1002	[analytics]
09:03	<btullis>	switching yarn shuffler - running puppet on 87 worker nodes	[analytics]
08:44	<btullis>	disabled gobblin and spark jobs on an-launcher for T332765	[analytics]
08:33	<btullis>	disabled gobblin jobs with https://gerrit.wikimedia.org/r/c/operations/puppet/+/935425	[analytics]
08:27	<btullis>	roll-restarting hadoop workers in the test cluster	[analytics]
2023-07-04 §
13:55	<btullis>	roll-restarting the eventgate-analytics-external worker pods in eqiad with: `helmfile -e eqiad --state-values-set roll_restart=1 sync`	[analytics]