analytics SAL

2021-04-21 §
08:31	<elukey>	re-enable timers on an-launcher1002 and airflow on an-airflow1001 after maintenance on an-coord1001	[analytics]
07:08	<elukey>	reimage an-coord1001 after partition reshape (/var/lib/mysql folded in /srv)	[analytics]
06:51	<elukey>	stop airflow on an-airflow1001	[analytics]
06:49	<elukey>	stop all services on an-coord1001 as prep step for reimage	[analytics]
06:45	<elukey>	PURGE BINARY LOGS BEFORE '2021-04-14 00:00:00'; on an-coord1001 to free some space before the reimage	[analytics]
06:00	<elukey>	stop timers on an-launcher1002 as prep step for an-coord1001 reimage	[analytics]
2021-04-20 §
15:51	<elukey>	move analytics-hive.eqiad.wmnet back to an-coord1001 (test on an-coord1002 successful)	[analytics]
15:38	<ottomata>	deployed refiner to hdfs	[analytics]
13:59	<ottomata>	deploying refinery and refinery source 0.1.6 for weekly train	[analytics]
13:37	<ottomata>	deployed aqs	[analytics]
13:16	<elukey>	failover analytics-hive to an-coord1002 to test the host (running on buster)	[analytics]
12:40	<elukey>	PURGE BINARY LOGS BEFORE '2021-04-12 00:00:00'; on an-coord1001 - T280367	[analytics]
2021-04-19 §
16:45	<ottomata>	make RefineMonitor use analytics keytab - this should be a no-op	[analytics]
16:07	<razzi>	run kafka preferred-replica-election on jumbo cluster (kafka-jumbo1002)	[analytics]
06:50	<elukey>	move /var/lib/hadoop/name partition under /srv/hadoop/name on an-master1001 - T265126	[analytics]
05:45	<elukey>	cleanup Lex's jupyter notebooks on stat1007 to allow puppet to clean up	[analytics]
2021-04-18 §
07:25	<elukey>	run "PURGE BINARY LOGS BEFORE '2021-04-11 00:00:00';" on an-coord1001 to free some space - T280367	[analytics]
2021-04-16 §
15:14	<elukey>	execute PURGE BINARY LOGS BEFORE '2021-04-09 00:00:00'; on an-coord1001 to free space for /var/lib/mysql - T280367	[analytics]
15:13	<elukey>	execute PURGE BINARY LOGS BEFORE '2021-04-09 00:00:00';	[analytics]
07:54	<elukey>	drop all the cloudera packages from our repositories	[analytics]
2021-04-15 §
21:13	<razzi>	rebalance kafka partitions for webrequest_text partition 23	[analytics]
14:56	<elukey>	deploy refinery via scap - weekly train	[analytics]
09:50	<elukey>	rollback hue on an-tool1009 to 4.8, it seems that 4.9 still has issues	[analytics]
06:32	<elukey>	move hue.wikimedia.org to an-tool1009 (from analytics-tool1001)	[analytics]
01:36	<razzi>	rebalance kafka partitions for webrequest_text partitions 21,22	[analytics]
2021-04-14 §
14:05	<elukey>	run build/env/bin/hue migrate on an-tool1009 after the hue upgrade	[analytics]
13:10	<elukey>	rollback hue-next to 4.8 - issues not present in staging	[analytics]
13:00	<elukey>	upgrade Hue to 4.9 on an-tool1009 - hue-next.wikimedia.org	[analytics]
10:02	<elukey>	roll restart yarn nodemanagers on hadoop prod (attempt to see if they entered a weird state, graceful restart)	[analytics]
09:54	<elukey>	kill long running mediawiki-job refine erroring out application_1615988861843_166906	[analytics]
09:46	<elukey>	kill application_1615988861843_163186 for the same reason	[analytics]
09:43	<elukey>	kill application_1615988861843_164387 to see if any improvement to socket consumption is made	[analytics]
09:14	<elukey>	run "sudo kill `pgrep -f sqoop`" on an-launcher1002 to clean up old test processes still running	[analytics]
2021-04-13 §
16:17	<razzi>	rebalance kafka partitions for webrequest_text partitions 19, 20	[analytics]
13:18	<ottomata>	Refine now uses refinery-job 0.1.4; RefineFailuresChecker has been removed and its function rolled into RefineMonitor -	[analytics]
10:23	<hnowlan>	deploying aqs with updated cassandra libraries to aqs1004 while depooled	[analytics]
06:17	<elukey>	kill application application_1615988861843_158645 to free space on analytics1070	[analytics]
06:10	<elukey>	kill application_1615988861843_158592 on analytics1061 to allow space to recover (truncate of course in D state)	[analytics]
06:05	<elukey>	truncate logs for application_1615988861843_158592 on analytics1061 - one partition full	[analytics]
2021-04-12 §
14:21	<ottomata>	stop using http proxies for produce_canary_events_job - T274951	[analytics]
2021-04-08 §
16:33	<elukey>	reboot an-worker1100 again to check if all the disks come up correctly	[analytics]
15:43	<razzi>	rebalance kafka partitions for webrequest_text partitions 17, 18	[analytics]
15:35	<elukey>	reboot an-worker1100 to see if it helps with the strange BBU behavior in T279475	[analytics]
14:07	<elukey>	drop /var/spool/rsyslog from stat1008 - corrupted files due to root partition filled up caused a SEGV for rsyslog	[analytics]
11:14	<hnowlan>	created aqs user and loaded full schemas into analytics wmcs cassandra	[analytics]
08:35	<elukey>	apt-get clean on stat1008 to free some space	[analytics]
07:44	<elukey>	restart hadoop hdfs masters on an-master100[1,2] to apply the new log4j settings for the audit log	[analytics]
06:44	<elukey>	re-deployed refinery to hadoop-test after fixing permissions on an-test-coord1001	[analytics]
2021-04-07 §
23:03	<ottomata>	installing anaconda-wmf-2020.02~wmf5 on remaining nodes - T279480	[analytics]
22:51	<ottomata>	installing anaconda-wmf-2020.02~wmf5 on stat boxes - T279480	[analytics]