2021-04-27 §
08:33 <elukey> run mysql_upgrade for analytics-meta on an-coord1002 (should be part of the upgrade process) - T278424 [analytics]
07:11 <elukey> restart yarn resource managers to pick up yarn label settings [analytics]
2021-04-26 §
08:01 <elukey> restart hadoop-mapreduce-historyserver on an-master1001 after changes to the yarn ui user [analytics]
07:36 <elukey> re-enable timers after setting the capacity scheduler [analytics]
07:31 <elukey> restart hadoop RM on an-master* to pick up capacity scheduler changes [analytics]
06:44 <elukey> stop timers on an-launcher1002 again as prep step for capacity scheduler changes [analytics]
06:32 <elukey> roll restart of hadoop-yarn-nodemanagers to pick up new log4j settings - T276906 [analytics]
06:25 <elukey> re-enable timers [analytics]
06:20 <elukey> reboot an-coord1001 to pick up kernel security settings [analytics]
05:57 <elukey> stop timers on an-launcher1002 to allow a reboot of an-coord1001 [analytics]
2021-04-24 §
08:03 <joal> Rerun failed webrequest-druid-hourly-wf-2021-4-23-13 [analytics]
2021-04-23 §
14:23 <elukey> roll restart an-master100[1,2] daemons to pick up new log4j settings - T276906 [analytics]
10:30 <elukey> restart hadoop daemons (NM, DN, JN) on an-worker1080 to further test the new log4j config - T276906 [analytics]
09:12 <elukey> change default log4j hadoop config to include rolling gzip appender [analytics]
2021-04-21 §
21:30 <ottomata> temporarily disabling sanitize_eventlogging_analytics_delayed jobs until T280813 is completed (probably tomorrow) [analytics]
20:04 <ottomata> renaming event_sanitized hive table directories to lower case and repairing table partition paths - T280813 [analytics]
09:28 <elukey> roll restart druid-overlord on druid* after an-coord1001 maintenance [analytics]
09:08 <elukey> upgrade hue on an-tool1009 to 4.9.0-2 [analytics]
08:31 <elukey> re-enable timers on an-launcher1002 and airflow on an-airflow1001 after maintenance on an-coord1001 [analytics]
07:08 <elukey> reimage an-coord1001 after partition reshape (/var/lib/mysql folded in /srv) [analytics]
06:51 <elukey> stop airflow on an-airflow1001 [analytics]
06:49 <elukey> stop all services on an-coord1001 as prep step for reimage [analytics]
06:45 <elukey> PURGE BINARY LOGS BEFORE '2021-04-14 00:00:00'; on an-coord1001 to free some space before the reimage [analytics]
06:00 <elukey> stop timers on an-launcher1002 as prep step for an-coord1001 reimage [analytics]
2021-04-20 §
15:51 <elukey> move analytics-hive.eqiad.wmnet back to an-coord1001 (test on an-coord1002 successful) [analytics]
15:38 <ottomata> deployed refiner to hdfs [analytics]
13:59 <ottomata> deploying refinery and refinery source 0.1.6 for weekly train [analytics]
13:37 <ottomata> deployed aqs [analytics]
13:16 <elukey> failover analytics-hive to an-coord1002 to test the host (running on buster) [analytics]
12:40 <elukey> PURGE BINARY LOGS BEFORE '2021-04-12 00:00:00'; on an-coord1001 - T280367 [analytics]
2021-04-19 §
16:45 <ottomata> make RefineMonitor use analytics keytab - this should be a no-op [analytics]
16:07 <razzi> run kafka preferred-replica-election on jumbo cluster (kafka-jumbo1002) [analytics]
06:50 <elukey> move /var/lib/hadoop/name partition under /srv/hadoop/name on an-master1001 - T265126 [analytics]
05:45 <elukey> cleanup Lex's jupyter notebooks on stat1007 to allow puppet to clean up [analytics]
2021-04-18 §
07:25 <elukey> run "PURGE BINARY LOGS BEFORE '2021-04-11 00:00:00';" on an-coord1001 to free some space - T280367 [analytics]
2021-04-16 §
15:14 <elukey> execute PURGE BINARY LOGS BEFORE '2021-04-09 00:00:00'; on an-coord1001 to free space for /var/lib/mysql - T280367 [analytics]
15:13 <elukey> execute PURGE BINARY LOGS BEFORE '2021-04-09 00:00:00'; [analytics]
07:54 <elukey> drop all the cloudera packages from our repositories [analytics]
2021-04-15 §
21:13 <razzi> rebalance kafka partitions for webrequest_text partition 23 [analytics]
14:56 <elukey> deploy refinery via scap - weekly train [analytics]
09:50 <elukey> rollback hue on an-tool1009 to 4.8, it seems that 4.9 still has issues [analytics]
06:32 <elukey> move hue.wikimedia.org to an-tool1009 (from analytics-tool1001) [analytics]
01:36 <razzi> rebalance kafka partitions for webrequest_text partitions 21,22 [analytics]
2021-04-14 §
14:05 <elukey> run build/env/bin/hue migrate on an-tool1009 after the hue upgrade [analytics]
13:10 <elukey> rollback hue-next to 4.8 - issues not present in staging [analytics]
13:00 <elukey> upgrade Hue to 4.9 on an-tool1009 - hue-next.wikimedia.org [analytics]
10:02 <elukey> roll restart yarn nodemanagers on hadoop prod (attempt to see if they entered in a weird state, graceful restart) [analytics]
09:54 <elukey> kill long running mediawiki-job refine erroring out application_1615988861843_166906 [analytics]
09:46 <elukey> kill application_1615988861843_163186 for the same reason [analytics]
09:43 <elukey> kill application_1615988861843_164387 to see if any improvement to socket consumption is made [analytics]