2020-12-14
19:08 <razzi> restarted hadoop-yarn-resourcemanager on an-master1001 again by mistake [analytics]
19:02 <razzi> restart hadoop-yarn-resourcemanager on an-master1002 [analytics]
18:54 <razzi> restart hadoop-yarn-resourcemanager on an-master1001 [analytics]
18:43 <razzi> applying yarn config change via `sudo cumin "A:hadoop-worker" "systemctl restart hadoop-yarn-nodemanager" -b 10` [analytics]
14:58 <elukey> stat1004's krb credential cache moved under /run (shared between notebooks and ssh/bash) - T255262 [analytics]
07:55 <elukey> roll restart yarn daemons to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/649126 [analytics]
2020-12-11
19:30 <ottomata> now ingesting Growth EventLogging schemas using event platform refine job; they are exclude-listed from eventlogging-processor. - T267333 [analytics]
07:04 <elukey> roll restart presto cluster to pick up new jvm xmx settings [analytics]
06:57 <elukey> restart presto on an-presto1003 since all the memory on the host was occupied, and puppet failed to run [analytics]
2020-12-10
12:29 <joal> Drop-Recreate-Repair wmf_raw.mediawiki_image table [analytics]
2020-12-09
20:34 <elukey> execute on mysql:an-coord1002 "set GLOBAL replicate_wild_ignore_table='superset_staging.%'" to avoid replication for superset_staging from an-coord1002 [analytics]
07:12 <elukey> re-enable timers after maintenance [analytics]
07:07 <elukey> restart hive-server2 on an-coord1002 for consistency [analytics]
07:05 <elukey> restart hive metastore and server2 on an-coord1001 to pick up settings for DBTokenStore [analytics]
06:50 <elukey> stop timers on an-launcher1002 as prep step to restart hive [analytics]
2020-12-07
18:51 <joal> Test mediawiki-wikitext-history new sizing settings [analytics]
18:43 <razzi> kill testing flink job: sudo -u hdfs yarn application -kill application_1605880843685_61049 [analytics]
18:42 <razzi> truncate /var/lib/hadoop/data/h/yarn/logs/application_1605880843685_61049/container_e27_1605880843685_61049_01_000002/taskmanager.log on an-worker1011 [analytics]
2020-12-03
22:34 <milimetric> updated mw history snapshot on AQS [analytics]
07:09 <elukey> manual reset-failed refinery-sqoop-whole-mediawiki.service on an-launcher1002 (job launched manually) [analytics]
2020-12-02
21:37 <joal> Manually create _SUCCESS flags for banner history monthly jobs to kick off (they'll be deleted by the purge tomorrow morning) [analytics]
21:16 <joal> Rerun timed out jobs after oozie config got updated (mediawiki-geoeditors-yearly-coord and banner_activity-druid-monthly-coord) [analytics]
20:49 <ottomata> deployed eventgate-analytics-external with refactored stream config, hopefully this will work around the canary events alarm bug - T266573 [analytics]
18:20 <mforns> finished netflow migration wmf->event [analytics]
17:50 <mforns> starting netflow migration wmf->event [analytics]
17:50 <joal> Manually start refinery-sqoop-production on an-launcher1002 to cover for the coupled runs' failure [analytics]
16:50 <mforns> restarted turnilo to clear deleted datasource [analytics]
16:47 <milimetric> faked _SUCCESS flag for image table to allow daisy-chained mediawiki history load dependent coordinators to keep running [analytics]
07:49 <elukey> restart oozie to pick up new settings for T264358 [analytics]
2020-12-01
19:43 <razzi> deploy refinery with refinery-source v0.0.140 [analytics]
10:50 <elukey> restart oozie to pick up new logging settings [analytics]
09:03 <elukey> clean up old hive metastore/server logs on an-coord1001 to free space [analytics]
2020-11-30
17:51 <joal> Deploy refinery onto hdfs [analytics]
17:49 <joal> Kill-restart mediawiki-history-load job after refactor (1 coordinator per table) and tables addition [analytics]
17:32 <joal> Kill-restart mediawiki-history-reduced job for druid-public datasource number of shards update [analytics]
17:32 <joal> Deploy refinery using scap for naming hotfix [analytics]
15:29 <ottomata> migrated EventLogging schemas SpecialMuteSubmit and SpecialInvestigate to EventGate - T268517 [analytics]
14:56 <joal> Deploying refinery onto hdfs [analytics]
14:49 <joal> Create new hive tables for newly sqooped data [analytics]
14:45 <joal> Deploy refinery using scap [analytics]
09:08 <elukey> force execution of refinery-drop-pageview-actor-hourly-partitions on an-launcher1002 (after args fixup from Joseph) [analytics]
2020-11-27
14:51 <elukey> roll restart zookeeper on druid* nodes for openjdk upgrades [analytics]
10:29 <elukey> restart eventlogging_to_druid_editattemptstep_hourly on an-launcher1002 (failed) to see if the hive metastore works [analytics]
10:27 <elukey> restart oozie and presto-server on an-coord1001 for openjdk upgrades [analytics]
10:27 <elukey> restart hive server and metastore on an-coord1001 - openjdk upgrades + problem with high GC caused by a job [analytics]
08:05 <elukey> roll restart druid public cluster for openjdk upgrades [analytics]
2020-11-26
13:52 <elukey> roll restart druid daemons on druid analytics to pick up new openjdk upgrades [analytics]
13:08 <elukey> force umount/mount of all /mnt/hdfs mountpoints to pick up openjdk upgrades [analytics]
09:07 <elukey> force purging https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Diego_Maradona/daily/2020110500/2020112500 from caches [analytics]
08:40 <elukey> roll restart cassandra on aqs10* for openjdk upgrades [analytics]