2021-01-03 §
14:11 <milimetric> reset-failed refinery-sqoop-whole-mediawiki.service [analytics]
14:10 <milimetric> manual sqoop finished, logs on an-launcher1002 at /var/log/refinery/sqoop-mediawiki.log and /var/log/refinery/sqoop-mediawiki-production.log [analytics]
2021-01-01 §
14:54 <milimetric> deployed refinery hotfix for sqoop problem, after testing on three small wikis [analytics]
2020-12-29 §
09:18 <elukey> restart hue to pick up analytics-hive endpoint settings [analytics]
2020-12-23 §
15:53 <ottomata> point analytics-hive.eqiad.wmnet back at an-coord1001 - T268028 T270768 [analytics]
2020-12-22 §
19:35 <elukey> restart hive daemons on an-coord1001 to pick up new settings [analytics]
18:13 <elukey> failover analytics-hive.eqiad.wmnet to an-coord1002 (to allow maintenance on an-coord1001) [analytics]
18:07 <elukey> restart hive server on an-coord1002 (current standby - no traffic) to pick up the new config (use the local metastore instead of the one analytics-hive points to) [analytics]
17:00 <mforns> Deployed refinery as part of weekly train (v0.0.142) [analytics]
16:42 <mforns> Deployed refinery-source v0.0.142 [analytics]
16:30 <mforns> Deployed refinery-source v0.0.142 [analytics]
15:00 <razzi> stopping superset server on analytics-tool1004 [analytics]
10:36 <elukey> restart presto coordinator to pick up analytics-hive settings [analytics]
10:25 <elukey> failover analytics-hive.eqiad.wmnet to an-coord1001 [analytics]
09:56 <elukey> restart hive daemons on an-coord1001 to pick up analytics-hive settings [analytics]
07:27 <elukey> reboot stat100[4-8] (analytics hadoop clients) for kernel upgrades [analytics]
07:23 <elukey> move all analytics clients (spark refine, stat100x, hive-site.xml on hdfs, etc.) to analytics-hive.eqiad.wmnet [analytics]
2020-12-18 §
14:10 <elukey> restore stat1004 to its previous settings for kerberos credential cache [analytics]
2020-12-17 §
14:54 <klausman> Updated all stat100x machines to now sport kafkacat 1.6.0, backported from Bullseye [analytics]
11:04 <elukey> wipe/reimage the hadoop test cluster to start clean for CDH (and then test the upgrade to bigtop 1.5) [analytics]
2020-12-16 §
21:06 <joal> Kill-restart virtualpageview-hourly-coord and projectview-geo-coord with manually updated jar versions (old versions in conf) [analytics]
19:35 <joal> Kill-restart all oozie jobs belonging to analytics except mediawiki-wikitext-history-coord [analytics]
18:52 <joal> Kill-restart cassandra loading oozie jobs [analytics]
18:37 <joal> Kill-restart wikidata-entity, wikidata-item_page_link and mobile_apps-session_metrics oozie jobs [analytics]
18:31 <joal> Kill-rerun data-quality bundles [analytics]
16:17 <razzi> dropping and re-creating superset staging database [analytics]
08:13 <joal> Manually push updated pageview whitelist to HDFS [analytics]
2020-12-15 §
20:24 <joal> Kill restart webrequest_load oozie job after deploy [analytics]
19:43 <joal> Deploy refinery onto HDFS [analytics]
19:14 <joal> Scap deploy refinery [analytics]
18:26 <joal> Release refinery-source v0.0.141 [analytics]
2020-12-14 §
19:09 <razzi> restart hadoop-yarn-resourcemanager on an-master1002 to promote an-master1001 to active again [analytics]
19:08 <razzi> restarted hadoop-yarn-resourcemanager on an-master1001 again by mistake [analytics]
19:02 <razzi> restart hadoop-yarn-resourcemanager on an-master1002 [analytics]
18:54 <razzi> restart hadoop-yarn-resourcemanager on an-master1001 [analytics]
18:43 <razzi> applying yarn config change via `sudo cumin "A:hadoop-worker" "systemctl restart hadoop-yarn-nodemanager" -b 10` [analytics]
14:58 <elukey> stat1004's krb credential cache moved under /run (shared between notebooks and ssh/bash) - T255262 [analytics]
07:55 <elukey> roll restart yarn daemons to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/649126 [analytics]
2020-12-11 §
19:30 <ottomata> now ingesting Growth EventLogging schemas using event platform refine job; they are exclude-listed from eventlogging-processor. - T267333 [analytics]
07:04 <elukey> roll restart presto cluster to pick up new jvm xmx settings [analytics]
06:57 <elukey> restart presto on an-presto1003 since all the memory on the host was occupied, and puppet failed to run [analytics]
2020-12-10 §
12:29 <joal> Drop-Recreate-Repair wmf_raw.mediawiki_image table [analytics]
2020-12-09 §
20:34 <elukey> execute on mysql:an-coord1002 "set GLOBAL replicate_wild_ignore_table='superset_staging.%'" to avoid replication for superset_staging from an-coord1002 [analytics]
07:12 <elukey> re-enable timers after maintenance [analytics]
07:07 <elukey> restart hive-server2 on an-coord1002 for consistency [analytics]
07:05 <elukey> restart hive metastore and server2 on an-coord1001 to pick up settings for DBTokenStore [analytics]
06:50 <elukey> stop timers on an-launcher1002 as prep step to restart hive [analytics]
2020-12-07 §
18:51 <joal> Test mediawiki-wikitext-history new sizing settings [analytics]
18:43 <razzi> kill testing flink job: sudo -u hdfs yarn application -kill application_1605880843685_61049 [analytics]
18:42 <razzi> truncate /var/lib/hadoop/data/h/yarn/logs/application_1605880843685_61049/container_e27_1605880843685_61049_01_000002/taskmanager.log on an-worker1011 [analytics]