2020-04-26 §
18:14 <elukey> restart nodemanager on analytics1054 - failed due to heap pressure [analytics]
18:14 <elukey> re-run webrequest-load-coord-text 26/04/2020T16 via Hue [analytics]
2020-04-23 §
13:57 <elukey> launch again data quality stats bundle with https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/592008/ applied locally [analytics]
2020-04-22 §
06:46 <elukey> kill dataquality hourly bundle again, traffic_by_country keeps failing [analytics]
06:11 <elukey> start data quality bundle hourly with --user=analytics [analytics]
05:45 <elukey> add a separate refinery scap target for the Hadoop test cluster and redeploy to check new settings [analytics]
2020-04-21 §
23:17 <milimetric> restarted webrequest bundle, babysitting that first before going on [analytics]
23:00 <milimetric> forgot a small jar version update, finished deploying now [analytics]
21:38 <milimetric> deployed twice because analytics1030 failed with "OSError {}" but seems ok after the second deploy [analytics]
14:27 <elukey> add motd to notebook100[3,4] to alert about host deprecation (in favor of stat100x) [analytics]
11:51 <elukey> manually add SUCCESS flags under /wmf/data/wmf/banner_activity/daily/year=2020/month=1 and /wmf/data/wmf/banner_activity/daily/year=2019/month=12 to unblock druid banner monthly indexations [analytics]
2020-04-20 §
14:38 <ottomata> restarting eventlogging-processor with updated python3-ua-parser for parsing KaiOS user agents [analytics]
10:28 <elukey> drop /srv/log/mw-log/archive/api from stat1007 (freeing 1.3TB of space!) [analytics]
2020-04-18 §
21:40 <elukey> force hdfs-balancer as an attempt to redistribute hdfs blocks more evenly to worker nodes (hoping to free the busiest ones) [analytics]
21:32 <elukey> drop /user/analytics-privatedata/.Trash/* from hdfs to free some space (~100G used) [analytics]
21:25 <elukey> drop /var/log/hadoop-yarn/apps/analytics-search/* from hdfs to free space (~8T replicated used) [analytics]
21:21 <elukey> drop /user/{analytics|hdfs}/.Trash/* from hdfs to free space (~100T used) [analytics]
21:12 <elukey> drop /var/log/hadoop-yarn/apps/analytics from hdfs to free space (15.1T replicated) [analytics]
2020-04-17 §
13:45 <elukey> lock down /srv/log/mw-log/archive/ on stat1007 to analytics-privatedata-users access only [analytics]
10:26 <elukey> re-created default venv for notebooks on notebook100[3,4] (forgot to git pull before re-creating it the last time) [analytics]
2020-04-16 §
05:34 <elukey> restart hadoop-yarn-nodemanager on an-worker108[4,5] - failed after GC OOM events (heavy spark jobs) [analytics]
2020-04-15 §
14:03 <elukey> update Superset Alpha role perms with what is stated in T249923#6058862 [analytics]
09:35 <elukey> restart jupyterhub too as follow up [analytics]
09:35 <elukey> execute "create_virtualenv.sh ../venv" on stat1006, notebook1003, notebook1004 to apply new settings to Spark kernels (re-creating them) [analytics]
09:09 <elukey> restart druid brokers on druid100[4-6] - stuck after datasource deletion [analytics]
2020-04-11 §
09:19 <elukey> set hive-security: read-only for the Presto hive connector and roll restart the cluster [analytics]
2020-04-10 §
16:31 <elukey> enable TLS from kafkatee to Kafka on analytics1030 (test instance) [analytics]
15:45 <elukey> migrate data_purge timers from an-coord1001 to an-launcher1001 [analytics]
09:11 <elukey> move druid_load jobs from an-coord1001 to an-launcher1001 [analytics]
08:08 <elukey> move project_namespace_map from an-coord1001 to an-launcher1001 [analytics]
07:38 <elukey> move hdfs-cleaner from an-coord1001 to an-launcher1001 [analytics]
2020-04-09 §
20:54 <elukey> re-run webrequest upload/text hour 15:00 from Hue (stuck due to missing _IMPORTED flag, caused by an-launcher1001 migration. Andrew fixed it by manually re-running the Camus checker) [analytics]
16:00 <elukey> move camus timers from an-coord1001 to an-launcher1001 [analytics]
15:20 <elukey> absent spark refine timers on an-coord1001 and move them to an-launcher1001 [analytics]
2020-04-07 §
09:17 <elukey> enable refine for TwoColConflictExit (EL schema) [analytics]
2020-04-06 §
13:23 <elukey> upgraded stat1008 to AMD ROCm 3.3 (enables tensorflow 2.x) [analytics]
12:33 <joal> Bump AQS druid backend to 2020-03 [analytics]
11:50 <elukey> deploy new druid datasource in Druid public [analytics]
06:29 <elukey> allow all analytics-privatedata-users to use the GPUs on stat1005/8 [analytics]
2020-04-04 §
06:52 <elukey> restart refinery-import-page-history-dumps [analytics]
2020-04-03 §
09:57 <elukey> remove TwoColConflictExit from eventlogging's refine blacklist [analytics]
2020-04-02 §
19:31 <joal> restart pageviewhourly job after manual patch [analytics]
19:29 <joal> Manually patching last deploy to fix virtualpageview job - code merged [analytics]
17:48 <joal> Kill/restart virtualpageview-hourly-coord after deploy [analytics]
16:55 <joal> Deploy refinery onto HDFS [analytics]
16:30 <joal> Deploy refinery using scap [analytics]
16:12 <elukey> re-enable timers on an-coord1001 after maintenance [analytics]
15:52 <elukey> restart hive server2/metastore with G1 settings [analytics]
14:05 <elukey> temporarily stop timers on an-coord1001 to facilitate hive daemons restarts [analytics]
13:47 <hashar> test 1 2 3 [analytics]