2020-05-21 §
17:24 <elukey> add druid100[7,8] to the druid public cluster (not serving load balancer traffic for the moment, only joining the cluster) - T252771 [analytics]
16:44 <elukey> roll restart druid historical nodes on druid100[4-6] (public cluster) to pick up new settings - T252771 [analytics]
14:02 <elukey> restart druid kafka supervisor for wmf_netflow after maintenance [analytics]
13:53 <elukey> restart druid-historical on an-druid100[1,2] to pick up new settings [analytics]
13:17 <elukey> kill wmf_netflow druid supervisor for maintenance [analytics]
13:13 <elukey> stop druid daemons on druid100[1-3] (one at a time) to move the druid partition from /srv/druid to /srv (didn't think about it before) - T252771 [analytics]
09:16 <elukey> move Druid Analytics SQL in Superset to druid://an-druid1001.eqiad.wmnet:8082/druid/v2/sql/ [analytics]
09:05 <elukey> move turnilo to an-druid1001 (beefier host) [analytics]
08:15 <elukey> roll restart of all druid historicals in the analytics cluster to pick up new settings [analytics]
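(The 09:16 entry above points Superset at the Druid broker's SQL endpoint on an-druid1001. A quick way to sanity-check that endpoint is to POST a query to it directly; this is a minimal sketch assuming the broker listens on port 8082 and that the wmf_netflow datasource mentioned in this day's entries is queryable:)
  curl -s -X POST 'http://an-druid1001.eqiad.wmnet:8082/druid/v2/sql/' \
    -H 'Content-Type: application/json' \
    -d '{"query": "SELECT COUNT(*) AS cnt FROM wmf_netflow"}'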
2020-05-20 §
13:55 <milimetric> deployed refinery with refinery-source v0.0.125 [analytics]
2020-05-19 §
15:28 <elukey> restart hadoop master daemons on an-master100[1,2] for openjdk upgrades [analytics]
06:29 <elukey> roll restart zookeeper on druid100[4-6] for openjdk upgrades [analytics]
06:18 <elukey> roll restart zookeeper on druid100[1-3] for openjdk upgrades [analytics]
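(Roll restarts like the ZooKeeper ones above are done one host at a time with a pause in between so the ensemble keeps quorum. A rough sketch using cumin; the host expression, batch size, and sleep are illustrative assumptions, not the exact invocation used:)
  sudo cumin -b 1 -s 60 'druid100[1-3].eqiad.wmnet' 'systemctl restart zookeeper'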
2020-05-18 §
14:02 <elukey> roll restart of hadoop daemons on the prod cluster for openjdk upgrades [analytics]
13:30 <elukey> roll restart hadoop daemons on the test cluster for openjdk upgrades [analytics]
10:33 <elukey> add an-druid100[1,2] to the Druid Analytics cluster [analytics]
2020-05-15 §
13:23 <elukey> roll restart of the Druid analytics cluster to pick up new openjdk + /srv completed [analytics]
13:15 <elukey> turnilo back to druid1001 [analytics]
13:03 <elukey> move turnilo config to druid1002 to ease druid maintenance [analytics]
12:31 <elukey> move superset config to druid1002 (was druid1003) to ease maintenance [analytics]
09:08 <elukey> restart druid brokers on Analytics Public [analytics]
2020-05-14 §
18:41 <ottomata> fixed TLS authentication for Kafka mirror maker on jumbo - T250250 [analytics]
12:49 <joal> Release 2020-04 mediawiki_history_reduced to public druid for AQS (elukey did it :-P) [analytics]
09:53 <elukey> upgrade matomo to 3.13.3 [analytics]
09:50 <elukey> set matomo in maintenance mode as prep step for upgrade [analytics]
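(For the Matomo upgrade at 09:50/09:53: maintenance mode is an instance-level flag in the main config file, and the upgrade itself can be driven from the console. A sketch assuming a standard Matomo install layout:)
  # config/config.ini.php, under the [General] section
  maintenance_mode = 1
  # run the updater from the Matomo install directory, then set the flag back to 0
  ./console core:update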
2020-05-13 §
21:36 <elukey> powercycle analytics1055 [analytics]
13:46 <elukey> upgrade spark2 on all stat100x hosts - T250161 [analytics]
06:47 <elukey> upgrade spark2 on stat1004 - canary host - T250161 [analytics]
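(The spark2 rollout above goes canary-first on stat1004, then the rest of the stat100x fleet; it is a plain package upgrade. A sketch assuming spark2 ships as a Debian package of that name:)
  sudo apt-get update
  sudo apt-get install --only-upgrade spark2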
2020-05-11 §
10:17 <elukey> re-run webrequest-load-wf-text-2020-5-11-9 [analytics]
06:06 <elukey> restart wikimedia-discovery-golden on stat1007 - apparently killed because there was no memory left to allocate on the system [analytics]
05:14 <elukey> force re-run of monitor_refine_event_failure_flags after fixing a refine failed hour [analytics]
2020-05-10 §
07:44 <joal> Rerun webrequest-load-wf-upload-2020-5-10-1 [analytics]
2020-05-08 §
21:06 <ottomata> running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - T252203 [analytics]
15:36 <ottomata> starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - T252203 [analytics]
15:27 <ottomata> stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - T252203 [analytics]
15:16 <ottomata> restarted turnilo after applying nuria and mforns changes [analytics]
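(The 21:06 preferred replica election can be triggered with the tool that ships with Kafka; without a partitions JSON file it runs for every partition on the cluster. A sketch, with the ZooKeeper connect string left as a placeholder:)
  kafka-preferred-replica-election.sh --zookeeper "$ZOOKEEPER_CONNECT"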
2020-05-07 §
17:39 <ottomata> deploying fix to refinery bin/camus CamusPartitionChecker when using dynamic stream configs [analytics]
16:49 <joal> Restart and babysit mediawiki-history-denormalize-wf-2020-04 [analytics]
16:37 <elukey> roll restart of all the nodemanagers on the hadoop cluster to pick up new jvm settings [analytics]
13:53 <elukey> move stat1007 to role::statistics::explorer (adding jupyterhub) [analytics]
11:00 <joal> Moving application_1583418280867_334532 to the nice queue [analytics]
10:58 <joal> Rerun wikidata-articleplaceholder_metrics-wf-2020-5-6 [analytics]
07:45 <elukey> re-run mediawiki-history-denormalize [analytics]
07:43 <elukey> kill application_1583418280867_333560 after a chat with David; the job was consuming ~2TB of RAM [analytics]
07:32 <elukey> re-run mediawiki history load [analytics]
07:18 <elukey> execute yarn application -movetoqueue application_1583418280867_332862 -queue root.nice [analytics]
07:06 <elukey> restart mediawiki-history-load via hue [analytics]
06:41 <elukey> restart oozie on an-coord1001 [analytics]
05:46 <elukey> re-run mediarequest-hourly-wf-2020-5-6-19 [analytics]
05:35 <elukey> re-run two failed hours for webrequest load text (07/05T05) and upload (06/05T23) [analytics]
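(The queue move and the kill logged above at 07:18 and 07:43 use the standard YARN CLI; for reference, with the application IDs taken from those entries:)
  # move a running application to the low-priority queue
  yarn application -movetoqueue application_1583418280867_332862 -queue root.nice
  # kill an application that is holding too many resources
  yarn application -kill application_1583418280867_333560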