2020-05-14
09:29 <elukey> upload matomo-3.13.3 to thirdparty/matomo on stretch|buster-wikimedia [production]
08:57 <elukey> imported gpg key 1FD752571FE36FF23F78F91B81E2E78B66FED89E in apt1001 (Matomo public debian repo) [production]
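(For reference, importing a repo signing key like this typically goes into the keyring reprepro uses to verify upstream releases; a sketch, with the keyserver and keyring path as assumptions.)

    # sketch: fetch the Matomo repo key into reprepro's trusted keyring
    gpg --keyserver keyserver.ubuntu.com \
        --no-default-keyring --keyring ~/.gnupg/trustedkeys.gpg \
        --recv-keys 1FD752571FE36FF23F78F91B81E2E78B66FED89E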
2020-05-13
21:30 <elukey> powercycle analytics1055 [production]
07:14 <elukey> upload spark2_2.4.4-bin-hadoop2.6-2 for buster/stretch on apt1001 [production]
2020-05-11
17:51 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:49 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
17:16 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:14 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
06:04 <elukey> restart wikimedia-discovery-golden on stat1007 - apparently killed after the system ran out of memory [production]
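(A sketch of the triage implied here, assuming wikimedia-discovery-golden is a systemd unit, which the restart suggests.)

    # check whether the kernel OOM killer terminated the process
    dmesg -T | grep -iE 'out of memory|oom-killer'
    # bring the service back up
    sudo systemctl restart wikimedia-discovery-golden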
2020-05-10
08:44 <elukey> Power cycle analytics1052 after eno1 issue [production]
2020-05-07
09:11 <elukey> roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary) [production]
05:33 <elukey> restart hadoop yarn nodemanager on analytics1071 [production]
2020-05-06
06:00 <elukey> powercycle analytics1060 - host stuck - T251973 [production]
2020-05-05
15:26 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
15:24 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
15:03 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
15:00 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2020-05-04
07:07 <elukey> execute ifdown eno1; ifup eno1 on analytics1052 - interface negotiated speed flapping [production]
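(The interface bounce spelled out; the ethtool check afterwards is an illustrative addition, assuming eno1 is managed by ifupdown.)

    ifdown eno1; ifup eno1
    ethtool eno1 | grep -i speed   # confirm the link re-negotiated at the expected speed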
06:41 <elukey> upload prometheus-druid-exporter 0.8-1 to stretch-wikimedia [production]
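(Uploads to the apt repo like this one are typically done with reprepro on apt1001; a sketch, with the component and the changes file name as assumptions.)

    reprepro -C main include stretch-wikimedia \
        prometheus-druid-exporter_0.8-1_amd64.changes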
2020-04-29
17:54 <elukey@cumin1001> END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) [production]
17:44 <elukey@cumin1001> START - Cookbook sre.presto.roll-restart-workers [production]
08:52 <elukey@cumin1001> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [production]
08:45 <elukey@cumin1001> START - Cookbook sre.zookeeper.roll-restart-zookeeper [production]
2020-04-28
09:22 <elukey@cumin1001> END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) [production]
09:12 <elukey@cumin1001> START - Cookbook sre.presto.roll-restart-workers [production]
09:12 <elukey@cumin1001> END (FAIL) - Cookbook sre.presto.roll-restart-workers (exit_code=99) [production]
09:12 <elukey@cumin1001> START - Cookbook sre.presto.roll-restart-workers [production]
2020-04-27
13:10 <elukey> roll restart elastic on cloudelastic-chi again to pick up new JVM settings - T231517 [production]
07:25 <elukey> roll restart elastic-chi on cloudelastic100[1-4] to pick up the latest JVM GC settings - T231517 [production]
07:14 <elukey> powercycle an-worker1089 - unreachable via ssh, mgmt serial available, soft cpu lock events registered in dmesg [production]
06:59 <elukey> force ifdown/ifup eno1 on analytics1052 - interface negotiated speed flapping [production]
06:30 <elukey@puppetmaster1001> conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet [production]
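(The conftool entry above corresponds to an invocation along these lines; a sketch of confctl usage.)

    confctl select 'name=mw1280.eqiad.wmnet' set/pooled=inactive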
2020-04-26
18:08 <elukey> powercycle puppetmaster1001 - mgmt serial console not usable, no ssh, racadm getsel doesn't show anything [production]
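(A sketch of the out-of-band recovery path used here, via the Dell management controller; the mgmt hostname pattern is an assumption.)

    ssh root@puppetmaster1001.mgmt.eqiad.wmnet
    racadm getsel                    # system event log; showed nothing in this case
    racadm serveraction powercycle   # hard power cycle the host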
2020-04-22
05:50 <elukey@deploy1001> Finished deploy [analytics/refinery@30facc4]: Test of new scap settings (duration: 04m 42s) [production]
05:45 <elukey@deploy1001> Started deploy [analytics/refinery@30facc4]: Test of new scap settings [production]
05:25 <elukey@deploy1001> deploy aborted: log (duration: 00m 02s) [production]
05:24 <elukey@deploy1001> Started deploy [analytics/refinery@30facc4]: log [production]
2020-04-20
10:37 <elukey> apt-get purge rsync on mwlog* after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589600/ [production]
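(Fleet-wide, a purge like this is usually driven through Cumin; a sketch, with the host selector and flags as assumptions.)

    sudo cumin 'mwlog*' 'apt-get purge -y rsync'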
06:41 <elukey> execute find -mtime +30 -delete in /var/log/airflow/scheduler on an-airflow1001 to free space [production]
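(The cleanup as a one-liner with the path made explicit; -mtime +30 matches files last modified more than 30 days ago, and the -type f restriction is an added safety assumption.)

    find /var/log/airflow/scheduler -type f -mtime +30 -delete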
2020-04-16
15:54 <elukey> restart chi on cloudelastic1001 with -XX:NewRatio=3 - T231517 [production]
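(The GC experiments in this entry and the ones below translate to Elasticsearch jvm.options lines roughly like the following; a sketch, with the exact file and surrounding flags as assumptions.)

    ## jvm.options fragment (sketch)
    -XX:+UseConcMarkSweepGC    # the CMS collector applied on 2020-04-14
    -XX:NewRatio=3             # size the young generation relative to the old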
11:29 <elukey> restart atskafka on cp3050 after maintenance [production]
11:17 <elukey> stop atskafka on cp3050 to re-create the topic atskafka_test_webrequest_text on Kafka Jumbo - T250347 [production]
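(Re-creating the topic on Kafka Jumbo would look roughly like this; the broker address and partition/replication counts are assumptions, and older Kafka releases take --zookeeper instead of --bootstrap-server.)

    kafka-topics.sh --bootstrap-server kafka-jumbo1001.eqiad.wmnet:9092 \
        --delete --topic atskafka_test_webrequest_text
    kafka-topics.sh --bootstrap-server kafka-jumbo1001.eqiad.wmnet:9092 \
        --create --topic atskafka_test_webrequest_text \
        --partitions 3 --replication-factor 3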
09:33 <elukey> restart atskafka on cp3050 to pick up snappy compression - T250347 [production]
05:33 <elukey> restart hadoop-yarn-nodemanager on an-worker108[4,5] - failed after GC OOM events (heavy spark jobs) [production]
2020-04-15
09:08 <elukey> restart druid brokers on druid100[4-6] - stuck after datasource deletion [production]
07:35 <elukey> restart cloudelastic-chi on cloudelastic1002 to apply new jvm settings - T231517 [production]
2020-04-14
14:15 <elukey> enable TLS between weblog1001, mwlog2001.codfw.wmnet, mwlog1001 and Kafka Jumbo/Logging - T250147 [production]
08:49 <elukey> restart elastic-chi on cloudelastic1001 with -XX:NewSize=10G - T231517 [production]
07:33 <elukey> apply CMS GC settings to chi on cloudelastic1001 - T231517 [production]
2020-04-13
06:36 <elukey> temporarily stopped puppet on restbase2014 to avoid attempts to start cassandra on each run - T250050 [production]