2020-03-17 §
09:27 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers [production]
09:09 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [production]
08:39 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers [production]
2020-03-16 §
10:36 <elukey> roll restart of recommendation service on scb* as an attempt to fix the flapping alerts - T247732 [production]
2020-03-14 §
08:33 <elukey> run kafka preferred-replica-election on kafka-jumbo1001 - T247561 [production]
08:32 <elukey> run systemctl restart systemd-timedated.service on stat1008 [production]
2020-03-13 §
17:20 <elukey@cumin1001> END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) [production]
17:08 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-mirror-maker [production]
16:02 <elukey> powercycle kafka-jumbo1006 after switch port changed - T247561 [production]
2020-03-12 §
17:48 <elukey> increase the max ticket lifetime of all Kerberos user principals via 'kadmin.local modprinc -maxlife 2d $user' on krb1001's KDC (changes will be propagated to codfw automatically) [production]
17:17 <elukey> execute modprinc -maxlife 2d krbtgt/WIKIMEDIA via kadmin.local on krb1001 (will be propagated to krb2001 automatically) [production]
14:51 <elukey> restart kpropd daemon on krb2001 [production]
11:09 <elukey> roll restart of krb-kdc on krb1001/krb2001 to pick up new ticket lifetime settings (10h -> 48h) [production]
08:12 <elukey> push new install/webproxy terms for analytics-in4/6 to cr1/cr2-eqiad [production]
2020-03-11 §
18:36 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
18:33 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2020-03-08 §
17:58 <elukey> restart hadoop-yarn-nodemanager on an-worker1087 [production]
2020-03-06 §
14:53 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
14:50 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
10:16 <elukey@cumin1001> END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) [production]
10:06 <elukey@cumin1001> START - Cookbook sre.presto.roll-restart-workers [production]
09:46 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
09:43 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
2020-03-05 §
17:32 <elukey> run homer on cumin1001 to apply https://gerrit.wikimedia.org/r/576873 on cr1/cr2-eqiad [production]
16:01 <elukey> depool mw1394 [production]
06:48 <elukey> restart yarn on analytics1074 (GC overhead, traces of network errors with datanodes) [production]
2020-03-03 §
15:15 <elukey@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0) [production]
14:12 <elukey@cumin1001> START - Cookbook sre.elasticsearch.rolling-restart [production]
11:01 <elukey@cumin1001> END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) [production]
10:49 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-mirror-maker [production]
10:47 <elukey@cumin1001> END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) [production]
08:36 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-brokers [production]
08:11 <elukey@cumin1001> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [production]
08:05 <elukey@cumin1001> START - Cookbook sre.zookeeper.roll-restart-zookeeper [production]
07:55 <elukey@cumin1001> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [production]
07:48 <elukey@cumin1001> START - Cookbook sre.zookeeper.roll-restart-zookeeper [production]
07:39 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) [production]
07:09 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers [production]
07:07 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) [production]
06:41 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers [production]
2020-03-02 §
13:18 <elukey> roll restart Hadoop master daemons on an-master100[1,2] for openjdk upgrades [production]
09:50 <elukey> powercycle an-worker1083 (no ssh, mgmt console available but tty not really usable, CPU soft lockups reported) [production]
2020-02-28 §
15:36 <elukey@deploy1001> Finished deploy [analytics/refinery@28fa2fc]: fix for refinery-drop-older-than - part 2 (duration: 13m 40s) [production]
15:22 <elukey@deploy1001> Started deploy [analytics/refinery@28fa2fc]: fix for refinery-drop-older-than - part 2 [production]
14:15 <elukey@deploy1001> Finished deploy [analytics/refinery@2db36f4]: Fix refinery-drop-older-than script (duration: 14m 01s) [production]
14:01 <elukey@deploy1001> Started deploy [analytics/refinery@2db36f4]: Fix refinery-drop-older-than script [production]
2020-02-27 §
18:51 <elukey> upgrade prometheus-mcrouter-exporter to 0.1.0+git20200227-1 on hosts [production]
18:20 <elukey> upload prometheus-mcrouter-exporter 0.1.0+git20200227-1 to stretch-wikimedia [production]
2020-02-26 §
14:00 <elukey> run apt-get clean on notebook1004 to free some space - T224682 [production]
10:57 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [production]