2017-12-18 §
14:13 <elukey> temporarily stopped mysql consumers on eventlog1001 to ease a mysql backup on db1107 - T183123 [production]
08:57 <elukey> rolling restart of the Yarn nodemanagers (hadoop) on analytics10[456]* to pick up new settings - T182276 [production]
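The T182276 nodemanager restarts (here and in the later entries for an103* and analytics102[8,9]) are rolling restarts of the YARN nodemanager unit, one host at a time. A minimal sketch with cumin, assuming the CDH systemd unit name hadoop-yarn-nodemanager; the host glob is illustrative and would need to match the selector grammar of whichever cumin backend is in use:

  # hypothetical rolling restart, one host per batch with a pause in between
  sudo cumin --batch-size 1 --batch-sleep 60 'analytics104*' \
      'systemctl restart hadoop-yarn-nodemanager'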
2017-12-15 §
16:10 <elukey> re-enable piwik on bohrium after mysql backup restore [production]
10:31 <elukey> rolling restart of yarn nodemanagers on an103* to apply new config - T182276 [production]
09:50 <elukey> restore piwik database on bohrium after mysql corruption - piwik disabled [production]
2017-12-14 §
18:24 <elukey> replace kafka1018 with kafka1023 (Analytics Kafka cluster) [production]
13:41 <elukey> update facts for puppet compiler to pick up new hosts [production]
2017-12-13 §
14:01 <elukey> restart Yarn nodemanagers on analytics102[8,9] to apply new settings - T182276 [production]
11:59 <elukey> forced remount of /mnt/hdfs after OOM event on stat1005 [production]
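/mnt/hdfs is a FUSE (fuse_dfs) mount of HDFS, and a forced remount of a wedged FUSE mountpoint is essentially a lazy unmount followed by a remount. A minimal sketch, assuming the mountpoint is defined in /etc/fstab:

  # hypothetical recovery of a hung fuse_dfs mountpoint
  sudo umount -l /mnt/hdfs   # lazy unmount: detach even while the mount is busy
  sudo mount /mnt/hdfs       # remount from the fstab entry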
2017-12-12 §
15:24 <elukey> rename notebook1002 -> kafka1023 - step 3, replace notebook1002 with kafka1023 in the puppet config [production]
15:02 <elukey> clear recdns records related to notebook1002/kafka1023 (rec_control wipe-cache kafka1023.eqiad.wmnet kafka1023.mgmt.eqiad.wmnet notebook1002.eqiad.wmnet 14.5.64.10.in-addr.arpa 104.3.65.10.in-addr.arpa) - T181518 [production]
14:46 <elukey> start rename notebook1002 -> kafka1023 - step 2, dns config (host already shutdown) - T181518 [production]
2017-12-11 §
09:05 <elukey> set notebook1002 as role::spare as prep step to reimage it to kafka1023 [production]
08:12 <elukey> powercycle ganeti1008 - all vms stuck, console com2 showed a ton of printks without a clear indicator of the root cause [production]
2017-12-10 §
20:33 <elukey> execute restart-hhvm on mw1312 - hhvm stuck multiple times queueing requests [production]
20:01 <elukey> ran kafka preferred-replica-election for the kafka analytics cluster (1012->1022) to re-add kafka1012 to the kafka brokers acting as partition leaders (will spread the load in a better way) [production]
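The preferred-replica election is the stock Kafka admin tool pointed at the cluster's ZooKeeper ensemble (the 20:01 entry likely used a local kafka wrapper for it). A minimal sketch for Kafka of that era; the ZooKeeper connection string is a placeholder:

  # hypothetical invocation of the upstream tool
  kafka-preferred-replica-election.sh --zookeeper zk-host:2181/kafka/analytics-eqiad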
2017-12-08 §
11:45 <elukey> updated prometheus-druid-exporter on druid* to 0.6 [production]
11:39 <elukey> upload prometheus-druid-exporter 0.6 to stretch/jessie wikimedia [production]
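Uploading a package to the jessie/stretch-wikimedia suites amounts to importing the built .changes file with reprepro on the apt host. A minimal sketch; the filename is a placeholder:

  # hypothetical reprepro import into both distributions
  sudo reprepro include jessie-wikimedia prometheus-druid-exporter_0.6_amd64.changes
  sudo reprepro include stretch-wikimedia prometheus-druid-exporter_0.6_amd64.changes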
2017-12-07 §
20:35 <elukey> restart hhvm on mw1235 - hhvm-dump-debug hanging, no stacktrace available [production]
20:31 <elukey> restart hhvm on mw1281 - hhvm stuck (hhvm-dump-debug timing out) [production]
17:25 <elukey@puppetmaster1001> conftool action : set/pooled=yes; selector: name=mw1314.eqiad.wmnet [production]
15:42 <elukey> hhvm-dump-debug for mw1314 saved to /tmp/hhvm.17991.bt. [production]
15:30 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=mw1314.eqiad.wmnet [production]
10:50 <elukey> powercycle analytics1003 - no serial console, ssh stuck in "System is booting up. See pam_nologin(8)" [production]
10:12 <elukey> reboot analytics1003 for kernel+jvm updates - T179943 [production]
08:28 <elukey> install prometheus-druid-exporter 0.5 on druid* [production]
08:26 <elukey> upload prometheus-druid-exporter 0.5-1 to jessie/stretch-wikimedia [production]
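The two conftool entries above (15:30 depool, 17:25 repool of mw1314) are the logged form of confctl runs on the puppetmaster; roughly, assuming the standard conftool CLI:

  # hypothetical depool/repool; the selector matches the logged action
  sudo confctl select 'name=mw1314.eqiad.wmnet' set/pooled=no
  sudo confctl select 'name=mw1314.eqiad.wmnet' set/pooled=yes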
2017-12-05 §
10:45 <elukey> reboot druid1003 for kernel+jvm updates - T179943 [production]
09:42 <elukey> reboot analytics100[12] for kernel+jvm updates (Hadoop Master nodes) - T179943 [production]
2017-12-04 §
14:30 <elukey> reboot druid100[23] for kernel updates [production]
14:01 <elukey> reboot analytics106* (hadoop worker nodes) for kernel+jvm updates - T179943 [production]
09:24 <elukey> reboot analytics104* (hadoop worker nodes) for kernel+jvm updates - T179943 [production]
2017-12-01 §
12:44 <elukey> reboot druid1001 for kernel+jvm updates - T179943 [production]
10:57 <elukey> reboot analytics1028 for kernel + jvm updates (Hadoop HDFS journalnode) - T179943 [production]
09:23 <elukey> reboot analytics104* for kernel+jvm updates - T179943 [production]
08:40 <elukey> reboot the remaining analytics103* hadoop workers to pick up kernel+jvm updates - T179943 [production]
2017-11-30 §
16:12 <elukey> drain and reboot analytics1031->39 to pick up jvm+kernel updates - T179943 [production]
09:14 <elukey> drain and reboot analytics1029/1030 for jvm+kernel updates (Hadoop worker canaries) [production]
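"Drain and reboot" here means stopping the Hadoop worker daemons so the host holds no running YARN containers or actively served HDFS blocks before going down. A minimal per-host sketch, assuming the CDH systemd unit names (the real procedure may also wait for running containers to finish first):

  # hypothetical per-worker drain before reboot
  sudo systemctl stop hadoop-yarn-nodemanager   # stop running/accepting YARN containers
  sudo systemctl stop hadoop-hdfs-datanode      # block replicas stay available on other workers
  sudo reboot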
2017-11-29 §
14:36 <elukey> reboot druid100[456] for jvm+kernel updates - T179943 [production]
13:18 <elukey> reboot kafka100[23] for jvm+kernel updates - T179943 [production]
11:30 <elukey> reboot kafka1001 for kernel + jvm updates - T179943 [production]
2017-11-28 §
14:17 <elukey> reboot kafka10[12-22] for kernel + jvm updates - T179943 [production]
14:03 <elukey> reboot kafka200[123] for kernel + jvm updates - T179943 [production]
2017-11-27 §
13:22 <elukey> remove eventlogging replication support (log database) from dbstore1002 - T156844 [production]
2017-11-24 §
08:07 <elukey> re-enabling piwik on bohrium (only VM running on ganeti1006 atm) after mysql tables restore completed [production]
2017-11-22 §
16:16 <elukey> restart druid broker,coordinator,historical daemons on druid100[123] to pick up new logging settings [production]
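The restart covers three Druid services per host; assuming Debian-style unit names (druid-broker, druid-coordinator, druid-historical), the per-host step is just:

  # hypothetical per-host restart to pick up the new logging settings
  sudo systemctl restart druid-broker druid-coordinator druid-historical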
2017-11-21 §
09:39 <elukey> upload prometheus-druid-exporter 0.4 to jessie/stretch-wikimedia [production]
2017-11-20 §
14:05 <elukey> upload prometheus-druid-exporter 0.3 to jessie-wikimedia [production]
13:30 <elukey> upload prometheus-druid-exporter 0.3 to stretch-wikimedia [production]
2017-11-17 §
08:04 <elukey> reboot stat100[456] for kernel updates [production]