2018-07-11 §
17:06 <elukey> restarted kafka on kafka1001 with Xmx 2G and Xms 2G [production]
16:50 <elukey> stop topics cleaner script [production]
16:36 <elukey> start topic clean procedure on kafka1001 (tmux root session) [production]
16:19 <elukey> restart kafka on kafka1003 [production]
15:11 <elukey> restart again kafka on kafka100[1,2] - failed for OOM [production]
15:03 <elukey> restart kafka on kafka1003 [production]
14:57 <elukey> rolling restart of eventbus on kafka100[1-3] [production]
14:53 <elukey> restart kafka on kafka1002 [production]
14:52 <elukey> restart kafka on kafka1001 [production]
13:14 <elukey> roll restart of aqs on aqs* to pick up the new Druid config [production]
07:57 <elukey> roll restart of aqs on aqs* to rollback the druid config [production]
2018-07-10 §
10:03 <elukey> restart analytics100[1,2]'s hadoop resource managers, some I/O socket errors after the ip6 interface change [production]
09:34 <elukey> forced umount of /mnt/hdfs on stat1004, several processes hang for it (causing load) and transport not connected [production]
08:34 <elukey> rolling restart of AQS to apply the new config [production]
2018-07-09 §
15:32 <elukey> enabled snappy compression for varnishkafka eventlogging [production]
07:18 <elukey> update filter analytics-in4 on cr1/cr2 eqiad [production]
06:25 <elukey> restart hue on thorium to pick up new smtp changes - T196920 [production]
2018-07-08 §
11:27 <elukey> restart rsyslog on lithium - in:imtcp thread stuck at 99% cpu usage [production]
2018-07-05 §
15:26 <elukey> upgrade (without jvm restart) prometheus-jmx-exporter on the analytics node listed in debmonitor still not running the last version [production]
07:13 <elukey> stop mariadb on analytics1003 to apply https://gerrit.wikimedia.org/r/443893 and enable auth via unix socket [production]
2018-07-04 §
09:15 <elukey> reimage aqs1009 to Debian Stretch [production]
08:53 <elukey> update analytics-in4 filter rules on cr1/cr2 eqiad - T198623 [production]
06:15 <elukey> reimage aqs1008 to Debian Stretch [production]
2018-07-03 §
11:39 <elukey> reimage aqs1007 to Debian Stretch [production]
09:11 <elukey> reimage aqs1006 to Debian Stretch [production]
07:38 <elukey> reimage aqs1005 to Debian Stretch [production]
2018-07-02 §
14:29 <elukey> copy cassandra-tools-wmf 1.0.2-1 from jessie-wikimedia to stretch-wikimedia [production]
13:34 <elukey> reimage aqs1004 to Debian Stretch [production]
09:53 <elukey> reboot ms-be1039 (bad disk, spike in I/O and load, not reachable via ssh or mgmt console) [production]
2018-06-29 §
05:40 <elukey> force umount of dumps labstore nfs mountpoints on stat100[56]/notebook100[34] to reduce load (also too many open files) [production]
2018-06-28 §
14:46 <elukey> upgrade piwik 3.2.1 to matomo (new name/package) 3.5.1 - T192298 [production]
13:49 <elukey> downgrade cassandra and cassandra-tools from 2.2.6-wmf5 to 2.2.6-wmf3 in jessie-wikimedia component/cassandra22 - T197062 [production]
13:01 <elukey> upload matomo (new Piwik) 3.5.1-1 to jessie-wikimedia [production]
12:49 <elukey> stop hadoop daemons on analytics1032 + shutdown to swap BBU - T194234 [production]
08:15 <elukey> restart-hhvm on mw1227 (some threads stuck in jit-related operations, causing high load) [production]
07:12 <elukey> upload piwik 3.2.1 to jessie-wikimedia [production]
2018-06-27 §
21:50 <elukey> piwik maintenance on bohrium completed [production]
13:07 <elukey> piwik upgraded to 3.2.1 on bohrium + started the db migration procedure (will last 2/3h probably) [production]
2018-06-26 §
14:43 <elukey> rm syslog.1.gz puppet.log.1.gz on tegment to fix cronspam [production]
2018-06-25 §
14:14 <elukey> merging jmxtrans and kafkatee's submodules to operations/puppet - part 2 (moving them back from environments/production) [production]
13:53 <elukey> merging jmxtrans and kafkatee's submodules to operations/puppet - part 1 (moving them under environments/production) [production]
2018-06-15 §
14:49 <elukey> restart varnishkafka-eventlogging on cp4028, errors logged [production]
14:43 <elukey> restart varnishkafka-eventlogging on cp5012 as attempt to clear out the errors (not needed but logging it anyway) [production]
2018-06-14 §
11:16 <elukey> upgrade cassandra on aqs* to 2.2.6-wmf5 [production]
09:15 <elukey> add debmonitor term to analytics-in4 on cr1/cr2 eqiad [production]
08:31 <elukey> restart hadoop hdfs master nodes to pick up the new journal node settings [production]
08:07 <elukey> roll restart of hadoop journal nodes to pick up the new configuration (two more journal nodes added) [production]
2018-06-13 §
15:55 <elukey> rolling restart of aqs on aqs100[4-9] to pick up the new config changes [production]
13:28 <elukey@deploy1001> Finished deploy [analytics/aqs/deploy@160206f]: (no justification provided) (duration: 04m 11s) [production]
13:24 <elukey@deploy1001> Started deploy [analytics/aqs/deploy@160206f]: (no justification provided) [production]