2017-09-03 §
17:52 <elukey> depooled cp4024 (ulsfo upload) due to kernel errors in dmesg [production]
2017-09-01 §
10:35 <elukey> stop puppet on thorium and disable root rsyncs - T174756 [production]
07:22 <elukey> restart apache2 and hue on thorium, Analytics sites down, investigating [production]
2017-08-31 §
15:18 <elukey> restart zookeeper on conf200[2,3] for jvm security updates [production]
13:07 <elukey> restart zookeeper on conf2001 for security updates (canary node) [production]
11:21 <elukey> restart apache2 on bohrium for libxml + gnutls security updates [production]
2017-08-30 §
15:56 <elukey> re-added analytics1055 among the hdfs/yarn workers after maintenance [production]
14:07 <elukey> restart java daemons on druid100[456] for jvm security updates [production]
09:07 <elukey> restart all jvm daemons on druid100[123] for security updates [production]
2017-08-29 §
14:51 <elukey> drop log.MobileWebUIClickTracking_10742159_15423246 from db1047 (archived on HDFS) - T172322 [production]
12:37 <elukey> restart kafka daemons on kafka1014 for jvm security updates [production]
11:49 <elukey> restart kafka daemons on kafka1013 for jvm security updates [production]
11:30 <elukey> restart java daemons on analytics100[1,2] (Hadoop Master nodes) for jvm updates [production]
09:29 <elukey> re-installed pmacct/librdkafka1/kafkacat on rhenium with stretch versions - T173489 [production]
08:42 <elukey> restart yarn/hdfs daemons for openjdk security updates [production]
08:06 <elukey> drop log.MobileWebUIClickTracking_10742159_15423246 from dbstore1002 to free space (table archived on HDFS) - T172322 T168303 [production]
2017-08-28 §
14:39 <elukey> restart eventlogging_sync on dbstore1002 - issue after drop of old table [production]
14:31 <elukey> drop PageContentSaveComplete_5588433_15423246 from the log database on db1046 (m4-master) - T170720 [production]
13:55 <elukey> restart kafka* daemons on kafka1012 for openjdk security updates (canary) [production]
2017-08-18 §
11:00 <elukey> reboot dbstore1002 for kernel updates [production]
10:34 <elukey> restart mysql on dbstore1002 - attempt to reclaim space after big table drop (stop slaves and el_sync, check running queries, stop mysql, check process, start mysql) [production]
2017-08-16 §
15:43 <elukey> drop PageContentSaveComplete_5588433_15423246 from db1047 and dbstore1002 (analytics-slaves) [production]
07:08 <elukey> executed sudo find -type f -mtime +30 -exec rm {} \; in /var/log/carbon to free some space [production]
2017-08-15 §
07:18 <elukey> restart pdfrender on scb1003 [production]
2017-08-14 §
16:29 <elukey> execute sudo find -type f -mtime +60 -exec rm {} \; in /var/lib/carbon on graphite2001 to free some space in / [production]
13:32 <elukey> Execute systemctl mask nfacctd on rhenium.wikimedia.org for T172681 [production]
2017-08-12 §
15:25 <elukey> powercycle mw2256 (able to use com2 but not to login as root, regular ssh hanging) - T163346 [production]
2017-08-11 §
13:51 <elukey> moved the eventbus scap deployment dirs on kafka[12]00[123] to deploy-service:deploy-service to allow scap to depool/pool - T171506 [production]
07:35 <elukey> restart pdfrender on scb1004 [production]
2017-08-10 §
14:11 <elukey> restart kafka1012 temporarily with some logs to TRACE to debug T172681 [production]
12:07 <elukey> restored varnishkafka on cp3032 [production]
11:17 <elukey> disabled puppet on cp3032 and restarted varnishkafka with debug logging [production]
08:59 <elukey> update librdkafka1 to 0.9.4.1 on eventlog1001 [production]
08:15 <elukey> add 50G to carbon lv on graphite1003 and 100G on graphite2002 [production]
06:45 <elukey> powercycle mw2256 - T163346 [production]
06:38 <elukey> restart pdfrender on scb1004 [production]
2017-08-09 §
16:14 <elukey> rolling restart of eventstream on scb hosts to deploy https://gerrit.wikimedia.org/r/370793 [production]
2017-08-08 §
18:27 <elukey> re-enabled irc-echo after the puppet shower [production]
18:11 <elukey> stop ircecho to avoid puppet shower [production]
16:04 <elukey> rolling restart of varnishkafka-webrequest to apply https://gerrit.wikimedia.org/r/#/c/370659/ (puppet automatically restarts) [production]
14:19 <elukey> restart of all the varnishkafka statsv/eventlogging instances on caching hosts to pick up https://gerrit.wikimedia.org/r/370644 (puppet automatic restarts) [production]
14:16 <elukey> set mw2256 pooled=inactive + downtime to allow BIOS upgrade - T163346 [production]
13:00 <elukey> restart varnishkafka-webrequest with kafka.broker.version.fallback=0.9.0.1 + kafka.api.version.request=false on cp3032 (local test, to rollback remove the lines from /etc/varnishkafka/webrequest.conf) [production]
12:32 <elukey> restart pdfrender on scb1002 [production]
12:14 <elukey> stop eventlogging on eventlog1001 to test kafka consumer failures [production]
10:12 <elukey> update librdkafka1* on notebook100[12] and stat1003 [production]
07:41 <elukey> stop puppet on cp3032 (cache::text) to set varnishkafka-webrequest logging to debug [production]
06:12 <elukey> alert users with big home directories for stat1005 disk alarms (will erase data later on only if they don't answer) [production]
06:12 <elukey> restart pdfrender on scb1003 [production]
2017-08-07 §
14:38 <elukey> updated librdkafka1 and librdkafka++1 to 0.9.4.1 on hafnium [production]