2019-05-16 §
05:34 <elukey> roll restart of nutcracker on mw2* to pick up new config changes (no more memcached config) - T214275 [production]
2019-05-15 §
17:09 <elukey> powerup elastic2038 (was down for maintenance) [production]
16:50 <elukey> restart Hadoop HDFS namenodes on an-master100[1,2] to pick up new settings [production]
16:28 <elukey> restart nutcracker on mw2240 to pick up the new config (no more memcached settings) [production]
10:31 <elukey> superset.wikimedia.org moved to analytics-tool1004 (Buster + python 3.7 + Superset 0.32 upgrade) [production]
10:04 <elukey@deploy1001> Finished deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency (duration: 00m 26s) [production]
10:04 <elukey@deploy1001> Started deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency [production]
08:45 <elukey@deploy1001> Finished deploy [analytics/superset/deploy@31c2c30]: Superset 0.32 (duration: 00m 26s) [production]
08:44 <elukey@deploy1001> Started deploy [analytics/superset/deploy@31c2c30]: Superset 0.32 [production]
08:36 <elukey> stop superset on analytics-tool1003 as prep step for the migration to the new host - T212243 [production]
07:33 <elukey> restart nutcracker on mw2245 to pick up config changes (removal of memcached config) [production]
07:29 <elukey> powercycle an-worker1094 (OEM event occurred, checking if temporary) [production]
06:24 <elukey> force remount of /mnt/hdfs on stat1007 - fuse hdfs stuck [production]
2019-05-13 §
14:00 <elukey> roll restart of aqs on aqs1* to pick up new druid settings [production]
07:08 <elukey> slow roll restart of celery on ores* nodes to allow cores to be generated upon segfault - T222866 [production]
2019-05-12 §
15:32 <elukey> rollback python-kafka on eventlog1002 to 1.4.1-1~stretch1 - T222941 [production]
12:14 <elukey> restart eventlogging on eventlog1002 - all processors stuck due to kafka-python (T222941) [production]
2019-05-11 §
06:37 <elukey> restart eventlogging on eventlog1002 - huge kafka consumer lag accumulated (T222941) [production]
2019-05-10 §
05:40 <elukey> execute kafka preferred-replica-election on kafka-jumbo1001 as an attempt to rebalance traffic (1002 seems to be handling way more than the others for some days) [production]
05:32 <elukey> restart eventlogging daemons on eventlog1002 - kafka consumer errors in the logs, some lag built over time [production]
2019-05-09 §
08:23 <elukey> upload uwsgi 2.0.14+20161117-3+deb9u2+wmf1 packages to stretch-wikimedia - T212697 [production]
07:50 <elukey> roll restart HDFS masters on an-master100[1,2] to pick up new logging settings [production]
2019-05-08 §
09:24 <elukey> install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon2001 to test a uwsgi bug fix - T212697 [production]
07:45 <elukey> install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon1002 to test a uwsgi bug fix - T212697 [production]
06:29 <elukey> restart uwsgi-netbox on netmon1002 after the daily segfault (upon restart) [production]
2019-05-07 §
06:44 <elukey> restart uwsgi-netbox on netmon1002 after segfault [production]
2019-05-06 §
17:19 <elukey> restart netbox on netmon1002 as test [production]
09:35 <elukey> restart netbox on netmon1002 (trying to reproduce the segfault) - T212697 [production]
2019-05-05 §
14:42 <elukey> restart pdfrender on scb1004 [production]
2019-05-01 §
17:59 <elukey> force remount of /mnt/hdfs on notebook1003 (fuse hdfs got stuck) [production]
2019-04-30 §
15:45 <elukey> restart hadoop hdfs namenodes on an-master100[1,2] to pick up new logging settings - T220702 [production]
12:34 <elukey> moved /home to /srv/home (more space in a dedicated partition) on stat1005 [production]
09:02 <elukey> roll restart hdfs namenodes on an-master100[1,2] to pick up new settings - T220702 [production]
2019-04-29 §
08:33 <elukey> restart keyholder on deploy1001 + rearm keys [production]
08:28 <elukey> restart keyholder-proxy on deploy1001 (attempt to see if new analytics scap settings got applied) [production]
2019-04-27 §
17:44 <elukey> restart pdfrender on scb1002 (alert flapping) [production]
2019-04-26 §
08:42 <elukey> restart pdfrender on scb1003 (alert flapping) [production]
2019-04-24 §
06:38 <elukey> restart pdfrender on scb1003 [production]
2019-04-23 §
09:19 <elukey> dumping Kafka consumer offset history on logstash1012 for T221202 [production]
05:52 <elukey> powercycle wtp2019 - no ssh, mgmt console stuck [production]
2019-04-19 §
06:39 <elukey> roll restart of druid daemons on druid100[1-3] to pick up new jvm settings [production]
2019-04-18 §
13:08 <elukey> roll restart of cassandra on aqs* to pick up new openjdk upgrades [production]
08:54 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:54 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:54 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
08:54 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:53 <elukey> reboot kafka10[12-23] (old Analytics cluster) for kernel + openjdk upgrades [production]
2019-04-17 §
14:13 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
14:12 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
13:52 <elukey> upgrading the Hadoop CDH distribution to 5.16.1 on all the Hadoop-related nodes - T218343 [production]