2017-02-02 §
15:01 <elukey> Replace Redis/Memcached shards mc200[4567] with mc202[2345] [production]
11:40 <elukey> Swap mc2002 with mc2020, mc2003 with mc2021 (Redis codfw replicas) - T155755 [production]
10:53 <elukey> Swap mc2001 with mc2019 (Redis codfw replicas) - T155755 [production]
2017-02-01 §
16:20 <elukey> restarting Yarn Node Manager daemons on all the Hadoop nodes to bandaid a memory leak causing OOMs [production]
12:09 <elukey@tin> Finished deploy [analytics/refinery@e6254a4]: (no justification provided) (duration: 04m 41s) [production]
12:04 <elukey@tin> Started deploy [analytics/refinery@e6254a4]: (no justification provided) [production]
07:41 <elukey> bootstrapping aqs1008-a on aqs1008 (new AQS cassandra node) [production]
2017-01-31 §
16:11 <elukey> started Cassandra nodetool cleanup for aqs1007-a [production]
16:03 <elukey> started Cassandra nodetool cleanup for aqs1004-b [production]
14:12 <elukey> restarting hhvm on mw1204 (dump debug in /tmp/hhvm.29120.bt) [production]
13:58 <elukey> rebooted analytics1039 to pick up uuids in fstab - T147879 [production]
11:14 <elukey> updating the puppet compiler's facts [production]
08:44 <elukey@puppetmaster1001> conftool action : set/pooled=yes; selector: name=aqs1007.eqiad.wmnet [production]
08:26 <elukey> started Cassandra nodetool cleanup for aqs1004-a [production]
2017-01-30 §
09:25 <elukey> bootstrapping new cassandra instance (aqs1007-b) on AQS - https://gerrit.wikimedia.org/r/#/c/334753/ [production]
08:45 <elukey> restarting aqs on aqs100[4567] to pick up NSS updates [production]
08:19 <elukey> set mw1236.eqiad.wmnet pooled=inactive because powered off (no mentions on the SAL, still trying to find why) [production]
2017-01-26 §
19:13 <elukey> restore analytics1001 as RM and HDFS masters [production]
18:36 <elukey> restarting Yarn node managers on an102[89] and an103[01], impacted by the switch restart [production]
17:57 <elukey> bootstrapping aqs1007-a cassandra instance [production]
17:34 <elukey@tin> Finished deploy [analytics/aqs/deploy@5917fd4]: (no message) (duration: 02m 25s) [production]
17:31 <elukey@tin> Starting deploy [analytics/aqs/deploy@5917fd4]: (no message) [production]
13:53 <elukey> restarting cassandra on aqs100[56] to complete the openjdk update [production]
12:54 <elukey> restarting the aqs1004-b cassandra instance to pick up the new openjdk (last test before complete rollout) [production]
12:28 <elukey> restarting the aqs1004-a cassandra instance to pick up the new openjdk [production]
2017-01-25 §
18:02 <elukey> running authdns-update on ns0.w.o to pick up changes made in https://gerrit.wikimedia.org/r/334040 [production]
09:25 <elukey> updating puppet-compiler facts [production]
07:28 <elukey> upgrading aqs100[56] to node6 [production]
2017-01-24 §
16:37 <elukey> upgrading aqs1004 to node6 [production]
2017-01-23 §
15:19 <elukey> whitelisted dbproxy1011 on cr1/cr2 for analytics-in4 input filter [production]
11:54 <elukey> whitelisted dbproxy1010 on cr1/cr2 for analytics-in4 input filter [production]
2017-01-20 §
10:39 <elukey> manually forcing a /etc/init.d/apache2 reload on mw1259 (videoscaler) to replicate the effects of a logrotate run and test why alarms go off. [production]
2017-01-16 §
15:01 <elukey> restarting hhvm on mw1167 - hhvm-dump-debug in /tmp/hhvm.20360.bt [production]
2017-01-11 §
22:26 <elukey> added mw1239.eqiad.wmnet back to service - T148421 [production]
22:20 <elukey> restarting hhvm on mw1198 (dump-debug in /tmp/hhvm.9737.bt) [production]
2017-01-05 §
07:54 <elukey> chown www-data:www-data all the root:adm hhvm log files on mw eqiad hosts (T132324) [production]
2017-01-03 §
07:58 <elukey> chown www-data:www-data all the root:adm hhvm log files on mw codfw hosts (T132324) [production]
2017-01-02 §
13:24 <elukey> powercycled mw1280, not pingable and mgmt console frozen [production]
2016-12-22 §
14:51 <elukey> restarting the yarn node manager java daemons on all the Hadoop worker nodes due to a suspected memory leak [production]
14:14 <elukey> the previous entry is missing: "on analytics1032" [production]
14:13 <elukey> manually starting the yarn nodemanager after OOM [production]
07:26 <elukey> created /var/log/squid3/access.log.1.gz on aluminum to fix cronspam - T132324 [production]
2016-12-21 §
15:04 <elukey> removed mongodb* packages from stat1003 after https://gerrit.wikimedia.org/r/328519 [production]
08:42 <elukey> restarted hhvm/jobrunner (and killed ffmpeg processes) on mw116[89] [production]
2016-12-20 §
08:27 <elukey> renamed some log files ($something.1.gz to $something.1a.gz) on cp1008 and rutherium to unblock logrotation and reduce cronspam - T132324 [production]
2016-12-19 §
13:39 <elukey> Manually raise hhvm.server.connection_timeout_seconds on mw1259 to one day [production]
10:16 <elukey> reimaging mw1168 and mw1169 to Trusty - T153488 [production]
09:38 <elukey> stopping jobrunner/jobchron daemons on mw116[89] as prep step for repurpose to videoscalers - T153488 [production]
09:20 <elukey> killing irc-echo [production]
2016-12-18 §
16:45 <elukey> starting cassandra instances on restbase1009, restbase1011 and restbase1013 (one at a time) - T153588 [production]