2017-03-14 §
08:38 <elukey> moved some log files from /var/log/upstart/$logname.log.1 to /var/log/upstart/$logname.log.1.bis on labvirt1014, labtestvirt2001, labtestnet2001, labnet1001 to reduce cronspam [production]
2017-03-13 §
11:56 <elukey> reimage analytics1042 (Hadoop worker node) to Debian Jessie [production]
06:52 <elukey> powercycle mw2256, stuck in boot (looked in the console) [production]
2017-03-10 §
16:25 <elukey> reboot mw22(5[1-9]|60) to enable mw-cgroup mountpoint [production]
13:58 <elukey> added 3 new MW api-appservers (mw2251-53) and 7 new appservers (mw2254-60) to codfw [production]
2017-03-09 §
16:10 <elukey> remove Piwik/bohrium health check from Varnish cache misc (https://gerrit.wikimedia.org/r/#/c/342007/) [production]
2017-03-08 §
15:19 <elukey> rebooting mw22(5[4-9]|60) as part of sanity check for T155180 [production]
15:08 <elukey> rebooting mw225[123] as part of sanity check for T155180 [production]
10:12 <elukey> reimage analytics1041 to Debian Jessie [production]
2017-03-07 §
12:53 <elukey> analytics1040 back in service - testing the new Debian configuration [production]
11:27 <elukey> end of hacking on install1002 (puppet re-enabled) [production]
09:10 <elukey> temporary live hacking analytics-flex.cfg partman config on install1002 [production]
2017-03-06 §
18:22 <elukey> analytics1040 has been silenced and it is not ready to work, need to fix its partman recipe [production]
11:05 <elukey> reimage the first Hadoop worker node (an1040) to Debian Jessie [production]
10:24 <elukey> (shamefully) replaced /etc/init.d/hadoop-hdfs-datanode script with "exit 0" to prevent the HDFS datanode daemon from starting on analytics1028 (broken disk) and leave the rest running (puppet included) - T159632 [production]
2017-03-05 §
10:19 <elukey> disabled puppet on analytics1028 to prevent puppet from starting the HDFS daemon (T159632) [production]
2017-03-03 §
13:12 <elukey> removed apache2 (rc state) and apache2-utils from analytics1027 [production]
11:11 <elukey@tin> Finished deploy [analytics/refinery@1440646]: (no justification provided) (duration: 00m 14s) [production]
11:11 <elukey@tin> Started deploy [analytics/refinery@1440646]: (no justification provided) [production]
11:09 <elukey@tin> Finished deploy [analytics/refinery@1440646]: (no justification provided) (duration: 00m 02s) [production]
11:09 <elukey@tin> Started deploy [analytics/refinery@1440646]: (no justification provided) [production]
2017-03-02 §
14:22 <elukey@tin> Finished deploy [analytics/refinery@c3dd129]: (no justification provided) (duration: 02m 18s) [production]
14:20 <elukey@tin> Started deploy [analytics/refinery@c3dd129]: (no justification provided) [production]
09:55 <elukey> increased PHP memory_limit on bohrium for Piwik (T154558) [production]
2017-03-01 §
15:26 <elukey@tin> Finished deploy [analytics/refinery@b4a8fcc]: (no justification provided) (duration: 02m 15s) [production]
15:23 <elukey@tin> Started deploy [analytics/refinery@b4a8fcc]: (no justification provided) [production]
14:31 <elukey@tin> Finished deploy [analytics/refinery@33db287]: (no justification provided) (duration: 01m 13s) [production]
14:30 <elukey@tin> Started deploy [analytics/refinery@33db287]: (no justification provided) [production]
14:27 <elukey@tin> Finished deploy [analytics/refinery@33db287]: (no justification provided) (duration: 01m 24s) [production]
14:26 <elukey@tin> Started deploy [analytics/refinery@33db287]: (no justification provided) [production]
2017-02-28 §
17:11 <elukey> Analytics Hadoop cluster upgraded to CDH 5.10 [production]
14:35 <elukey> start the Analytics Hadoop cluster upgrade (https://etherpad.wikimedia.org/p/analytics-cdh5.10) [production]
10:56 <elukey> restart zookeeper on conf1002 [production]
10:35 <elukey> restart zookeeper on conf1003 [production]
10:00 <elukey> restart zookeeper on conf1001 [production]
2017-02-27 §
13:06 <elukey> restart zookeeper on conf2003 [production]
12:39 <elukey> restart zookeeper on conf2002 [production]
12:00 <elukey> rebooting mw2092 due to puppet errors for mw-cgroup - T151427 [production]
11:19 <elukey> zookeeper status report - new changes rolled out to druid nodes and conf2001 - conf1* and conf200[23] still pending, waiting for more metrics before proceeding [production]
10:31 <elukey> limiting the Zookeeper Maximum heap size to 1G (https://gerrit.wikimedia.org/r/#/c/337797/) - setting applied gradually to Zookeeper on Druid and Conf* hosts [production]
2017-02-25 §
20:06 <elukey> depooled cp2017 (via local sudo -i depool command) since the host froze (it came back after a powercycle) [production]
19:54 <elukey> powercycled cp2017, mgmt console stuck [production]
2017-02-24 §
09:39 <elukey> stop Redis and Memcached on mc2001->mc2016 as extra precautionary step before decom - T157675 [production]
2017-02-23 §
09:39 <elukey> increase cassandra system_auth replication from 6 to 12 on AQS [production]
2017-02-22 §
10:42 <elukey> reinstall mw211[89] as MW videoscalers (trusty) and mw2243 as MW jobrunner [production]
2017-02-21 §
15:40 <elukey> restart eventlogging on kafka200[123] for openssl upgrades [production]
15:39 <elukey> restart jmxtrans on kafka[12]00[123] for T157022 [production]
15:32 <elukey> correction on my previous entry: restart eventlogging on kafka100[123] for openssl upgrades [production]
15:22 <elukey> restart eventlogging on kafka200[123] for openssl upgrades [production]
15:06 <elukey> Manually increased the maximum httpd keep-alive requests and timeout on bohrium (Piwik) - T154558 [production]