production SAL

2017-03-14 §
08:38	<elukey>	moved some log files from /var/log/upstart/$logname.log.1 to /var/log/upstart/$logname.log.1.bis on labvirt1014, labtestvirt2001, labtestnet2001, labnet1001 to reduce cronspam	[production]
2017-03-13 §
11:56	<elukey>	reimage analytics1042 (Hadoop worker node) to Debian Jessie	[production]
06:52	<elukey>	powercycle mw2256, stuck in boot (looked in the console)	[production]
2017-03-10 §
16:25	<elukey>	reboot mw22(5[1-9]\|60) to enable mw-cgroup mountpoint	[production]
13:58	<elukey>	added 3 new MW api-appservers (mw2251-53) and 7 new appservers (mw2254-60) to codfw	[production]
2017-03-09 §
16:10	<elukey>	remove Piwik/bohrium health check from Varnish cache misc (https://gerrit.wikimedia.org/r/#/c/342007/)	[production]
2017-03-08 §
15:19	<elukey>	rebooting mw22(5[4-9]\|60) as part of sanity check for T155180	[production]
15:08	<elukey>	rebooting mw225[123] as part of sanity check for T155180	[production]
10:12	<elukey>	reimage analytics1041 to Debian Jessie	[production]
2017-03-07 §
12:53	<elukey>	analytics1040 back in service - testing the new Debian configuration	[production]
11:27	<elukey>	end of hacking on install1002 (puppet re-enabled)	[production]
09:10	<elukey>	temporary live hacking analytics-flex.cfg partman config on install1002	[production]
2017-03-06 §
18:22	<elukey>	analytics1040 has been silenced and it is not ready to work, need to fix its partman recipe	[production]
11:05	<elukey>	reimage the first Hadoop worker node (an1040) to Debian Jessie	[production]
10:24	<elukey>	(shamefully) replaced the /etc/init.d/hadoop-hdfs-datanode script with "exit 0" to prevent the HDFS datanode daemon from starting on analytics1028 (broken disk) while leaving the rest running (puppet included) - T159632	[production]
2017-03-05 §
10:19	<elukey>	disabled puppet on analytics1028 to prevent puppet from starting the HDFS daemon (T159632)	[production]
2017-03-03 §
13:12	<elukey>	removed apache2 (rc state) and apache2-utils from analytics1027	[production]
11:11	<elukey@tin>	Finished deploy [analytics/refinery@1440646]: (no justification provided) (duration: 00m 14s)	[production]
11:11	<elukey@tin>	Started deploy [analytics/refinery@1440646]: (no justification provided)	[production]
11:09	<elukey@tin>	Finished deploy [analytics/refinery@1440646]: (no justification provided) (duration: 00m 02s)	[production]
11:09	<elukey@tin>	Started deploy [analytics/refinery@1440646]: (no justification provided)	[production]
2017-03-02 §
14:22	<elukey@tin>	Finished deploy [analytics/refinery@c3dd129]: (no justification provided) (duration: 02m 18s)	[production]
14:20	<elukey@tin>	Started deploy [analytics/refinery@c3dd129]: (no justification provided)	[production]
09:55	<elukey>	increased PHP memory_limit on bohrium for Piwik (T154558)	[production]
2017-03-01 §
15:26	<elukey@tin>	Finished deploy [analytics/refinery@b4a8fcc]: (no justification provided) (duration: 02m 15s)	[production]
15:23	<elukey@tin>	Started deploy [analytics/refinery@b4a8fcc]: (no justification provided)	[production]
14:31	<elukey@tin>	Finished deploy [analytics/refinery@33db287]: (no justification provided) (duration: 01m 13s)	[production]
14:30	<elukey@tin>	Started deploy [analytics/refinery@33db287]: (no justification provided)	[production]
14:27	<elukey@tin>	Finished deploy [analytics/refinery@33db287]: (no justification provided) (duration: 01m 24s)	[production]
14:26	<elukey@tin>	Started deploy [analytics/refinery@33db287]: (no justification provided)	[production]
2017-02-28 §
17:11	<elukey>	Analytics Hadoop cluster upgraded to CDH 5.10	[production]
14:35	<elukey>	start the Analytics Hadoop cluster upgrade (https://etherpad.wikimedia.org/p/analytics-cdh5.10)	[production]
10:56	<elukey>	restart zookeeper on conf1002	[production]
10:35	<elukey>	restart zookeeper on conf1003	[production]
10:00	<elukey>	restart zookeeper on conf1001	[production]
2017-02-27 §
13:06	<elukey>	restart zookeeper on conf2003	[production]
12:39	<elukey>	restart zookeeper on conf2002	[production]
12:00	<elukey>	rebooting mw2092 due to puppet errors for mw-cgroup - T151427	[production]
11:19	<elukey>	zookeeper status report - new changes rolled out to druid nodes and conf2001 - conf1* and conf200[23] still pending, waiting for more metrics before proceeding	[production]
10:31	<elukey>	limiting the Zookeeper Maximum heap size to 1G (https://gerrit.wikimedia.org/r/#/c/337797/) - setting applied gradually to Zookeeper on Druid and Conf* hosts	[production]
2017-02-25 §
20:06	<elukey>	depooled cp2017 (via local sudo -i depool command) since the host froze (it got back after a powercycle)	[production]
19:54	<elukey>	powercycled cp2017, mgmt console stuck	[production]
2017-02-24 §
09:39	<elukey>	stop Redis and Memcached on mc2001->mc2016 as extra precautionary step before decom - T157675	[production]
2017-02-23 §
09:39	<elukey>	increase cassandra system_auth replication from 6 to 12 on AQS	[production]
2017-02-22 §
10:42	<elukey>	reinstall mw211[89] as MW videoscalers (trusty) and mw2243 as MW jobrunner	[production]
2017-02-21 §
15:40	<elukey>	restart eventlogging on kafka200[123] for openssl upgrades	[production]
15:39	<elukey>	restart jmxtrans on kafka[12]00[123] for T157022	[production]
15:32	<elukey>	correction on my previous entry: restart eventlogging on kafka100[123] for openssl upgrades	[production]
15:22	<elukey>	restart eventlogging on kafka200[123] for openssl upgrades	[production]
15:06	<elukey>	Manually increased the maximum httpd keep-alive requests and timeout on bohrium (piwik) - T154558	[production]