2017-04-26 §
17:44 <elukey> unmasking and starting daemons on restbase-dev1003 [production]
16:14 <elukey> stop and mask cassandra and restbase on restbase-dev1003 for row-d maintenance [production]
14:26 <elukey> depooling aqs100[69] from AQS for network maintenance [production]
14:20 <elukey> stop zookeeper on conf1003 for row-d maintenance (Hadoop, Kafka related) [production]
13:53 <elukey> stop kafka on kafka1020 and kafka1018 for row-d extended maintenance (D2) [production]
13:22 <elukey> restart HDFS on analytics100[12] (Hadoop master nodes) to pick up recent topology changes for the cluster [production]
08:32 <elukey> Gracefully stopping hadoop daemons on Hadoop nodes affected by Row-D maintenance [production]
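A minimal sketch of the stop/mask and unmask/start sequence used around the row-D maintenance, assuming the systemd unit names cassandra and restbase on restbase-dev1003 (the real unit names may differ):
    # before the maintenance window: stop the daemons and mask them so nothing restarts them
    sudo systemctl stop cassandra restbase
    sudo systemctl mask cassandra restbase
    # after the maintenance window: unmask and start them again
    sudo systemctl unmask cassandra restbase
    sudo systemctl start cassandra restbase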
2017-04-24 §
13:47 <elukey> reimage analytics1003 to Jessie (Oozie/Hive/Camus not available during this timeframe in the Analytics Hadoop cluster) [production]
2017-04-21 §
08:20 <elukey> rolling restart of aqs (nodejs) on aqs* to pick up upgrades [production]
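A sketch of one step of the rolling restart, assuming the host-local depool/pool conftool wrappers and a systemd unit named aqs (both assumptions, not confirmed by the log):
    # on each aqs host, one at a time
    depool                       # drop the host from the load balancer pool
    sudo systemctl restart aqs   # restart the nodejs service to pick up the upgrades
    pool                         # re-pool once the service answers again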
2017-04-20 §
16:08 <elukey> uploaded piwik 2.17.1-1 to jessie-wikimedia main [production]
13:11 <elukey> upgrading Piwik to 2.17.1 (brief downtime during the announced maintenance window) [production]
12:12 <elukey> restart Yarn Resource manager on analytics1001 (hadoop master) to pick up new JVM settings [production]
10:07 <elukey> restart Yarn Resource manager on analytics1002 (hadoop master standby) to pick up new JVM settings [production]
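A minimal sketch of the ResourceManager restart, assuming the CDH service name hadoop-yarn-resourcemanager on the Hadoop masters:
    sudo systemctl restart hadoop-yarn-resourcemanager
    # confirm the new JVM settings (e.g. heap flags) are in the running process
    ps -o args= -C java | grep -i resourcemanager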
2017-04-19 §
09:11 <elukey> cleaning up ocg1003's /srv/deployment/ocg/postmortem dir (root partition filled up) [production]
2017-04-18 §
17:09 <elukey> restart nutcracker in codfw (profile::mediawiki::nutcracker) to make sure that all the daemons are running with the latest config [production]
15:20 <elukey> restored default output-buffer config for rdb2005:6479 [production]
14:22 <elukey> executed CONFIG SET client-output-buffer-limit "normal 0 0 0 slave 2147483648 2147483648 300 pubsub 33554432 8388608 60" on rdb2005:6479 as an attempt to solve slave lagging - T159850 [production]
14:11 <elukey> executed CONFIG SET appendfsync everysec (default) to restore defaults on rdb2005:6479 - T159850 [production]
14:04 <elukey> executed CONFIG SET appendfsync no on rdb2005:6479 to test if fsync stalls affect replication - T159850 [production]
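The Redis tuning above in redis-cli form (values copied from the entries; authentication omitted, port as logged):
    # raise the slave output buffer limits so the lagging replica is not disconnected (T159850)
    redis-cli -h rdb2005 -p 6479 CONFIG SET client-output-buffer-limit "normal 0 0 0 slave 2147483648 2147483648 300 pubsub 33554432 8388608 60"
    # temporarily disable AOF fsync to test whether fsync stalls affect replication
    redis-cli -h rdb2005 -p 6479 CONFIG SET appendfsync no
    # restore the default afterwards
    redis-cli -h rdb2005 -p 6479 CONFIG SET appendfsync everysec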
2017-04-16 §
15:44 <elukey> restart ocg on ocg1003 to release deleted files still held open (visible via lsof) [production]
15:35 <elukey> executing sudo find -name *.pdf -mtime +3 -exec rm {} \; on ocg1003's /srv/deployment/ocg/output to clean up some disk space - T162780 [production]
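The cleanup above in shell form, with the glob quoted so find expands it rather than the shell; the dry-run listing is an added precaution, not part of the logged command:
    cd /srv/deployment/ocg/output
    # list PDFs older than three days that would be removed (T162780)
    sudo find . -name '*.pdf' -mtime +3 -print
    # then delete them
    sudo find . -name '*.pdf' -mtime +3 -exec rm {} \;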
2017-04-14 §
10:29 <elukey> rollback sysctl settings on mw1306 after experiment (stop jobchron/runner, stop hhvm, restore sysctl settings, restart hhvm and job* daemons) [production]
09:50 <elukey> temporarily set sysctl -w net.netfilter.nf_conntrack_max=524288 on mw1306 (jobrunner) as test - (rollback: sysctl -w net.netfilter.nf_conntrack_max=262144) [production]
09:43 <elukey> temporarily set sysctl -w net.ipv4.ip_local_port_range="15000 64000" on mw1306 (jobrunner) as test - (rollback: sysctl -w net.ipv4.ip_local_port_range="32768 60999") - T157968 [production]
08:32 <elukey> restored appendfsync to 'everysec' on Redis rdb2005:6380 (end of performance experiment) [production]
07:23 <elukey> executed CONFIG SET appendfsync no on rdb2005:6380 as performance test [production]
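A sketch of the sysctl experiment and its rollback on the jobrunner, with the values taken from the entries above:
    # widen the ephemeral port range and raise the conntrack table size for the test
    sudo sysctl -w net.ipv4.ip_local_port_range="15000 64000"
    sudo sysctl -w net.netfilter.nf_conntrack_max=524288
    # rollback to the previous values once the experiment is over
    sudo sysctl -w net.ipv4.ip_local_port_range="32768 60999"
    sudo sysctl -w net.netfilter.nf_conntrack_max=262144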
2017-04-13 §
16:55 <elukey> restored default value of client-output-buffer-limit on rdb1007:6379 - T159850 [production]
12:52 <elukey> temporarily executed CONFIG SET client-output-buffer-limit "slave 5368709120 5368709120 180" on rdb1007:6379 [production]
12:34 <elukey> temporarily executed CONFIG SET client-output-buffer-limit "slave 3221225472 3221225472 180" on rdb1007:6379 [production]
11:59 <elukey> temporarily executed CONFIG SET client-output-buffer-limit "slave 2536870912 2536870912 60" on rdb1007:6379 [production]
11:37 <elukey> temporarily executed CONFIG SET client-output-buffer-limit "slave 2147483648 2147483648 60" on rdb1007:6379 to give time to rdb2005's replication to catch up - T159850 [production]
10:47 <elukey> reverted previous config for Redis rdb2005 [production]
10:22 <elukey> executed CONFIG SET appendfsync no (prev value: "everysec") to Redis instance 6380 on rdb2005 - T125735 [production]
06:29 <elukey> re-arm keyholder on mira after reboot [production]
06:14 <elukey> powercycle mira - eth0 errors in the dmesg, CPU system utilization skyrocketed [production]
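One way to check whether rdb2005's replication caught up after the buffer-limit bumps; a generic redis-cli check, not taken from the log, with authentication omitted:
    # on the master: the slave0 line shows the replica's offset and lag
    redis-cli -h rdb1007 -p 6379 INFO replication
    # on the replica: master_link_status should be "up" and master_last_io_seconds_ago low
    redis-cli -h rdb2005 -p 6379 INFO replication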
2017-04-12 §
13:41 <elukey> apply SLOWLOG RESET and CONFIG SET slowlog-max-len 100000 (prev value 10000, 10ms) to rdb1005:6380 to track down slow reqs - T125735 [production]
13:33 <elukey> restored slowlog-log-slower-than 10000 on rdb2005 [production]
13:25 <elukey> applied CONFIG SET slowlog-log-slower-than 300000 to Redis 6379 on rdb2005 and reset slowlog history to play with the stats [production]
12:23 <elukey> restart HDFS datanode daemons on all the Hadoop worker nodes to pick up the new JVM settings [production]
11:57 <elukey> restart Yarn nodemanager daemons on all the Hadoop worker nodes to pick up the new JVM settings [production]
06:37 <elukey> reimage mw2246.codfw.wmnet mw2152.codfw.wmnet to remove the /tmp partition (codfw videoscalers, switchover prep) [production]
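A sketch of the slowlog sampling referenced above (instance address and threshold values from the entries; authentication omitted):
    # keep a longer slowlog history and clear it before sampling
    redis-cli -h rdb1005 -p 6380 CONFIG SET slowlog-max-len 100000
    redis-cli -h rdb1005 -p 6380 SLOWLOG RESET
    # later, inspect the slowest commands collected since the reset
    redis-cli -h rdb1005 -p 6380 SLOWLOG GET 25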
2017-04-11 §
18:34 <elukey> restart hhvm on mw1165 (debug dump in /tmp/hhvm.5384.bt) [production]
12:47 <elukey> reimage mw2246 (Debian codfw videoscaler) to Trusty [production]
11:33 <elukey> resume reboot of analytics1040->1050 for kernel upgrades [production]
06:30 <elukey> restart hhvm on mw1299 - dump debug in /tmp/hhvm.84379.bt [production]
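The production hosts likely used a dedicated helper to produce the /tmp/hhvm.<pid>.bt dumps; as a generic sketch only, a gdb batch run can capture the same kind of backtrace before the restart:
    pid=$(pgrep -o hhvm)
    # dump all thread backtraces of the running HHVM process, then restart it
    sudo gdb -p "$pid" -batch -ex 'thread apply all bt' > /tmp/hhvm.$pid.bt
    sudo systemctl restart hhvm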
2017-04-10 §
17:51 <elukey> restore Hadoop masters to analytics1001 [production]
14:05 <elukey> reimage analytics1001 to Debian Jessie [production]
13:19 <elukey> reboot analytics1040->1050 to pick up the new kernel [production]
08:39 <elukey> manual failover of Hadoop master daemons from analytics1001 to analytics1002 (T160333) [production]
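A sketch of the manual HDFS master failover (T160333), assuming the NameNode HA service IDs are named after the two hosts; the real IDs in the cluster configuration may differ:
    # check which NameNode is currently active
    sudo -u hdfs hdfs haadmin -getServiceState analytics1001-eqiad-wmnet
    sudo -u hdfs hdfs haadmin -getServiceState analytics1002-eqiad-wmnet
    # move the active role to analytics1002 before reimaging analytics1001
    sudo -u hdfs hdfs haadmin -failover analytics1001-eqiad-wmnet analytics1002-eqiad-wmnet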
2017-04-07 §
14:13 <elukey> restart hadoop-hdfs-namenode on analytics1002 (Hadoop Master standby) to pick up new JVM settings [production]
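A minimal sketch of restarting the standby NameNode safely; the service ID and the CDH unit name hadoop-hdfs-namenode are assumptions:
    # confirm the node is the standby before touching it
    sudo -u hdfs hdfs haadmin -getServiceState analytics1002-eqiad-wmnet
    # restart the NameNode so it picks up the new JVM settings
    sudo systemctl restart hadoop-hdfs-namenode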