2017-06-28 §
06:37 <elukey> restart pdfrender.service on scb1003 - xpra race condition [production]
06:35 <elukey> executed sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +10 -delete on graphite1001 to free space [production]
2017-06-27 §
14:22 <elukey> stop jobcron/jobrunner on mw1300 and mw1301 and reboot the hosts for kernel updates [production]
12:06 <elukey> stop jobcron/jobrunner on mw1167 and mw1299 and reboot the hosts for kernel updates [production]
11:54 <elukey> stop nova-spiceproxy and neutron-metadata-agent on labtestnet2001 to keep the root partition from filling up [production]
11:36 <elukey> stop jobcron/jobrunner on mw116[56] and reboot the hosts for kernel updates [production]
10:29 <elukey> stop jobcron/jobrunner on mw116[34] and reboot the hosts for kernel updates [production]
10:25 <elukey> re-enabled puppet and eventlogging_sync on db1047 [production]
08:59 <elukey> stop puppet and eventlogging_sync on db1047 [production]
08:46 <elukey> executing alter tables to the log database on db1047 for https://phabricator.wikimedia.org/T167162#3340421 [production]
08:18 <elukey> stop jobcron/jobrunner on mw116[12] and reboot the hosts for kernel updates [production]
05:58 <elukey> restored rdb2004 as slave of rdb2003 (end of experiment) [production]
2017-06-26 §
16:59 <elukey> EXPERIMENT - T163337 - set slaveof no one on rdb2004 to remove its dependency on rdb2003 (puppet disabled on rdb2004; to roll back, just enable/run it) [production]
16:55 <elukey> stop neutron-server on labtestnet2001 to keep the root partition from filling up [production]
13:08 <elukey> truncate /var/log/upstart/neutron-server.log (root filled up, spam in logs for 'ERROR neutron.service OperationalError: (sqlite3.OperationalError) no such table:') [production]
12:55 <elukey> reboot mw129[5,6,7,8] for kernel update (mw imagescalers, two at a time) [production]
10:28 <elukey> reboot mw1288->90 for kernel updates (last batch of api-appservers) [production]
10:18 <elukey> reboot mw128[4,5,6,7] for kernel updates (api-appservers) [production]
09:34 <elukey> reboot mw128[0,1,2,3] for kernel updates (api-appservers) [production]
09:04 <elukey> reboot mw127[6,7,8,9] for kernel updates (api-appservers) [production]
08:58 <elukey> reboot mw127[3,4,5] for kernel updates (appservers) [production]
08:48 <elukey> reboot mw1269 -> mw1272 for kernel updates (appservers) [production]
08:28 <elukey> reboot mw1258, 126[6,7,8] for kernel updates (appservers) [production]
08:11 <elukey> reboot mw125[4,5,6,7] for kernel updates (appservers) [production]
07:15 <elukey> restart pdfrender on scb1002 for the xpra issue [production]
07:08 <elukey> powercycle elastic1017 (stuck in console, no ssh access) [production]
06:56 <elukey> truncated neutron-server.log files in /var/log on labtestnet2001 to free some space in root [production]
06:50 <elukey> execute sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +15 -delete on graphite1001 to free some space for /var/lib/carbon [production]
2017-06-25 §
09:00 <elukey> Executing 'sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +15 -delete' on graphite1001 to free some space (/var/lib/carbon filling up) - T1075 [production]
2017-06-23 §
09:55 <elukey> reboot mw1250-53 for kernel updates [production]
2017-06-22 §
09:06 <elukey> rebooting kafka100[23] for kernel updates (eventbus eqiad) [production]
07:24 <elukey> reboot kafka1001 for kernel updates (eventbus eqiad) [production]
2017-06-21 §
15:01 <elukey> reboot kafka200[23] for kernel updates (eventbus codfw) [production]
14:03 <elukey> reboot eventlog2001 for kernel update [production]
13:51 <elukey> rebooting eventlog1001 for kernel update (eventlogging host) [production]
13:44 <elukey> reboot aqs100[89] for kernel updates [production]
13:29 <elukey> reboot aqs1007 for kernel update [production]
13:21 <elukey> reboot kafka1013 for kernel updates [production]
13:05 <elukey> reboot analytics1003 (Hue, Camus, Oozie, Hive master) for kernel upgrade [production]
11:14 <elukey> reboot aqs1006 for kernel update [production]
10:43 <elukey> reboot analytics1001 (Hadoop master) for kernel update [production]
10:17 <elukey> running a script in tmux on rdb[12]003 called "check" to periodically dump LLEN enwiki:jobqueue:enqueue:l-unclaimed and stopped the one on rdb2004 [production]
10:01 <elukey> reboot analytics1002 (Hadoop master standby) for kernel update [production]
09:48 <elukey> reboot aqs1005 for kernel update [production]
09:10 <elukey> reboot kafka2001 for kernel update (eventbus codfw) [production]
08:34 <elukey> reboot kafka1012 for kernel upgrades [production]
06:08 <elukey> reboot thorium for kernel upgrades (outage to all the analytics websites) [production]
05:59 <elukey> reboot stat100[2,3,4] for kernel upgrades [production]
2017-06-20 §
17:29 <elukey> running a script in tmux on rdb200[34] called "check" to periodically dump LLEN enwiki:jobqueue:enqueue:l-unclaimed [production]
17:21 <elukey> restart redis-instance-tcp_6380.service on rdb2003 to force sync with its master [production]