2017-06-27
14:22 |
<elukey> |
stop jobcron/jobrunner on mw1300 and mw1301 and reboot the hosts for kernel updates |
[production] |
12:06 |
<elukey> |
stop jobcron/jobrunner on mw1167 and mw1299 and reboot the hosts for kernel updates |
[production] |
11:54 |
<elukey> |
stop nova-spiceproxy and neutron-metadata-agent on labtestnet2001 to keep the root partition from filling up |
[production] |
11:36 |
<elukey> |
stop jobcron/jobrunner on mw116[56] and reboot the hosts for kernel updates |
[production] |
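Several entries in this log abbreviate host lists with an informal bracket shorthand: mw116[56] appears to stand for mw1165 and mw1166, mw129[5,6,7,8] for mw1295 through mw1298, and rdb[12]003 for rdb1003 and rdb2003. The sketch below expands that shorthand, assuming a bracket holds either a comma-separated list or a run of single characters. This is an interpretation of how the log reads, not the syntax of any particular tool (in ClusterShell-style node sets, for instance, mw116[56] would mean the single host mw11656). The names expand_hosts and _choices are hypothetical.

```python
import itertools
import re
from typing import List

# One bracket group, e.g. "[56]" or "[5,6,7,8]"; the capture keeps its body.
_GROUP = re.compile(r"\[([^\]]*)\]")


def _choices(body: str) -> List[str]:
    """Alternatives inside one bracket group (assumed convention, see above)."""
    if "," in body:
        return [part.strip() for part in body.split(",")]  # "5,6,7,8"
    return list(body)  # "56" -> ["5", "6"]


def expand_hosts(pattern: str) -> List[str]:
    """Expand the log's informal host shorthand into explicit host names."""
    # With one capture group, re.split alternates literal text and group bodies.
    pieces = _GROUP.split(pattern)
    literals = pieces[0::2]
    groups = [_choices(body) for body in pieces[1::2]]
    hosts = []
    for combo in itertools.product(*groups):
        name = literals[0]
        for choice, literal in zip(combo, literals[1:]):
            name += choice + literal
        hosts.append(name)
    return hosts


print(expand_hosts("mw116[56]"))       # ['mw1165', 'mw1166']
print(expand_hosts("rdb[12]003"))      # ['rdb1003', 'rdb2003']
print(expand_hosts("mw129[5,6,7,8]"))  # ['mw1295', 'mw1296', 'mw1297', 'mw1298']
```

A pattern without brackets, such as mw1300, comes back unchanged as a one-element list, because itertools.product() with no groups yields a single empty combination.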
10:29 |
<elukey> |
stop jobcron/jobrunner on mw116[34] and reboot the hosts for kernel updates |
[production] |
10:25 |
<elukey> |
re-enabled puppet and eventlogging_sync on db1047 |
[production] |
08:59 |
<elukey> |
stop puppet and eventlogging_sync on db1047 |
[production] |
08:46 |
<elukey> |
executing ALTER TABLE statements against the log database on db1047 for https://phabricator.wikimedia.org/T167162#3340421 |
[production] |
08:18 |
<elukey> |
stop jobcron/jobrunner on mw116[12] and reboot the hosts for kernel updates |
[production] |
05:58 |
<elukey> |
restored rdb2004 as slave of rdb2003 (end of experiment) |
[production] |
2017-06-26
16:59 |
<elukey> |
EXPERIMENT - T163337 - ran SLAVEOF NO ONE on rdb2004 to remove its dependency on rdb2003 (puppet disabled on rdb2004; to roll back, just re-enable and run it) |
[production] |
16:55 |
<elukey> |
stop neutron-server on labtestnet2001 to keep the root partition from filling up |
[production] |
13:08 |
<elukey> |
truncate /var/log/upstart/neutron-server.log (root partition filled up; log spammed with 'ERROR neutron.service OperationalError: (sqlite3.OperationalError) no such table:') |
[production] |
12:55 |
<elukey> |
reboot mw129[5,6,7,8] for kernel update (mw imagescalers, two at a time) |
[production] |
10:28 |
<elukey> |
reboot mw1288 -> mw1290 for kernel updates (last batch of api-appservers) |
[production] |
10:18 |
<elukey> |
reboot mw128[4,5,6,7] for kernel updates (api-appservers) |
[production] |
09:34 |
<elukey> |
reboot mw128[0,1,2,3] for kernel updates (api-appservers) |
[production] |
09:04 |
<elukey> |
reboot mw127[6,7,8,9] for kernel updates (api-appservers) |
[production] |
08:58 |
<elukey> |
reboot mw127[3,4,5] for kernel updates (appservers) |
[production] |
08:48 |
<elukey> |
reboot mw1269 -> mw1272 for kernel updates (appservers) |
[production] |
08:28 |
<elukey> |
reboot mw1258, mw126[6,7,8] for kernel updates (appservers) |
[production] |
08:11 |
<elukey> |
reboot mw125[4,5,6,7] for kernel updates (appservers) |
[production] |
07:15 |
<elukey> |
restart pdfrender on scb1002 for the xpra issue |
[production] |
07:08 |
<elukey> |
powercycle elastic1017 (stuck in console, no ssh access) |
[production] |
06:56 |
<elukey> |
truncated neutron-server.log files in /var/log on labtestnet2001 to free some space on the root partition |
[production] |
06:50 |
<elukey> |
execute sudo -u _graphite find /var/lib/carbon/whisper/eventstreams/rdkafka -type f -mtime +15 -delete on graphite1001 to free some space for /var/lib/carbon |
[production] |
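The find invocation above removes regular files under the rdkafka whisper directory whose contents were last modified more than 15 days ago. find's -mtime counts age in whole days with any fraction discarded, so -mtime +15 only matches files at least 16 full days old. A rough Python equivalent, handy as a dry run before deleting anything, might look like the following; stale_files, delete_stale and the dry_run flag are hypothetical names, and running as the _graphite user is left outside the sketch.

```python
import os
import stat
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def stale_files(root: str, days: int, now: Optional[float] = None) -> List[str]:
    """Regular files under root that `find root -type f -mtime +days` would match."""
    now = time.time() if now is None else now
    matches = []
    # os.walk does not descend into symlinked directories by default, like find.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):  # -type f: skip symlinks, sockets, ...
                continue
            # Whole days only, fraction discarded, as find does for -mtime.
            age_days = int((now - st.st_mtime) // SECONDS_PER_DAY)
            if age_days > days:  # "+days" means strictly greater than days
                matches.append(path)
    return sorted(matches)


def delete_stale(root: str, days: int, dry_run: bool = True) -> List[str]:
    """Return the stale files; remove them only when dry_run is False."""
    victims = stale_files(root, days)
    if not dry_run:
        for path in victims:
            os.remove(path)
    return victims
```

For example, delete_stale("/var/lib/carbon/whisper/eventstreams/rdkafka", 15) would only list the candidates; passing dry_run=False performs the deletion that -delete does.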
2017-06-21
15:01 |
<elukey> |
reboot kafka200[23] for kernel updates (eventbus codfw) |
[production] |
14:03 |
<elukey> |
reboot eventlog2001 for kernel update |
[production] |
13:51 |
<elukey> |
rebooting eventlog1001 for kernel update (eventlogging host) |
[production] |
13:44 |
<elukey> |
reboot aqs100[89] for kernel updates |
[production] |
13:29 |
<elukey> |
reboot aqs1007 for kernel update |
[production] |
13:21 |
<elukey> |
reboot kafka1013 for kernel updates |
[production] |
13:05 |
<elukey> |
reboot analytics1003 (Hue, Camus, Oozie, Hive master) for kernel upgrade |
[production] |
11:14 |
<elukey> |
reboot aqs1006 for kernel update |
[production] |
10:43 |
<elukey> |
reboot analytics1001 (Hadoop master) for kernel update |
[production] |
10:17 |
<elukey> |
running a script called "check" in tmux on rdb[12]003 to periodically dump LLEN enwiki:jobqueue:enqueue:l-unclaimed; stopped the one on rdb2004 |
[production] |
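The "check" script mentioned above is not included in this log. A minimal sketch of such a periodic sampler, assuming the length query is passed in as a callable (for example the llen method of a redis-py client connected to the relevant rdb host), could look like this. The function name, interval, sample count and output format are all assumptions.

```python
import time
from typing import Callable, List, Tuple

KEY = "enwiki:jobqueue:enqueue:l-unclaimed"


def sample_llen(
    llen: Callable[[str], int],
    key: str = KEY,
    interval: float = 60.0,
    samples: int = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> List[Tuple[float, int]]:
    """Record (timestamp, list length) pairs for key, one every interval seconds.

    Hypothetical stand-in for the "check" script; sleep and clock are
    injectable so the loop can be exercised without a Redis server.
    """
    readings = []
    for i in range(samples):
        if i:
            sleep(interval)  # wait between samples, not before the first one
        length = llen(key)
        ts = clock()
        readings.append((ts, length))
        print(f"{ts:.0f} {key} {length}")
    return readings
```

With redis-py installed, passing redis.Redis(host=..., port=...).llen as llen would query a live instance; the hosts and ports of the jobqueue Redis servers are not given in this log.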
10:01 |
<elukey> |
reboot analytics1002 (Hadoop master standby) for kernel update |
[production] |
09:48 |
<elukey> |
reboot aqs1005 for kernel update |
[production] |
09:10 |
<elukey> |
reboot kafka2001 for kernel update (eventbus codfw) |
[production] |
08:34 |
<elukey> |
reboot kafka1012 for kernel upgrades |
[production] |
06:08 |
<elukey> |
reboot thorium for kernel upgrades (outage of all the analytics websites) |
[production] |
05:59 |
<elukey> |
reboot stat100[2,3,4] for kernel upgrades |
[production] |