2016-10-20
§
|
13:10 |
<elukey> |
force failover from temporary Hadoop Master node (an1002) to its standby (an1001) to restore the standard configuration |
[production] |
13:05 |
<elukey> |
correction: force failover for Hadoop Master node (an1001) to its standby (an1002) and rebooting an1001 for kernel upgrades |
[production] |
12:59 |
<elukey> |
force failover for Hadoop Master node (an1002) to its standby (an1002) and rebooting an1001 for kernel upgrades |
[production] |
12:59 |
<moritzm> |
ferm on baham (failed to start due to failing DNS resolution in early boot) |
[production] |
12:52 |
<moritzm> |
restarting mx2001 for kernel update |
[production] |
12:48 |
<moritzm> |
bounced ntp on mw2116 (XFAC state) |
[production] |
12:39 |
<elukey> |
restarting an1003 for kernel upgrades (oozie/hive master) |
[production] |
12:35 |
<moritzm> |
bounced ntp on baham (was stuck in INIT phase) |
[production] |
12:31 |
<apergos> |
more app server rolling restarts for codfw: mw2163-2199 |
[production] |
12:29 |
<apergos> |
more API server rolling restarts for eqiad: mw1221-1235, 1276-1290 |
[production] |
12:27 |
<apergos> |
more APP server rolling restarts for eqiad: mw1209-1216, 128-1220, 1236-38, 1240-1258 |
[production] |
12:12 |
<moritzm> |
restarting bast2001 for kernel update |
[production] |
12:11 |
<apergos> |
retraction: those are app servers, not starting them yet |
[production] |
12:10 |
<apergos> |
more api server rolling restarts for eqiad: mw1209-1216, 128-1220, 1236-38, 1240-1258 |
[production] |
12:08 |
<moritzm> |
bounced ntp on mw2206 (XFAC state) |
[production] |
12:05 |
<bblack> |
correction: rebooting baham / ns1.wikimedia.org for kernel |
[production] |
12:04 |
<bblack> |
rebooting baham / ns2.wikimedia.org for kernel |
[production] |
11:53 |
<elukey> |
rebooting an1027 (camus job launcher) for kernel upgrades |
[production] |
11:48 |
<moritzm> |
bounced ntp on mw2101 and mw2147 (XFAC state) |
[production] |
11:48 |
<bblack> |
depool cp1047 (cache_maps eqiad) |
[production] |
11:23 |
<apergos> |
rolling restarts of more api servers in codfw: mw2200 - 2220 |
[production] |
11:17 |
<elukey> |
rebooting all the Analytics Hadoop nodes for kernel upgrades |
[production] |
11:07 |
<mobrovac> |
change-prop restarting in codfw after kafka kernel upgrade |
[production] |
10:58 |
<apergos> |
rolling reboots for first batch of app servers in eqiad: mw1170-1188 |
[production] |
10:50 |
<elukey> |
rebooting kafka200[12] for kernel upgrades (Kafka main-codfw non live cluster) |
[production] |
10:38 |
<apergos> |
rolling restarts on first batch of api servers in eqiad: mw1189-1208 |
[production] |
10:21 |
<apergos> |
while the first batch of codfw api servers trundle along, starting rolling reboots for appservers in codfw starting with mw2090-2098, 2100-2119 |
[production] |
10:20 |
<moritzm> |
removing a few older kernels on analytics1036, was short of disk space in /boot partition |
[production] |
10:05 |
<elukey> |
rebooting the Analytics Hadoop cluster for kernel upgrades |
[production] |
09:50 |
<jynus> |
stop sql thread replication for db1053 and applying partitioning as a "special slave" |
[production] |
09:32 |
<godog> |
rolling restart of graphite machines for kernel upgrade |
[production] |
09:16 |
<apergos> |
restarts of mw2075,6,7 done, starting rolling restarts shortly of 8,9, 2120-2147 |
[production] |
08:57 |
<akosiaris> |
rebooting wtp10{02,06,12,13,17,22} for kernel upgrade |
[production] |
08:57 |
<elukey> |
rebooting eventlog2001 for kernel upgrades (EL spare host) |
[production] |
08:54 |
<elukey> |
rebooting eventlog1001 for kernel upgrades (Eventlogging host) |
[production] |
08:53 |
<moritzm> |
rebooting bast4001 for kernel update |
[production] |
08:49 |
<moritzm> |
rebooting restbase-test* for kernel upgrade |
[production] |
08:43 |
<akosiaris> |
rebooting wtp10{01,03,04,05,18,23} for kernel upgrade |
[production] |
08:34 |
<akosiaris> |
rebooting wtp10{07,08,09,10,19,24} for kernel upgrade |
[production] |
08:32 |
<elukey> |
rebooting aqs100[456] for kernel upgrades (one at a time, de-pool/reboot/pool) |
[production] |
08:31 |
<elukey> |
rebooting aqs100[123] for kernel upgrades (one at a time, de-pool/reboot/pool) |
[production] |
08:25 |
<akosiaris> |
rebooting wtp10{10,14,15,16,20,21} for kernel upgrade |
[production] |
08:19 |
<akosiaris> |
reboot the rest of the wtp20XX hosts for kernel upgrade |
[production] |
08:15 |
<akosiaris@puppetmaster1001> |
conftool action : set/pooled=no; selector: wtp2019.codfw.wmnet (tags: ['dc=codfw', 'cluster=parsoid', 'service=parsoid']) |
[production] |
08:10 |
<akosiaris> |
reboot wtp20{03,05,08,09,12,15,17,18,20} for kernel upgrade |
[production] |
08:09 |
<mobrovac> |
change-prop deploying 3a11886 |
[production] |
07:52 |
<moritzm> |
rebooting bast3001 for kernel update |
[production] |
07:51 |
<gehel> |
start of elasticsearch codfw rolling restart |
[production] |
07:32 |
<moritzm> |
rebooting snapshot1001 for kernel update |
[production] |
07:27 |
<moritzm> |
rebooting snapshot1005-1007 for kernel update |
[production] |