2021-03-02
08:00 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE |
[production] |
07:58 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE |
[production] |
07:58 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE |
[production] |
07:56 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE |
[production] |
07:56 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE |
[production] |
07:54 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE |
[production] |
07:28 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE |
[production] |
07:26 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE |
[production] |
07:26 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE |
[production] |
07:24 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE |
[production] |
2021-02-26
11:22 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE |
[production] |
11:19 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE |
[production] |
10:18 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE |
[production] |
10:16 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE |
[production] |
08:19 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1058.eqiad.wmnet with reason: REIMAGE |
[production] |
08:17 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1058.eqiad.wmnet with reason: REIMAGE |
[production] |
08:04 |
<elukey> |
run ipmi mc reset cold for analytics1058 - mgmt responding to pings and ipmi, but not to ssh |
[production] |
07:01 |
<elukey> |
reboot an-worker1099 to clear out kernel soft lockup errors |
[production] |
06:59 |
<elukey> |
restart datanode on an-worker1099 - soft lockup kernel errors |
[production] |
2021-02-25
15:38 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl2002.codfw.wmnet |
[production] |
15:23 |
<elukey@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl2002.codfw.wmnet |
[production] |
15:23 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl2001.codfw.wmnet |
[production] |
15:05 |
<elukey@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl2001.codfw.wmnet |
[production] |
10:59 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1117-1118].eqiad.wmnet |
[production] |
10:57 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1117-1118].eqiad.wmnet |
[production] |
10:42 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE |
[production] |
10:40 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1117.eqiad.wmnet with reason: REIMAGE |
[production] |
10:40 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE |
[production] |
10:38 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1117.eqiad.wmnet with reason: REIMAGE |
[production] |