2020-07-06
13:03 |
<elukey> |
force umount/mount of /mnt/hdfs on an-airflow1001 to unblock dpkg checks (fuse misbehaving, all checks hanging) |
[production] |
12:53 |
<elukey> |
kill hanging lsof processes on an-airflow to reduce cpu load |
[production] |
08:09 |
<elukey> |
roll restart aqs on aqs100[4-9] to pick up new druid settings |
[production] |
07:51 |
<elukey> |
enable binlog on matomo's database on matomo1002 |
[production] |
06:54 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99) |
[production] |
06:22 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.change-distro |
[production] |
06:21 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) |
[production] |
06:14 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.stop-cluster |
[production] |
2020-07-03
15:09 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) |
[production] |
15:02 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.stop-cluster |
[production] |
14:11 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hadoop.stop-cluster (exit_code=99) |
[production] |
13:59 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.stop-cluster |
[production] |
10:15 |
<elukey> |
notebook1004 renamed to an-scheduler1001 |
[production] |
10:09 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
10:07 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
08:03 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) |
[production] |
07:44 |
<elukey@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
07:40 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
07:39 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
2020-06-25
14:51 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:48 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:55 |
<elukey> |
rename notebook1003 to an-launcher1002 - T256363 |
[production] |
12:45 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
12:44 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
12:30 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
12:27 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
12:25 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:25 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
07:36 |
<elukey> |
reboot an-launcher1001 for kernel upgrades |
[production] |
07:18 |
<elukey> |
reboot kafkamon* vms for kernel upgrades |
[production] |
06:40 |
<elukey> |
reboot matomo1002 for kernel upgrades |
[production] |
06:35 |
<elukey> |
reboot archiva1002 (new vm, not yet in service) for kernel upgrades |
[production] |
06:34 |
<elukey> |
reboot archiva for kernel upgrades |
[production] |
06:31 |
<elukey> |
force puppet run on ores1003/1005 to restore celery (killed by the oom) |
[production] |
06:24 |
<elukey> |
reboot an-tool* vms for kernel upgrades |
[production] |
06:23 |
<elukey> |
reboot analytics-tool1004 for kernel upgrades (Superset host) |
[production] |
06:22 |
<elukey> |
reboot analytics-tool1001 for kernel upgrades |
[production] |
06:19 |
<elukey> |
execute ip addr flush ens5 on an-airflow1001 to clear RTNETLINK answers: File exists (error from ifup@ens5.service) |
[production] |