2020-10-14 §
15:29 <elukey> drain + reboot an-worker110[1,2] to pick up GPU settings - T255138 [production]
15:28 <elukey@cumin1001> START - Cookbook sre.hadoop.reboot-workers [production]
15:26 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) [production]
14:56 <elukey> drain + reboot an-worker109[8,9] to pick up GPU settings - T255138 [production]
14:55 <elukey@cumin1001> START - Cookbook sre.hadoop.reboot-workers [production]
08:34 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
08:28 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
2020-10-09 §
09:47 <elukey> roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings [production]
2020-10-07 §
17:35 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:33 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2020-10-06 §
12:20 <elukey> update HDFS Namenode GC/Heap settings on an-master100[1,2] [production]
08:57 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
08:55 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
08:27 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:26 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2020-10-05 §
18:17 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) [production]
18:17 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
18:15 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
18:13 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
18:11 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
18:10 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
17:53 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:51 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
17:29 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:27 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
14:41 <elukey> shutdown stat1005 and stat1008 for ram expansion (1005 again) [production]
14:25 <elukey> shutdown an-master1001 for ram expansion [production]
13:54 <elukey> shutdown stat1005 for ram upgrade [production]
13:31 <elukey> shutdown an-master1002 for ram expansion (64 -> 128G) [production]
10:37 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
10:34 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
06:33 <elukey> reboot stat1005 to resolve weird GPU state (scheduled last week) [production]
2020-10-02 §
09:58 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
09:56 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
09:30 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:28 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2020-10-01 §
07:12 <elukey> restart hdfs namenodes on an-worker100[1,2] to pick up new hadoop workers settings [production]
06:42 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
06:40 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
2020-09-30 §
14:05 <elukey> create thirdparty/amd-rocm33 for stretch-wikimedia [production]
07:01 <elukey@deploy1001> Finished deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2 (duration: 00m 49s) [production]
07:00 <elukey@deploy1001> Started deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2 [production]
06:21 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
06:19 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
2020-09-29 §
14:32 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) [production]
14:30 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers [production]
2020-09-28 §
15:03 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
15:01 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
15:00 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
14:59 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]