2020-07-07
10:03 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
10:03 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
10:03 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
08:18 <elukey@cumin1001> END (ERROR) - Cookbook sre.hadoop.change-distro (exit_code=97) [production]
07:27 <elukey@cumin1001> START - Cookbook sre.hadoop.change-distro [production]
07:24 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) [production]
07:16 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster [production]
2020-07-06
13:03 <elukey> force umount/mount of /mnt/hdfs on an-airflow1001 to unblock dpkg checks (fuse misbehaving, all checks hanging) [production]
12:53 <elukey> kill hanging lsof processes on an-airflow to reduce cpu load [production]
08:09 <elukey> roll restart aqs on aqs100[4-9] to pick up new druid settings [production]
07:51 <elukey> enable binlog on matomo's database on matomo1002 [production]
06:54 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99) [production]
06:22 <elukey@cumin1001> START - Cookbook sre.hadoop.change-distro [production]
06:21 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) [production]
06:14 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster [production]
2020-07-03
15:09 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) [production]
15:02 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster [production]
14:11 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.stop-cluster (exit_code=99) [production]
13:59 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster [production]
10:15 <elukey> notebook1004 renamed to an-scheduler1001 [production]
10:09 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
10:07 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:03 <elukey@cumin1001> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
07:44 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
07:40 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [production]
07:39 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
2020-06-29
13:00 <elukey> move archiva.wikimedia.org to archiva1002 (new buster vm); create archiva-old.wikimedia.org pointing to archiva1001 [production]
06:50 <elukey> execute gnt-instance remove an-launcher1001.eqiad.wmnet on ganeti1011 - T256363 [production]
06:47 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [production]
06:46 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
06:35 <elukey> force puppet run on ores* to overcome celery OOMs on some nodes [production]
2020-06-25
14:51 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
14:48 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:55 <elukey> rename notebook1003 to an-launcher1002 - T256363 [production]
12:45 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [production]
12:44 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
12:30 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
12:27 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
12:25 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:25 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
07:36 <elukey> reboot an-launcher1001 for kernel upgrades [production]
07:18 <elukey> reboot kafkamon* vms for kernel upgrades [production]
06:40 <elukey> reboot matomo1002 for kernel upgrades [production]
06:35 <elukey> reboot archiva1002 (new vm, not yet in service) for kernel upgrades [production]
06:34 <elukey> reboot archiva for kernel upgrades [production]
06:31 <elukey> force puppet run on ores1003/1005 to restore celery (killed by the oom) [production]
06:24 <elukey> reboot an-tool* vms for kernel upgrades [production]
06:23 <elukey> reboot analytics-tool1004 for kernel upgrades (Superset host) [production]
06:22 <elukey> reboot analytics-tool1001 for kernel upgrades [production]
06:19 <elukey> execute ip addr flush ens5 on an-airflow1001 to clear RTNETLINK answers: File exists (error from ifup@ens5.service) [production]