2020-09-15 §
09:01 <elukey@cumin1001> START - Cookbook sre.zookeeper.roll-restart-zookeeper [production]
08:59 <elukey@cumin1001> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [production]
08:53 <elukey> roll restart druid zookeeper clusters for openjdk upgrades [production]
08:53 <elukey@cumin1001> START - Cookbook sre.zookeeper.roll-restart-zookeeper [production]
08:52 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) [production]
08:04 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers [production]
08:02 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) [production]
07:19 <elukey> roll restart druid cluster to pick up openjdk updates [production]
07:19 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers [production]
2020-09-14 §
16:04 <elukey> completed the rollout of restrictive kafka ferm rules on the Kafka jumbo cluster [production]
15:23 <elukey> enable stricter ferm rules on kafka-jumbo1007 and kafka-jumbo1005 [production]
14:55 <elukey> ferm rules added to kafka-jumbo1009, 1006 and 1008 up to now [production]
06:56 <elukey> slowly roll out ferm rules on Kafka-Jumbo hosts (see https://gerrit.wikimedia.org/r/611168) [production]
05:54 <elukey> execute "gnt-instance modify -B vcpus=4 an-tool1009.eqiad.wmnet" on ganeti1011 - T258768 [production]
2020-09-10 §
07:03 <elukey> resize search-loader vms (+4 vcores +4GB of ram) on Ganeti - T262385 [production]
2020-09-09 §
07:25 <elukey> restart varnishkafka-webrequest on cp5010 and cp5012, delivery reports errors happening since yesterday's network outage [production]
2020-09-08 §
18:22 <elukey> rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - T260305 [production]
15:30 <elukey> roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed [production]
15:26 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) [production]
15:18 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-masters [production]
13:34 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) [production]
13:20 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-masters [production]
13:14 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) [production]
13:14 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-masters [production]
07:44 <elukey> roll restart kafka daemons on kafka-jumbo100[7-9] to pick up openjdk upgrades [production]
06:23 <elukey> roll restart of Hadoop master daemons on an-master100[1,2] to pick up new openjdk settings [production]
2020-09-07 §
16:12 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
16:10 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
14:27 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
14:25 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
14:23 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
14:23 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
13:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
13:26 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
13:25 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
13:23 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
10:37 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
10:35 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
10:02 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
10:00 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
09:09 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:06 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2020-09-06 §
08:20 <elukey> powercycle mw1360 (mgmt console available, network errors while running anything) [production]
08:04 <elukey@puppetmaster1001> conftool action : set/pooled=inactive; selector: name=mw1360.eqiad.wmnet [production]
08:01 <elukey> executed "sudo ipmitool -I lanplus -H mw1360.mgmt.eqiad.wmnet -U root mc reset cold" from cumin (mgmt not available for mw1360) [production]
2020-09-04 §
10:31 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [production]
09:11 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers [production]
08:58 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) [production]
08:31 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers [production]
08:29 <elukey> roll restart of the hadoop workers (test and analytics cluster) for openjdk upgrades [production]