2019-09-17 §
16:21 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) [production]
15:39 <elukey@cumin1001> START - Cookbook sre.hadoop.reboot-workers [production]
14:52 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) [production]
14:52 <elukey@cumin1001> START - Cookbook sre.hadoop.reboot-workers [production]
07:42 <elukey> reboot analytics-tool1004 (host running superset) for kernel updates [production]
2019-09-16 §
13:48 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) [production]
12:40 <elukey@cumin1001> START - Cookbook sre.hadoop.reboot-workers [production]
12:17 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) [production]
12:17 <elukey@cumin1001> START - Cookbook sre.hadoop.reboot-workers [production]
09:26 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:24 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:16 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:14 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2019-09-13 §
11:11 <elukey> reboot an-conf100* (Analytics Zookeeper nodes - not yet in production) for kernel upgrades [production]
11:10 <elukey> reboot an-tool1007 (runs turnilo) for kernel upgrades [production]
2019-09-11 §
08:13 <elukey> add thirdparty/amd-rocm271 to buster-wikimedia and update it with ROCm 2.7.1 packages [production]
08:07 <elukey> execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm27 (not used anymore) [production]
2019-09-10 §
16:35 <elukey> reboot analytics-tool1001 via ganeti gnt - not reachable via ssh [production]
13:34 <elukey> reboot stat1005 to clear inconsistent process state after tensorflow tests [production]
09:56 <elukey> restart archiva on archiva1001 - UI not working (probably due to connections to maven central being stuck) [production]
2019-09-09 §
12:29 <elukey> restart archiva again to debug download artifact issue [production]
09:02 <elukey> restart archiva on archiva1001 - stuck and not serving requests (no trace about why in the logs) [production]
2019-08-22 §
17:14 <elukey> remove analytics-tool1002 from ganeti - T231021 [production]
17:12 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [production]
17:12 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
13:47 <elukey> update puppet compiler's facts [production]
12:41 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
12:40 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
12:39 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:39 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:15 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
12:13 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2019-08-21 §
15:16 <elukey@deploy1001> Finished deploy [analytics/superset/deploy@UNKNOWN]: Rollback to 0.32 (duration: 00m 25s) [production]
15:15 <elukey@deploy1001> Started deploy [analytics/superset/deploy@UNKNOWN]: Rollback to 0.32 [production]
14:46 <elukey@deploy1001> Finished deploy [analytics/superset/deploy@868635a]: Upgrading superset to 0.34rc1 (duration: 00m 33s) [production]
14:46 <elukey@deploy1001> Started deploy [analytics/superset/deploy@868635a]: Upgrading superset to 0.34rc1 [production]
14:28 <elukey> swap turnilo backend in varnish from analytics-tool1002 to an-tool1007 [production]
11:37 <elukey> restart celery-ores-worker on ores1002 [production]
2019-08-19 §
11:02 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
10:53 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
10:53 <elukey@cumin1001> END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) [production]
10:52 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
10:32 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
10:22 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
05:29 <elukey> reboot cp2004 due to bnx2x crash (kern.log saved into my home on the host if needed) [production]
2019-08-16 §
16:12 <elukey> upload prometheus-druid-exporter 0.7-1 to stretch/buster-wikimedia [production]
15:42 <elukey> roll restart of druid broker/historicals to pick up new logging/metrics settings [production]
2019-08-12 §
08:22 <elukey> restart Analytics hadoop HDFS namenodes to pick up new heap settings [production]
2019-08-09 §
18:14 <elukey> add BGP peer for AS 38758 on cr1-eqsin [production]
17:23 <elukey> set BGP peer "BrightRidge" on cr2-eqiad [production]