| 2020-11-20 |
    
  | 23:38 | <mutante> | synced puppet-compiler facts - new hosts should be usable in compiler | [production] | 
            
  | 22:30 | <mutante> | cumin1001 - sudo systemctl start cumin-check-aliases -> <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK - T268369 | [production] | 
            
  | 21:30 | <razzi@cumin1001> | END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) | [production] | 
            
  | 20:26 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 20:09 | <razzi@cumin1001> | END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) | [production] | 
            
  | 19:52 | <mutante> | releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts | [production] | 
            
  | 19:45 | <mutante> | releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed) | [production] | 
            
  | 19:39 | <mutante> | Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications, to remove monitoring noise. Active alerts went from 72 to 25 | [production] | 
            
  | 19:14 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 18:47 | <elukey@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 18:42 | <elukey@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 18:37 | <elukey@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 18:36 | <razzi@cumin1001> | END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) | [production] | 
            
  | 18:31 | <elukey@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 18:31 | <elukey@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 18:18 | <elukey@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 18:14 | <dwisehaupt> | shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - T267259 | [production] | 
            
  | 17:37 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] | 
            
  | 17:35 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 17:32 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 17:24 | <razzi@cumin1001> | END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) | [production] | 
            
  | 16:48 | <volans@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 16:40 | <volans@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 16:29 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 16:29 | <razzi@cumin1001> | END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) | [production] | 
            
  | 16:28 | <razzi> | removed canceled ip address records for kafka-test1002 from netbox | [production] | 
            
  | 16:11 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] | 
            
  | 16:09 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 16:01 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 16:01 | <razzi@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production] | 
            
  | 15:42 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 15:09 | <andrew@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 15:01 | <andrew@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 14:59 | <andrew@cumin1001> | END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) | [production] | 
            
  | 14:58 | <andrew@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 14:30 | <elukey> | force umount/mount for /mnt/hdfs on all stat1* nodes to pick up new openjdk settings | [production] | 
            
  | 14:28 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) | [production] | 
            
  | 14:00 | <elukey> | restart hadoop daemons on an-master[1001-1002] (Hadoop masters) to pick up new rack settings and openjdk upgrades | [production] | 
            
  | 13:59 | <elukey@cumin1001> | START - Cookbook sre.hadoop.roll-restart-masters | [production] | 
            
  | 13:34 | <liw> | finished trying to test scap on beta cluster | [production] | 
            
  | 13:24 | <bblack> | cp*: remove remnants of expiring globalsign-2019 unified cert, including ocsp config+outputs | [production] | 
            
  | 13:12 | <liw> | testing upcoming Scap release on beta | [production] |