| 2020-11-20 |
  | 19:52 | <mutante> | releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts | [production] | 
            
  | 19:45 | <mutante> | releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed) | [production] | 
            
  | 19:39 | <mutante> | Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications to remove monitoring noise. Down from 72 to 25 active alerts | [production] | 
            
  | 19:14 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 18:47 | <elukey@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 18:42 | <elukey@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 18:37 | <elukey@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 18:36 | <razzi@cumin1001> | END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) | [production] | 
            
  | 18:31 | <elukey@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 18:31 | <elukey@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 18:18 | <elukey@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 18:14 | <dwisehaupt> | shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - T267259 | [production] | 
            
  | 17:37 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] | 
            
  | 17:35 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 17:32 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 17:24 | <razzi@cumin1001> | END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) | [production] | 
            
  | 16:48 | <volans@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 16:40 | <volans@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 16:29 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 16:29 | <razzi@cumin1001> | END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) | [production] | 
            
  | 16:28 | <razzi> | removed canceled ip address records for kafka-test1002 from netbox | [production] | 
            
  | 16:11 | <pt1979@cumin2001> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) | [production] | 
            
  | 16:09 | <pt1979@cumin2001> | START - Cookbook sre.hosts.downtime | [production] | 
            
  | 16:01 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 16:01 | <razzi@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production] | 
            
  | 15:42 | <razzi@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 15:09 | <andrew@cumin1001> | END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) | [production] | 
            
  | 15:01 | <andrew@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 14:59 | <andrew@cumin1001> | END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) | [production] | 
            
  | 14:58 | <andrew@cumin1001> | START - Cookbook sre.hosts.decommission | [production] | 
            
  | 14:30 | <elukey> | force umount/mount for /mnt/hdfs on all stat1* nodes to pick up new openjdk settings | [production] | 
            
  | 14:28 | <elukey@cumin1001> | END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) | [production] | 
            
  | 14:00 | <elukey> | restart hadoop daemons on an-master[1001-1002] (Hadoop masters) to pick up new rack settings and openjdk upgrades | [production] | 
            
  | 13:59 | <elukey@cumin1001> | START - Cookbook sre.hadoop.roll-restart-masters | [production] | 
            
  | 13:34 | <liw> | finished trying to test scap on beta cluster | [production] | 
            
  | 13:24 | <bblack> | cp*: remove remnants of expiring globalsign-2019 unified cert, including ocsp config+outputs | [production] | 
            
  | 13:12 | <liw> | testing upcoming Scap release on beta | [production] | 
            
  | 13:00 | <bblack> | dns*: upgrade remainder of fleet to gdnsd 3.4.1 | [production] | 
            
  | 12:54 | <elukey@cumin1001> | END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) | [production] | 
            
  | 12:29 | <moritzm> | uploaded wmf-sre-laptop 0.3 to buster-wikimedia/component/wmf-sre-laptop | [production] | 
            
  | 12:16 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Set original weight to db1089', diff saved to https://phabricator.wikimedia.org/P13351 and previous config saved to /var/cache/conftool/dbconfig/20201120-121645-marostegui.json | [production] | 
            
  | 12:14 | <marostegui> | Run check private data on clouddb1013:3311 clouddb1013:3313 clouddb1015:3316 clouddb1017:3311 clouddb1017:3313 clouddb1019:3316 T267090 | [production] | 
            
  | 12:11 | <Urbanecm> | Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=fawiki; T246539) | [production] | 
            
  | 11:50 | <marostegui@cumin1001> | dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13350 and previous config saved to /var/cache/conftool/dbconfig/20201120-115057-marostegui.json | [production] | 
            
  | 11:47 | <marostegui@cumin1001> | dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13349 and previous config saved to /var/cache/conftool/dbconfig/20201120-114758-marostegui.json | [production] | 
            
  | 11:46 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P13348 and previous config saved to /var/cache/conftool/dbconfig/20201120-114614-marostegui.json | [production] | 
            
  | 11:15 | <volans@cumin2001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 11:11 | <volans@cumin2001> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 10:44 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13347 and previous config saved to /var/cache/conftool/dbconfig/20201120-104459-root.json | [production] | 
            
  | 10:29 | <marostegui@cumin1001> | dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repooling after cloning new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13345 and previous config saved to /var/cache/conftool/dbconfig/20201120-102955-root.json | [production] |