| 
      
        2019-06-19
      
     | 
  
    
  | 10:47 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 10:38 | 
  <ladsgroup@deploy1001> | 
  scap-helm termbox finished | 
  [production] | 
            
  | 10:38 | 
  <ladsgroup@deploy1001> | 
  scap-helm termbox cluster codfw completed | 
  [production] | 
            
  | 10:38 | 
  <ladsgroup@deploy1001> | 
  scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: codfw] | 
  [production] | 
            
  | 10:36 | 
  <moritzm> | 
  rebooting mx2001 for kernel security update | 
  [production] | 
            
  | 10:35 | 
  <jmm@cumin2001> | 
  END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) | 
  [production] | 
            
  | 10:35 | 
  <jmm@cumin2001> | 
  START - Cookbook sre.hosts.downtime | 
  [production] | 
            
  | 10:33 | 
  <akosiaris@deploy1001> | 
  scap-helm termbox finished | 
  [production] | 
            
  | 10:33 | 
  <akosiaris@deploy1001> | 
  scap-helm termbox cluster staging completed | 
  [production] | 
            
  | 10:33 | 
  <akosiaris@deploy1001> | 
  scap-helm termbox upgrade -f termbox-staging-values.yaml staging stable/termbox [namespace: termbox, clusters: staging] | 
  [production] | 
            
  | 10:30 | 
  <jbond42> | 
  update late-install so it installs the correct puppet version https://gerrit.wikimedia.org/r/c/operations/puppet/+/515087 | 
  [production] | 
            
  | 10:30 | 
  <ema@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) | 
  [production] | 
            
  | 10:30 | 
  <moritzm> | 
  installing glibc and ca-certificates-java updates from stretch point release | 
  [production] | 
            
  | 10:29 | 
  <akosiaris@deploy1001> | 
  scap-helm termbox finished | 
  [production] | 
            
  | 10:29 | 
  <akosiaris@deploy1001> | 
  scap-helm termbox cluster eqiad completed | 
  [production] | 
            
  | 10:29 | 
  <akosiaris@deploy1001> | 
  scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad] | 
  [production] | 
            
  | 10:27 | 
  <ema@cumin1001> | 
  END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99) | 
  [production] | 
            
  | 10:23 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 10:21 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 10:05 | 
  <ema> | 
  cp3030: increase varnish-be thread_pool_max from 12000 (250 * 48) to 14400 (300 * 48) to observe impact on fetcherrors | 
  [production] | 
            
  | 10:03 | 
  <ema@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) | 
  [production] | 
            
  | 10:02 | 
  <marostegui@deploy1001> | 
  Synchronized wmf-config/db-eqiad.php: Fully repool db1077 (duration: 00m 55s) | 
  [production] | 
            
  | 10:01 | 
  <ema@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) | 
  [production] | 
            
  | 09:56 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 09:54 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 09:49 | 
  <marostegui@deploy1001> | 
  Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s) | 
  [production] | 
            
  | 09:36 | 
  <ema@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) | 
  [production] | 
            
  | 09:34 | 
  <ema@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) | 
  [production] | 
            
  | 09:34 | 
  <marostegui@deploy1001> | 
  Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s) | 
  [production] | 
            
  | 09:29 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 09:25 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 09:24 | 
  <marostegui@deploy1001> | 
  Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 T225981 (duration: 01m 00s) | 
  [production] | 
            
  | 09:19 | 
  <XioNoX> | 
  jnt push to esams, remove old protect-old-lvs-servers term + update syslog target T224128 | 
  [production] | 
            
  | 09:14 | 
  <marostegui> | 
  Start MySQL on db1077 - s3 labsdb lag should start catching up T225981 | 
  [production] | 
            
  | 09:13 | 
  <akosiaris@puppetmaster1001> | 
  conftool action : set/pooled=yes; selector: name=kubernetes2001.* | 
  [production] | 
            
  | 09:09 | 
  <ema@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) | 
  [production] | 
            
  | 09:06 | 
  <akosiaris> | 
  repool kubernetes2002, kubernetes2003. Point proven, chasing down lead | 
  [production] | 
            
  | 09:06 | 
  <akosiaris> | 
  repool kubernetes2002, kubernetes2003. Point proven, chasing down load | 
  [production] | 
            
  | 09:06 | 
  <akosiaris@puppetmaster1001> | 
  conftool action : set/pooled=yes; selector: name=kubernetes2002.* | 
  [production] | 
            
  | 09:06 | 
  <akosiaris@puppetmaster1001> | 
  conftool action : set/pooled=yes; selector: name=kubernetes2003.* | 
  [production] | 
            
  | 09:05 | 
  <ema@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) | 
  [production] | 
            
  | 09:03 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 08:57 | 
  <akosiaris> | 
  depool kubernetes200{2,3} for the same out discards investigation | 
  [production] | 
            
  | 08:56 | 
  <ema@cumin1001> | 
  START - Cookbook sre.hosts.upgrade-and-reboot | 
  [production] | 
            
  | 08:56 | 
  <akosiaris@puppetmaster1001> | 
  conftool action : set/pooled=no; selector: name=kubernetes2003.* | 
  [production] | 
            
  | 08:56 | 
  <akosiaris@puppetmaster1001> | 
  conftool action : set/pooled=no; selector: name=kubernetes2002.* | 
  [production] | 
            
  | 08:54 | 
  <akosiaris> | 
  uncordon kubernetes2001, reschedule some pods on it. Investigating out discards still | 
  [production] | 
            
  | 08:51 | 
  <XioNoX> | 
  jnt push to codfw, remove old protect-old-lvs-servers term + update syslog target T224128 | 
  [production] | 
            
  | 08:43 | 
  <ema@cumin1001> | 
  END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) | 
  [production] | 
            
  | 08:43 | 
  <akosiaris> | 
  depool kubernetes2001 from all services to investigate some IP out discard statistics | 
  [production] |