2019-06-19
10:33 <akosiaris@deploy1001> scap-helm termbox finished [production]
10:33 <akosiaris@deploy1001> scap-helm termbox cluster staging completed [production]
10:33 <akosiaris@deploy1001> scap-helm termbox upgrade -f termbox-staging-values.yaml staging stable/termbox [namespace: termbox, clusters: staging] [production]
10:30 <jbond42> update late-install so it installs the correct puppet version https://gerrit.wikimedia.org/r/c/operations/puppet/+/515087 [production]
10:30 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
10:30 <moritzm> installing glibc and ca-certificates-java updates from stretch point release [production]
10:29 <akosiaris@deploy1001> scap-helm termbox finished [production]
10:29 <akosiaris@deploy1001> scap-helm termbox cluster eqiad completed [production]
10:29 <akosiaris@deploy1001> scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad] [production]
10:27 <ema@cumin1001> END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99) [production]
10:23 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
10:21 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
10:05 <ema> cp3030: increase varnish-be thread_pool_max from 12000 (250 * 48) to 14400 (300 * 48) to observe impact on fetcherrors [production]
10:03 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
10:02 <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Fully repool db1077 (duration: 00m 55s) [production]
10:01 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
09:56 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
09:54 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
09:49 <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s) [production]
09:36 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
09:34 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
09:34 <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s) [production]
09:29 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
09:25 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
09:24 <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 T225981 (duration: 01m 00s) [production]
09:19 <XioNoX> jnt push to esams, remove old protect-old-lvs-servers term + update syslog target T224128 [production]
09:14 <marostegui> Start MySQL on db1077 - s3 labsdb lag should start catching up T225981 [production]
09:13 <akosiaris@puppetmaster1001> conftool action : set/pooled=yes; selector: name=kubernetes2001.* [production]
09:09 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
09:06 <akosiaris> repool kubernetes2002, kubernetes2003. Point proven, chasing down lead [production]
09:06 <akosiaris@puppetmaster1001> conftool action : set/pooled=yes; selector: name=kubernetes2002.* [production]
09:06 <akosiaris@puppetmaster1001> conftool action : set/pooled=yes; selector: name=kubernetes2003.* [production]
09:05 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
09:03 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
08:57 <akosiaris> depool kubernetes200{2,3} for the same out discards investigation [production]
08:56 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
08:56 <akosiaris@puppetmaster1001> conftool action : set/pooled=no; selector: name=kubernetes2003.* [production]
08:56 <akosiaris@puppetmaster1001> conftool action : set/pooled=no; selector: name=kubernetes2002.* [production]
08:54 <akosiaris> uncordon kubernetes2001, reschedule some pods on it. Investigating out discards still [production]
08:51 <XioNoX> jnt push to codfw, remove old protect-old-lvs-servers term + update syslog target T224128 [production]
08:43 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
08:43 <akosiaris> depool kubernetes2001 from all services to investigate some IP out discard statistics [production]
08:42 <akosiaris@puppetmaster1001> conftool action : set/pooled=no; selector: name=kubernetes2001.* [production]
08:36 <ema@cumin1001> END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [production]
08:36 <akosiaris> cordon kubernetes2001 to investigate some IP out discard statistics [production]
08:34 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
08:28 <ema@cumin1001> START - Cookbook sre.hosts.upgrade-and-reboot [production]
08:24 <moritzm> installing new kernels with SACK fix on jessie servers [production]
08:21 <akosiaris> upgrade citoid, mathoid, termbox to latest chart releases to address the GC metric naming issue T220709 T222795 [production]