production SAL

2019-06-19
10:33	<akosiaris@deploy1001>	scap-helm termbox finished	[production]
10:33	<akosiaris@deploy1001>	scap-helm termbox cluster staging completed	[production]
10:33	<akosiaris@deploy1001>	scap-helm termbox upgrade -f termbox-staging-values.yaml staging stable/termbox [namespace: termbox, clusters: staging]	[production]
10:30	<jbond42>	update late-install so it installs the correct puppet version https://gerrit.wikimedia.org/r/c/operations/puppet/+/515087	[production]
10:30	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
10:30	<moritzm>	installing glibc and ca-certificates-java updates from stretch point release	[production]
10:29	<akosiaris@deploy1001>	scap-helm termbox finished	[production]
10:29	<akosiaris@deploy1001>	scap-helm termbox cluster eqiad completed	[production]
10:29	<akosiaris@deploy1001>	scap-helm termbox upgrade -f termbox-values.yaml production stable/termbox [namespace: termbox, clusters: eqiad]	[production]
10:27	<ema@cumin1001>	END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)	[production]
10:23	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
10:21	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
10:05	<ema>	cp3030: increase varnish-be thread_pool_max from 12000 (250 * 48) to 14400 (300 * 48) to observe impact on fetcherrors	[production]
10:03	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
10:02	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: Fully repool db1077 (duration: 00m 55s)	[production]
10:01	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
09:56	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
09:54	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
09:49	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s)	[production]
09:36	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
09:34	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
09:34	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: More traffic to db1077 (duration: 00m 55s)	[production]
09:29	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
09:25	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
09:24	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: Slowly repool db1077 T225981 (duration: 01m 00s)	[production]
09:19	<XioNoX>	jnt push to esams, remove old protect-old-lvs-servers term + update syslog target T224128	[production]
09:14	<marostegui>	Start MySQL on db1077 - s3 labsdb lag should start catching up T225981	[production]
09:13	<akosiaris@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=kubernetes2001.*	[production]
09:09	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
09:06	<akosiaris>	repool kubernetes2002, kubernetes2003. Point proven, chasing down load	[production]
09:06	<akosiaris@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=kubernetes2002.*	[production]
09:06	<akosiaris@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=kubernetes2003.*	[production]
09:05	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
09:03	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
08:57	<akosiaris>	depool kubernetes200{2,3} for the same IP out discards investigation	[production]
08:56	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
08:56	<akosiaris@puppetmaster1001>	conftool action : set/pooled=no; selector: name=kubernetes2003.*	[production]
08:56	<akosiaris@puppetmaster1001>	conftool action : set/pooled=no; selector: name=kubernetes2002.*	[production]
08:54	<akosiaris>	uncordon kubernetes2001, reschedule some pods on it. Investigating out discards still	[production]
08:51	<XioNoX>	jnt push to codfw, remove old protect-old-lvs-servers term + update syslog target T224128	[production]
08:43	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
08:43	<akosiaris>	depool kubernetes2001 from all services to investigate some IP out discard statistics	[production]
08:42	<akosiaris@puppetmaster1001>	conftool action : set/pooled=no; selector: name=kubernetes2001.*	[production]
08:36	<ema@cumin1001>	END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)	[production]
08:36	<akosiaris>	cordon kubernetes2001 to investigate some IP out discard statistics	[production]
08:34	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
08:28	<ema@cumin1001>	START - Cookbook sre.hosts.upgrade-and-reboot	[production]
08:24	<moritzm>	installing new kernels with SACK fix on jessie servers	[production]
08:21	<akosiaris>	upgrade citoid, mathoid, termbox to latest chart releases to address the GC metric naming issue T220709 T222795	[production]