production SAL

2017-04-19
06:52	<_joe_>	artificially stopping slave replication on rdb2001 for a final test of the switchover redis stage	[production]
03:53	<urandom>	T163292: Starting removal of Cassandra instance restbase1018-b.eqiad.wmnet	[production]
03:49	<mobrovac@tin>	Started restart [restbase/deploy@1bfada4]: (no justification provided)	[production]
03:40	<mobrovac@tin>	Started restart [restbase/deploy@1bfada4]: Kick RB to pick up that restbase1018 instances are gone	[production]
03:32	<mobrovac@tin>	Finished deploy [changeprop/deploy@a19ebf8]: Temp: Decrease the transclusion update from 400 to 200 for T163292 (duration: 00m 53s)	[production]
03:31	<mobrovac@tin>	Started deploy [changeprop/deploy@a19ebf8]: Temp: Decrease the transclusion update from 400 to 200 for T163292	[production]
01:58	<mutante>	naos: rsyncd is of course legitimately running on a deployment server separate from this (unlike in other cases where we used it for syncing during migration), so this was just the one config fragment for /home and not removing the service or anything	[production]
01:56	<mutante>	naos: manually deleting rsyncd config remnants (puppet wouldn't know to clean up after itself)	[production]
01:47	<mutante>	rsyncing /home from mira to naos (T162900)	[production]
01:21	<urandom>	T163292: Starting removal of Cassandra instance restbase1018-a.eqiad.wmnet	[production]
2017-04-18
23:04	<dzahn@puppetmaster1001>	conftool action : set/pooled=no; selector: name=restbase1018.eqiad.wmnet	[production]
23:02	<mutante>	ms1001 - deleting old GlobalCert SSL cert for dumps.wm that was about to expire and is replaced by Letsencrypt	[production]
22:30	<mutante>	ocg1003 gzipping ocg.log for disk space	[production]
21:12	<bblack@neodymium>	conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-be	[production]
20:36	<bblack@neodymium>	conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-be	[production]
17:26	<mobrovac@tin>	Finished deploy [restbase/deploy@1bfada4]: Blacklist all user pages on commons (duration: 07m 12s)	[production]
17:26	<ssastry@tin>	Finished deploy [parsoid/deploy@b067328]: Deploying Parsoid to bump heap limits to 900m (from 600m) (duration: 06m 25s)	[production]
17:19	<ssastry@tin>	Started deploy [parsoid/deploy@b067328]: Deploying Parsoid to bump heap limits to 900m (from 600m)	[production]
17:19	<mobrovac@tin>	Started deploy [restbase/deploy@1bfada4]: Blacklist all user pages on commons	[production]
17:12	<XenoRyet>	updated tools from a8b8d7242799b61dd2a48ef4e804164cd1818bc9 to a1e9342e093a85032255fc1d9904db7df13680b7	[production]
17:09	<elukey>	restart nutcracker in codfw (profile::mediawiki::nutcracker) to make sure that all the daemons are running with the latest config	[production]
16:26	<bblack>	completed Traffic-layer portions of codfw switchover ( https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchover_2 )	[production]
16:21	<bblack>	starting Traffic-layer portions of codfw switchover ( https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchover_2 )	[production]
16:15	<jynus>	reimporting some rows to dbstore1002 on jawiki and ruwiki T160509	[production]
16:12	<godog>	reboot tin to fix cpu mhz issue and check bios settings - T163158	[production]
16:09	<mobrovac@tin>	Finished deploy [restbase/deploy@960b468]: Blacklist an enwiki and a commons page (duration: 08m 16s)	[production]
16:01	<mobrovac@tin>	Started deploy [restbase/deploy@960b468]: Blacklist an enwiki and a commons page	[production]
16:00	<mobrovac@tin>	Finished deploy [restbase/deploy@960b468]: Dev Cluster: Blacklist an enwiki and a commons page (duration: 01m 42s)	[production]
15:58	<mobrovac@tin>	Started deploy [restbase/deploy@960b468]: Dev Cluster: Blacklist an enwiki and a commons page	[production]
15:20	<elukey>	restored default output-buffer config for rdb2005:6479	[production]
15:08	<godog>	puppet-run on cache_upload in codfw/eqiad to pick up swift a/p changes	[production]
15:02	<godog>	puppet-run on cache_upload in codfw/eqiad to pick up switch a/a changes	[production]
15:02	<gehel>	upgrading elastic2020 to elasticsearch 5.1.2	[production]
14:55	<_joe_>	switchover of services, misc things done	[production]
14:54	<oblivian:>	Setting restbase-async in codfw DOWN	[production]
14:54	<oblivian:>	Setting restbase-async in eqiad UP	[production]
14:43	<_joe_>	switching traffic for all a/a services plus maps and restbase to codfw-only	[production]
14:38	<_joe_>	forcing puppet run on caches for catching up with the a/a setting of maps and restbase	[production]
14:33	<oblivian:>	Setting restbase in eqiad DOWN	[production]
14:33	<_joe_>	starting switchover of services eqiad => codfw; external traffic will be switched over, as well as internal traffic to restbase	[production]
14:25	<gehel>	un-ban elastic2020 to get ready for real-life test during switchover - T149006	[production]
14:22	<elukey>	executed config set client-output-buffer-limit "normal 0 0 0 slave 2147483648 2147483648 300 pubsub 33554432 8388608 60" on rdb2005:6479 as attempt to solve slave lagging - T159850	[production]
14:21	<oblivian:>	Setting mobileapps in eqiad UP	[production]
14:14	<oblivian:>	Setting mobileapps in eqiad DOWN	[production]
14:11	<elukey>	executed CONFIG SET appendfsync everysec (default) to restore defaults on rdb2005:6479- T159850	[production]
14:08	<switchdc>	(oblivian@sarin) END TASK - switchdc.stages.t09_restart_parsoid(codfw, eqiad) Successfully completed	[production]
14:04	<elukey>	executed CONFIG SET appendfsync no on rdb2005:6479 to test if fsync stalls affect replication - T159850	[production]
13:50	<switchdc>	(oblivian@sarin) START TASK - switchdc.stages.t09_restart_parsoid(codfw, eqiad) Rolling restart parsoid in eqiad and codfw	[production]
13:35	<switchdc>	(oblivian@sarin) END TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Failed to execute	[production]
13:35	<switchdc>	(oblivian@sarin) START TASK - switchdc.stages.t01_stop_maintenance(codfw, eqiad) Stop MediaWiki maintenance in the old master DC	[production]