production SAL

2019-04-15
13:55	<godog>	start ms-be1013 decom - T220590	[production]
13:42	<godog>	reboot ms-be1013	[production]
13:09	<moritzm>	installing wget security updates on trusty hosts	[production]
12:59	<moritzm>	restarting archiva on archiva1001 for OpenJDK security update	[production]
12:50	<moritzm>	restarting Apache on matomo1001 to pick up OpenSSL update	[production]
12:14	<moritzm>	rolling restart of HHVM/Apache on deployment servers to pick up OpenSSL update	[production]
11:59	<fsero>	pointing boron docker builds to the new registry temporarily (docker builds on boron might fail)	[production]
11:35	<Amir1>	EU SWAT is done	[production]
11:26	<moritzm>	rolling restart of HHVM/Apache on labweb* to pick up OpenSSL update	[production]
09:58	<moritzm>	installing openssl1.0 security updates	[production]
09:18	<gehel>	unbanning elastic1029 from cluster	[production]
08:58	<moritzm>	updating mediawiki servers in eqiad to version 1.8.1 of the PHP extension for wikidiff	[production]
08:29	<onimisionipe>	increase wal_keep_segments on codfw maps master	[production]
08:19	<moritzm>	updating mediawiki servers in codfw to version 1.8.1 of the PHP extension for wikidiff	[production]
07:50	<Amir1>	ladsgroup@mwmaint1002:~$ mwscript maintenance/initSiteStats.php --wiki=hywwiki --active (T220936)	[production]
05:31	<marostegui>	Upgrade db1100	[production]
05:07	<marostegui>	powercycle mw1280 (crashed)	[production]
2019-04-14
06:10	<ebernhardson>	unban elastic1027 from eqiad-psi	[production]
05:36	<ebernhardson>	unbanning elastic1027 after about half the shards left and load dropped	[production]
05:31	<ebernhardson>	ban elastic1027 from elasticsearch-psi in eqiad	[production]
04:59	<ebernhardson>	restart elasticsearch_6@production-search-psi-eqiad on elastic1027 due to 100% cpu for last 30+ minutes	[production]
2019-04-13
18:46	<godog>	3h downtime for cloudvirt1015	[production]
15:58	<ebernhardson>	restart elasticsearch on elastic1027	[production]
15:34	<shdubsh>	restart recommendation_api on scb1001	[production]
15:33	<shdubsh>	restart recommendation_api on scb2001	[production]
10:46	<onimisionipe>	depooling maps2001 for postgres init	[production]
08:05	<gehel>	repooling wdqs1008 - data transfer completed - T220830	[production]
00:32	<krinkle@deploy1001>	Synchronized php-1.33.0-wmf.25/includes/: Idc19cc29764a / T220854 - hot fix (duration: 05m 37s)	[production]
2019-04-12
21:16	<Krinkle>	scap was unable to sync to 1 apache (connect to host cloudweb2001-dev.wikimedia.org port 22: Connection timed out)	[production]
21:10	<krinkle@deploy1001>	Synchronized php-1.33.0-wmf.25/extensions/ImageMap/includes/ImageMap.php: I0ee84f059da / T217087 (duration: 05m 12s)	[production]
19:27	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)	[production]
19:27	<dzahn@cumin1001>	START - Cookbook sre.hosts.decommission	[production]
19:24	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)	[production]
19:24	<dzahn@cumin1001>	START - Cookbook sre.hosts.decommission	[production]
18:59	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)	[production]
18:59	<dzahn@cumin1001>	START - Cookbook sre.hosts.decommission	[production]
17:17	<onimisionipe>	depooling maps2002 for postgres init	[production]
17:16	<onimisionipe>	repooling maps2001 - postgres init is complete	[production]
16:14	<elukey>	install ifstat on all the mc1* hosts for network bandwidth investigation	[production]
15:56	<gehel>	starting data transfer from wdqs1008 to wdqs1009 - T220830	[production]
15:32	<thcipriani>	gerrit back	[production]
15:29	<thcipriani>	gerrit restart incoming	[production]
14:29	<onimisionipe>	depool maps2001 for postgres initialization	[production]
13:24	<akosiaris>	re-enable puppet across the fleet. Patch merged, recovery storm coming	[production]
13:18	<akosiaris>	disable puppet across the fleet to avoid incoming puppet alert storm	[production]
12:57	<marostegui>	Purge old rows and optimize tables on spare host pc1010 T210725	[production]
12:53	<urandom>	decommissioning cassandra-c, restbase2008 -- T208087	[production]
12:49	<gehel>	rolling restart of cassandra on maps* for jvm upgrade	[production]
12:22	<arturo>	T220095 disable icinga checks for labtestcontrol2003	[production]
12:16	<gilles@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: T220807 Reduce cawiki survey sampling rate (duration: 05m 11s)	[production]