production SAL

2016-04-30 §
13:41	<elukey>	disabled puppet on analytics1047 and scheduled downtime for the host, IO errors in the dmesg for /dev/sdd. Stopped also Hadoop daemons to remove it from the cluster temporarily (not sure how to do it properly, will write docs).	[production]
10:45	<volans>	Reset slave on sanitarium:3311 due to corrupted relay log after skipping query for duplicate key T132416	[production]
10:19	<volans>	restarted slave on dbstore1001 skipping missing database T132837	[production]
08:28	<gehel>	restarting elasticsearch server elastic1031.eqiad.wmnet (T110236)	[production]
07:15	<gehel>	restarting elasticsearch server elastic1030.eqiad.wmnet (T110236)	[production]
06:32	<gehel>	restarting elasticsearch server elastic1029.eqiad.wmnet (T110236)	[production]
06:16	<gehel>	restarting elasticsearch server elastic1028.eqiad.wmnet (T110236)	[production]
01:15	<aude>	applied Ibd302e1 to terbium for debugging broken wikidata rdf dumps	[production]
2016-04-29 §
22:57	<mutante>	DNS - forced authdns-gen-zones etc from https://phabricator.wikimedia.org/T97051#1994679 on ns0/ns1/ns2 to get new language added	[production]
20:59	<gehel>	restarting elasticsearch server elastic1027.eqiad.wmnet (T110236)	[production]
19:56	<urandom>	(Re)starting cleanup on restbase1009-{a,b}.eqiad.wmnet	[production]
19:56	<catrope@tin>	Synchronized php-1.27.0-wmf.22/extensions/CentralNotice/: T133971 (duration: 00m 41s)	[production]
19:29	<gehel>	restarting elasticsearch server elastic1026.eqiad.wmnet (T110236)	[production]
19:07	<gehel>	restarting elasticsearch server elastic1025.eqiad.wmnet (T110236)	[production]
18:21	<jzerebecki@tin>	Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Hooks/OutputPageBeforeHTMLHookHandler.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 2 of 2 T132645 (duration: 00m 28s)	[production]
18:20	<jzerebecki@tin>	Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 1 of 2 T133924 (duration: 00m 29s)	[production]
18:14	<jzerebecki@tin>	Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Hooks/OutputPageBeforeHTMLHookHandler.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 2 of 2 T132645 (duration: 00m 34s)	[production]
18:14	<robh>	started all slaves via dbstore2001 this time.	[production]
18:12	<jzerebecki@tin>	Synchronized php-1.27.0-wmf.22/extensions/Wikidata/extensions/Wikibase/repo/includes/Dumpers/DumpGenerator.php: wmf.22 fc20c54f7915b94ec0d15ef17e207c116910623d 1 of 2 T133924 (duration: 00m 44s)	[production]
18:07	<robh>	started all slaves via dbstore2002 per jaime's request	[production]
17:45	<gehel>	restarting elasticsearch server elastic1024.eqiad.wmnet (T110236)	[production]
16:56	<gehel>	restarting elasticsearch server elastic1023.eqiad.wmnet (T110236)	[production]
16:22	<gehel>	restarting elasticsearch server elastic1022.eqiad.wmnet (T110236)	[production]
15:29	<jynus@tin>	Synchronized wmf-config/db-codfw.php: Repool db2047 and db2068. Depool db2008, db2009. Pool db2033 as the new x1 node. (duration: 00m 27s)	[production]
15:17	<gehel>	restarting elasticsearch server elastic1021.eqiad.wmnet (T110236)	[production]
14:56	<oblivian@palladium>	conftool action : set/pooled=yes; selector: name=mw1153.eqiad.wmnet	[production]
14:54	<jynus>	moving topology of db2033 to be the new x1 master on codfw	[production]
14:40	<oblivian@palladium>	conftool action : set/pooled=no; selector: name=mw1153.eqiad.wmnet	[production]
14:32	<gehel>	restarting elasticsearch server elastic1020.eqiad.wmnet (T110236)	[production]
14:26	<hashar>	Rebased tin:/srv/mediawiki-staging 31886c7..8e2670a . Bring in 3 changes that are solely for beta cluster.	[production]
13:54	<jynus>	stopping mysql db2008 (cloning to db2033)	[production]
13:39	<jynus>	reimaging db2033	[production]
13:09	<gehel>	restarting elasticsearch server elastic1019.eqiad.wmnet (T110236)	[production]
12:30	<gehel>	restarting elasticsearch server elastic1018.eqiad.wmnet (T110236)	[production]
11:39	<elukey>	soft reboot for mw1119 (not responsive to ssh, root login timed out on the console)	[production]
09:43	<gehel>	restarting elasticsearch server elastic1017.eqiad.wmnet (T110236)	[production]
09:42	<gehel>	restarting elasticsearch server elastic1016.eqiad.wmnet (T110236)	[production]
09:01	<jynus>	changing live configuration of db1049 thread_pool_stall_limit to 10 to test impact on connection timeout	[production]
08:20	<gehel>	restarting elasticsearch server elastic1016.eqiad.wmnet (T110236)	[production]
07:57	<elukey>	puppet disabled on new kafka codfw instances due to errors while starting Event Bus (hosts not in service)	[production]
07:54	<moritzm>	enabled base::firewall on stat1002	[production]
07:52	<gehel>	restarting elasticsearch server elastic1015.eqiad.wmnet (T110236)	[production]
07:36	<godog>	stop cleanups on restbase1014-b	[production]
06:46	<jynus@tin>	Synchronized wmf-config/db-eqiad.php: Reduce normal traffic on s2 API servers (duration: 00m 27s)	[production]
06:33	<jynus@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1038, increase weight of new hardware slaves db107[4-8] (duration: 00m 33s)	[production]
05:42	<gehel>	restarting elasticsearch server elastic1014.eqiad.wmnet (T110236)	[production]
05:41	<mutante>	re: "02:29 Krenair: last deployment was slow because of snapshot1007 being offline" it's back, i don't know why, it was powered down and i just tried switching it on. that helped. the command is literally "power on" on HP	[production]
05:39	<mutante>	snapshot1007 - was powered down, powering it on. (..connect to mgmt.. "damn it's a HP")	[production]
05:34	<mutante>	snapshot1007 - not reachable, duration 10h	[production]
04:58	<gehel>	restarting elasticsearch server elastic1013.eqiad.wmnet (T110236)	[production]