production SAL

2015-06-16
00:46	<godog>	killed bacula-fd on graphite1001, shouldn't be running and consuming bandwidth (cc akosiaris)	[production]
00:27	<godog>	kill python stats on cp1052, filling /tmp	[production]
2015-06-15
23:42	<ori>	Cleaning up renamed jobqueue metrics on graphite{1,2}001	[production]
23:01	<godog>	killed bacula-fd on graphite2001, shouldn't be running and consuming bandwidth (cc akosiaris)	[production]
22:54	<hoo>	Synchronized wmf-config/filebackend.php: Fix commons image inclusion after commons went https only (duration: 00m 14s)	[production]
22:18	<godog>	run disk stress-test on restbase1007 / restbase1009	[production]
22:06	<twentyafterfour>	Synchronized hhvm-fatal-error.php: deploy: Guard header() call in error page (duration: 00m 15s)	[production]
22:05	<twentyafterfour>	Synchronized wmf-config/InitialiseSettings-labs.php: deploy: Never use wgServer/wgCanonicalServer values from production in labs (duration: 00m 12s)	[production]
20:37	<yurik>	Synchronized docroot/bits/WikipediaMobileFirefoxOS: Bumping FirefoxOS app to latest (duration: 00m 14s)	[production]
20:30	<godog>	bounce cassandra on restbase1003	[production]
20:18	<godog>	start cassandra on restbase1008, bootstrapping	[production]
20:04	<godog>	sign restbase1008 key, run puppet	[production]
20:00	<godog>	powercycle restbase1007, investigate disk issue	[production]
19:07	<ori>	Synchronized php-1.26wmf9/includes/jobqueue: 0a32aa3be4: jobqueue: use more sensible metric key names (duration: 00m 13s)	[production]
16:57	<thcipriani>	Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 14s)	[production]
16:49	<thcipriani>	Synchronized php-1.26wmf9/extensions/OpenStackManager/OpenStackManagerHooks.php: SWAT: refer to user the right way (duration: 00m 13s)	[production]
16:48	<godog>	powercycle graphite1002, no ssh, unresponsive console	[production]
16:19	<jynus>	upgrading es1005 mysql service while depooled	[production]
16:12	<thcipriani>	Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 12s)	[production]
16:10	<bblack>	pybal restarts complete, all ok	[production]
16:09	<thcipriani>	Finished scap: SWAT: Openstack manager and language updates (duration: 21m 27s)	[production]
15:47	<thcipriani>	Started scap: SWAT: Openstack manager and language updates	[production]
15:46	<bblack>	starting pybal restart process for config changes ( https://gerrit.wikimedia.org/r/#/c/218285/ ), inactives first w/ manual verification of ok-ness	[production]
15:11	<bblack>	rebooting cp3041 (downtimed)	[production]
15:00	<_joe_>	ES is green	[production]
14:38	<aude>	Synchronized php-1.26wmf9/extensions/Wikidata: Fix property label constraints bug (duration: 00m 24s)	[production]
14:27	<aude>	Synchronized arbitraryaccess.dblist: Enable arbitrary access on s7 wikis (duration: 00m 13s)	[production]
13:47	<jynus>	enabling puppet on all elastic* nodes, should enable also ganglia	[production]
13:11	<demon>	Synchronized wmf-config/PoolCounterSettings-common.php: all the search (duration: 00m 12s)	[production]
13:04	<_joe_>	re-scaling down the recovery index bandwidth in ES to 20 mb/s	[production]
12:52	<demon>	Synchronized wmf-config/PoolCounterSettings-common.php: partially turn search back on (duration: 00m 13s)	[production]
11:54	<_joe_>	raised the ES index replica bandwidth limit to 60mb	[production]
11:31	<akosiaris>	migrating etherpad.wikimedia.org to etherpad1001.eqiad.wmnet	[production]
11:15	<_joe_>	raised the max bytes for ES recovery to 40mbps	[production]
10:49	<manybubbles>	and we're yellow right now.	[production]
10:49	<manybubbles>	the initial primaries stage - the red stage of the rolling restart - recovers quick-ish	[production]
10:48	<manybubbles>	soon we should see it go yellow and stay that way while the replicas recover	[production]
10:48	<manybubbles>	manybubbles is confident his mighty bitch slap of the elasticsearch cluster has set it further to the road to recovery	[production]
10:46	<jynus>	disabled puppet on all elasticsearch nodes to avoid restarting services and other magic	[production]
10:44	<_joe_>	disabled hot threads logging, ganglia on es nodes	[production]
10:44	<manybubbles>	started Elasticsearch on all elasticsearch nodes	[production]
10:38	<manybubbles>	stopping all elasticsearch servers - going for a full cluster restart.	[production]
10:11	<manybubbles>	restarting elasticsearch on elasticsearch1021 - that one is in a gc death spiral	[production]
09:26	<oblivian>	Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s)	[production]
09:12	<oblivian>	Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s)	[production]
07:35	<_joe_>	attempting a fast restart of elastic1020	[production]
07:21	<ori>	Synchronized php-1.26wmf9/extensions/CirrusSearch/includes/Util.php: I504dac0c3: Add missing 'use \Status;' to includes/Util.php (duration: 00m 13s)	[production]
04:56	<LocalisationUpdate>	ResourceLoader cache refresh completed at Mon Jun 15 04:56:39 UTC 2015 (duration 56m 38s)	[production]
03:31	<springle>	Synchronized wmf-config/db-eqiad.php: depool db1057 (duration: 00m 12s)	[production]
02:23	<LocalisationUpdate>	completed (1.26wmf9) at 2015-06-15 02:22:56+00:00	[production]