production SAL

2015-06-16
05:56	<godog>	bump ES replication throttling to 60mb/s	[production]
05:50	<manybubbles>	ok - we're yellow and recovering. ops can take this from here. We have a root cause and we have things I can complain about to the elastic folks I plan to meet with today anyway. I'm going to finish waking up now.	[production]
05:49	<manybubbles>	reenabling puppet agent on elasticsearch machines	[production]
05:46	<manybubbles>	I expect them to be red for another few minutes during the initial master recovery	[production]
05:46	<manybubbles>	started all elasticsearch nodes and now they are recovering.	[production]
05:41	<godog>	restart gmond on elastic1007	[production]
05:39	<filippo>	Synchronized wmf-config/PoolCounterSettings-common.php: throttle ES (duration: 00m 13s)	[production]
05:25	<manybubbles>	shutting down all the elasticsearch on the elasticsearch nodes again - another full cluster restart should fix it like it did last time...............	[production]
05:11	<godog>	restart elasticsearch on elastic1031	[production]
03:06	<springle>	Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 12s)	[production]
02:27	<LocalisationUpdate>	completed (1.26wmf9) at 2015-06-16 02:27:51+00:00	[production]
02:24	<l10nupdate>	Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 52s)	[production]
00:55	<tgr>	running extensions/Gather/maintenance/updateCounts.php for gather wikis - https://phabricator.wikimedia.org/T101460	[production]
00:52	<springle>	Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 00m 13s)	[production]
00:46	<godog>	killed bacula-fd on graphite1001, shouldn't be running and consuming bandwidth (cc akosiaris)	[production]
00:27	<godog>	kill python stats on cp1052, filling /tmp	[production]
2015-06-15
23:42	<ori>	Cleaning up renamed jobqueue metrics on graphite{1,2}001	[production]
23:01	<godog>	killed bacula-fd on graphite2001, shouldn't be running and consuming bandwidth (cc akosiaris)	[production]
22:54	<hoo>	Synchronized wmf-config/filebackend.php: Fix commons image inclusion after commons went https only (duration: 00m 14s)	[production]
22:18	<godog>	run disk stress-test on restbase1007 / restbase1009	[production]
22:06	<twentyafterfour>	Synchronized hhvm-fatal-error.php: deploy: Guard header() call in error page (duration: 00m 15s)	[production]
22:05	<twentyafterfour>	Synchronized wmf-config/InitialiseSettings-labs.php: deploy: Never use wgServer/wgCanonicalServer values from production in labs (duration: 00m 12s)	[production]
20:37	<yurik>	Synchronized docroot/bits/WikipediaMobileFirefoxOS: Bumping FirefoxOS app to latest (duration: 00m 14s)	[production]
20:30	<godog>	bounce cassandra on restbase1003	[production]
20:18	<godog>	start cassandra on restbase1008, bootstrapping	[production]
20:04	<godog>	sign restbase1008 key, run puppet	[production]
20:00	<godog>	powercycle restbase1007, investigate disk issue	[production]
19:07	<ori>	Synchronized php-1.26wmf9/includes/jobqueue: 0a32aa3be4: jobqueue: use more sensible metric key names (duration: 00m 13s)	[production]
16:57	<thcipriani>	Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 14s)	[production]
16:49	<thcipriani>	Synchronized php-1.26wmf9/extensions/OpenStackManager/OpenStackManagerHooks.php: SWAT: refer to user the right way (duration: 00m 13s)	[production]
16:48	<godog>	powercycle graphite1002, no ssh, unresponsive console	[production]
16:19	<jynus>	upgrading es1005 mysql service while depooled	[production]
16:12	<thcipriani>	Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 12s)	[production]
16:10	<bblack>	pybal restarts complete, all ok	[production]
16:09	<thcipriani>	Finished scap: SWAT: Openstack manager and language updates (duration: 21m 27s)	[production]
15:47	<thcipriani>	Started scap: SWAT: Openstack manager and language updates	[production]
15:46	<bblack>	starting pybal restart process for config changes ( https://gerrit.wikimedia.org/r/#/c/218285/ ), inactives first w/ manual verification of ok-ness	[production]
15:11	<bblack>	rebooting cp3041 (downtimed)	[production]
15:00	<_joe_>	ES is green	[production]
14:38	<aude>	Synchronized php-1.26wmf9/extensions/Wikidata: Fix property label constraints bug (duration: 00m 24s)	[production]
14:27	<aude>	Synchronized arbitraryaccess.dblist: Enable arbitrary access on s7 wikis (duration: 00m 13s)	[production]
13:47	<jynus>	enabling puppet on all elastic* nodes, should enable also ganglia	[production]
13:11	<demon>	Synchronized wmf-config/PoolCounterSettings-common.php: all the search (duration: 00m 12s)	[production]
13:04	<_joe_>	re-scaling down the recovery index bandwidth in ES to 20 mb/s	[production]
12:52	<demon>	Synchronized wmf-config/PoolCounterSettings-common.php: partially turn search back on (duration: 00m 13s)	[production]
11:54	<_joe_>	raised the ES index replica bandwidth limit to 60mb	[production]
11:31	<akosiaris>	migrating etherpad.wikimedia.org to etherpad1001.eqiad.wmnet	[production]
11:15	<_joe_>	raised the max bytes for ES recovery to 40mbps	[production]
10:49	<manybubbles>	and we're yellow right now.	[production]
10:49	<manybubbles>	the initial primaries stage - the red stage of the rolling restart - recovers quick-ish	[production]