production SAL

2015-06-16
09:10	<YuviPanda>	deleted huge puppet-master.log on labcontrol1001	[production]
08:05	<jynus>	added m5-slave to dns servers	[production]
07:52	<paravoid>	restarting hhvm on mw1121	[production]
07:39	<jynus>	Synchronized wmf-config/db-eqiad.php: Repool es1005 (duration: 00m 14s)	[production]
06:24	<LocalisationUpdate>	ResourceLoader cache refresh completed at Tue Jun 16 06:24:04 UTC 2015 (duration 24m 3s)	[production]
06:18	<godog>	restore ES replication throttling to 20mb/s	[production]
06:13	<godog>	restore ES replication throttling to 40mb/s	[production]
06:08	<filippo>	Synchronized wmf-config/PoolCounterSettings-common.php: unthrottle ES (duration: 00m 14s)	[production]
05:56	<godog>	bump ES replication throttling to 60mb/s	[production]
05:50	<manybubbles>	ok - we're yellow and recovering. ops can take this from here. We have a root cause and we have things I can complain about to the elastic folks I plan to meet with today anyway. I'm going to finish waking up now.	[production]
05:49	<manybubbles>	reenabling puppet agent on elasticsearch machines	[production]
05:46	<manybubbles>	I expect them to be red for another few minutes during the initial master recovery	[production]
05:46	<manybubbles>	started all elasticsearch nodes and now they are recovering.	[production]
05:41	<godog>	restart gmond on elastic1007	[production]
05:39	<filippo>	Synchronized wmf-config/PoolCounterSettings-common.php: throttle ES (duration: 00m 13s)	[production]
05:25	<manybubbles>	shutting down elasticsearch on all the elasticsearch nodes again - another full cluster restart should fix it like it did last time...	[production]
05:11	<godog>	restart elasticsearch on elastic1031	[production]
03:06	<springle>	Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 12s)	[production]
02:27	<LocalisationUpdate>	completed (1.26wmf9) at 2015-06-16 02:27:51+00:00	[production]
02:24	<l10nupdate>	Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 52s)	[production]
00:55	<tgr>	running extensions/Gather/maintenance/updateCounts.php for gather wikis - https://phabricator.wikimedia.org/T101460	[production]
00:52	<springle>	Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 00m 13s)	[production]
00:46	<godog>	killed bacula-fd on graphite1001, shouldn't be running and consuming bandwidth (cc akosiaris)	[production]
00:27	<godog>	kill python stats on cp1052, filling /tmp	[production]
2015-06-15
23:42	<ori>	Cleaning up renamed jobqueue metrics on graphite{1,2}001	[production]
23:01	<godog>	killed bacula-fd on graphite2001, shouldn't be running and consuming bandwidth (cc akosiaris)	[production]
22:54	<hoo>	Synchronized wmf-config/filebackend.php: Fix commons image inclusion after commons went https only (duration: 00m 14s)	[production]
22:18	<godog>	run disk stress-test on restbase1007 / restbase1009	[production]
22:06	<twentyafterfour>	Synchronized hhvm-fatal-error.php: deploy: Guard header() call in error page (duration: 00m 15s)	[production]
22:05	<twentyafterfour>	Synchronized wmf-config/InitialiseSettings-labs.php: deploy: Never use wgServer/wgCanonicalServer values from production in labs (duration: 00m 12s)	[production]
20:37	<yurik>	Synchronized docroot/bits/WikipediaMobileFirefoxOS: Bumping FirefoxOS app to latest (duration: 00m 14s)	[production]
20:30	<godog>	bounce cassandra on restbase1003	[production]
20:18	<godog>	start cassandra on restbase1008, bootstrapping	[production]
20:04	<godog>	sign restbase1008 key, run puppet	[production]
20:00	<godog>	powercycle restbase1007, investigate disk issue	[production]
19:07	<ori>	Synchronized php-1.26wmf9/includes/jobqueue: 0a32aa3be4: jobqueue: use more sensible metric key names (duration: 00m 13s)	[production]
16:57	<thcipriani>	Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 14s)	[production]
16:49	<thcipriani>	Synchronized php-1.26wmf9/extensions/OpenStackManager/OpenStackManagerHooks.php: SWAT: refer to user the right way (duration: 00m 13s)	[production]
16:48	<godog>	powercycle graphite1002, no ssh, unresponsive console	[production]
16:19	<jynus>	upgrading es1005 mysql service while depooled	[production]
16:12	<thcipriani>	Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 12s)	[production]
16:10	<bblack>	pybal restarts complete, all ok	[production]
16:09	<thcipriani>	Finished scap: SWAT: Openstack manager and language updates (duration: 21m 27s)	[production]
15:47	<thcipriani>	Started scap: SWAT: Openstack manager and language updates	[production]
15:46	<bblack>	starting pybal restart process for config changes ( https://gerrit.wikimedia.org/r/#/c/218285/ ), inactives first w/ manual verification of ok-ness	[production]
15:11	<bblack>	rebooting cp3041 (downtimed)	[production]
15:00	<_joe_>	ES is green	[production]
14:38	<aude>	Synchronized php-1.26wmf9/extensions/Wikidata: Fix property label constraints bug (duration: 00m 24s)	[production]
14:27	<aude>	Synchronized arbitraryaccess.dblist: Enable arbitrary access on s7 wikis (duration: 00m 13s)	[production]
13:47	<jynus>	enabling puppet on all elastic* nodes, should enable also ganglia	[production]