2015-06-16
05:46 <manybubbles> I expect them to be red for another few minutes during the initial master recovery [production]
05:46 <manybubbles> started all elasticsearch nodes and now they are recovering. [production]
05:41 <godog> restart gmond on elastic1007 [production]
05:39 <filippo> Synchronized wmf-config/PoolCounterSettings-common.php: throttle ES (duration: 00m 13s) [production]
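The throttle above was applied through MediaWiki's PoolCounter configuration, which caps how many concurrent requests of a given type may run. A hypothetical sketch of what such a throttle entry can look like, assuming the CirrusSearch pool key and with illustrative numbers (the actual key names and limits deployed are not shown in the log):

```php
// Hypothetical sketch: limiting concurrent search requests via PoolCounter.
// The key 'CirrusSearch-Search' and all numeric values here are assumptions
// for illustration, not the values actually synchronized.
$wgPoolCounterConf['CirrusSearch-Search'] = [
    'class'    => 'PoolCounter_Client',
    'timeout'  => 15,   // seconds a request waits for a worker slot
    'workers'  => 15,   // concurrent requests allowed per pool key
    'maxqueue' => 50,   // queued requests beyond which new ones are rejected
];
```

Lowering `workers`/`maxqueue` sheds search load quickly, which is useful while a backing cluster (here, Elasticsearch) is recovering.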
05:25 <manybubbles> shutting down all the elasticsearch on the elasticsearch nodes again - another full cluster restart should fix it like it did last time [production]
05:11 <godog> restart elasticsearch on elastic1031 [production]
03:06 <springle> Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 12s) [production]
02:27 <LocalisationUpdate> completed (1.26wmf9) at 2015-06-16 02:27:51+00:00 [production]
02:24 <l10nupdate> Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 52s) [production]
00:55 <tgr> running extensions/Gather/maintenance/updateCounts.php for gather wikis - https://phabricator.wikimedia.org/T101460 [production]
00:52 <springle> Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 00m 13s) [production]
00:46 <godog> killed bacula-fd on graphite1001, shouldn't be running and consuming bandwidth (cc akosiaris) [production]
00:27 <godog> kill python stats on cp1052, filling /tmp [production]
2015-06-15
23:42 <ori> Cleaning up renamed jobqueue metrics on graphite{1,2}001 [production]
23:01 <godog> killed bacula-fd on graphite2001, shouldn't be running and consuming bandwidth (cc akosiaris) [production]
22:54 <hoo> Synchronized wmf-config/filebackend.php: Fix commons image inclusion after commons went https only (duration: 00m 14s) [production]
22:18 <godog> run disk stress-test on restbase1007 / restbase1009 [production]
22:06 <twentyafterfour> Synchronized hhvm-fatal-error.php: deploy: Guard header() call in error page (duration: 00m 15s) [production]
22:05 <twentyafterfour> Synchronized wmf-config/InitialiseSettings-labs.php: deploy: Never use wgServer/wgCanonicalServer values from production in labs (duration: 00m 12s) [production]
20:37 <yurik> Synchronized docroot/bits/WikipediaMobileFirefoxOS: Bumping FirefoxOS app to latest (duration: 00m 14s) [production]
20:30 <godog> bounce cassandra on restbase1003 [production]
20:18 <godog> start cassandra on restbase1008, bootstrapping [production]
20:04 <godog> sign restbase1008 key, run puppet [production]
20:00 <godog> powercycle restbase1007, investigate disk issue [production]
19:07 <ori> Synchronized php-1.26wmf9/includes/jobqueue: 0a32aa3be4: jobqueue: use more sensible metric key names (duration: 00m 13s) [production]
16:57 <thcipriani> Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 14s) [production]
16:49 <thcipriani> Synchronized php-1.26wmf9/extensions/OpenStackManager/OpenStackManagerHooks.php: SWAT: refer to user the right way (duration: 00m 13s) [production]
16:48 <godog> powercycle graphite1002, no ssh, unresponsive console [production]
16:19 <jynus> upgrading es1005 mysql service while depooled [production]
16:12 <thcipriani> Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 12s) [production]
16:10 <bblack> pybal restarts complete, all ok [production]
16:09 <thcipriani> Finished scap: SWAT: Openstack manager and language updates (duration: 21m 27s) [production]
15:47 <thcipriani> Started scap: SWAT: Openstack manager and language updates [production]
15:46 <bblack> starting pybal restart process for config changes ( https://gerrit.wikimedia.org/r/#/c/218285/ ), inactives first w/ manual verification of ok-ness [production]
15:11 <bblack> rebooting cp3041 (downtimed) [production]
15:00 <_joe_> ES is green [production]
14:38 <aude> Synchronized php-1.26wmf9/extensions/Wikidata: Fix property label constraints bug (duration: 00m 24s) [production]
14:27 <aude> Synchronized arbitraryaccess.dblist: Enable arbitrary access on s7 wikis (duration: 00m 13s) [production]
13:47 <jynus> enabling puppet on all elastic* nodes, should enable also ganglia [production]
13:11 <demon> Synchronized wmf-config/PoolCounterSettings-common.php: all the search (duration: 00m 12s) [production]
13:04 <_joe_> scaling the ES index recovery bandwidth back down to 20 MB/s [production]
12:52 <demon> Synchronized wmf-config/PoolCounterSettings-common.php: partially turn search back on (duration: 00m 13s) [production]
11:54 <_joe_> raised the ES index replica bandwidth limit to 60 MB/s [production]
11:31 <akosiaris> migrating etherpad.wikimedia.org to etherpad1001.eqiad.wmnet [production]
11:15 <_joe_> raised the max bytes for ES recovery to 40 MB/s [production]
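The recovery-bandwidth changes logged above are runtime cluster settings in Elasticsearch, applied via a PUT to the `_cluster/settings` endpoint. A sketch of the request body for one such change (the `60mb` value mirrors the 11:54 entry; using `transient` rather than `persistent` is an assumption):

```json
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "60mb"
  }
}
```

Transient settings reset on a full cluster restart, which suits a temporary bump to speed up shard recovery; a persistent setting would survive restarts and could leave recovery traffic competing with query traffic indefinitely.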
10:49 <manybubbles> and we're yellow right now. [production]
10:49 <manybubbles> the initial primaries stage - the red stage of the rolling restart - recovers quick-ish [production]
10:48 <manybubbles> soon we should see it go yellow and stay that way while the replicas recover [production]
10:48 <manybubbles> manybubbles is confident his mighty bitch slap of the elasticsearch cluster has set it further down the road to recovery [production]
10:46 <jynus> disabled puppet on all elasticsearch nodes to avoid restarting services and other magic [production]