2015-06-15
|
16:48 |
<godog> |
powercycle graphite1002, no ssh, unresponsive console |
[production] |
16:19 |
<jynus> |
upgrading es1005 mysql service while depooled |
[production] |
16:12 |
<thcipriani> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 12s) |
[production] |
16:10 |
<bblack> |
pybal restarts complete, all ok |
[production] |
16:09 |
<thcipriani> |
Finished scap: SWAT: Openstack manager and language updates (duration: 21m 27s) |
[production] |
15:47 |
<thcipriani> |
Started scap: SWAT: Openstack manager and language updates |
[production] |
15:46 |
<bblack> |
starting pybal restart process for config changes ( https://gerrit.wikimedia.org/r/#/c/218285/ ), inactives first w/ manual verification of ok-ness |
[production] |
15:11 |
<bblack> |
rebooting cp3041 (downtimed) |
[production] |
15:00 |
<_joe_> |
ES is green |
[production] |
14:38 |
<aude> |
Synchronized php-1.26wmf9/extensions/Wikidata: Fix property label constraints bug (duration: 00m 24s) |
[production] |
14:27 |
<aude> |
Synchronized arbitraryaccess.dblist: Enable arbitrary access on s7 wikis (duration: 00m 13s) |
[production] |
13:47 |
<jynus> |
enabling puppet on all elastic* nodes, should enable also ganglia |
[production] |
13:11 |
<demon> |
Synchronized wmf-config/PoolCounterSettings-common.php: all the search (duration: 00m 12s) |
[production] |
13:04 |
<_joe_> |
re-scaling down the recovery index bandwidth in ES to 20 mb/s |
[production] |
12:52 |
<demon> |
Synchronized wmf-config/PoolCounterSettings-common.php: partially turn search back on (duration: 00m 13s) |
[production] |
11:54 |
<_joe_> |
raised the ES index replica bandwidth limit to 60mb |
[production] |
11:31 |
<akosiaris> |
migrating etherpad.wikimedia.org to etherpad1001.eqiad.wmnet |
[production] |
11:15 |
<_joe_> |
raised the max bytes for ES recovery to 40mbps |
[production] |
10:49 |
<manybubbles> |
and we're yellow right now. |
[production] |
10:49 |
<manybubbles> |
the initial primaries stage - the red stage of the rolling restart - recovers quick-ish |
[production] |
10:48 |
<manybubbles> |
soon we should see it go yellow and stay that way while the replicas recover |
[production] |
10:48 |
<manybubbles> |
manybubbles is confident his mighty bitch slap of the elasticsearch cluster has set it further to the road to recovery |
[production] |
10:46 |
<jynus> |
disabled puppet on all elasticsearch nodes to avoid restarting services and other magic |
[production] |
10:44 |
<_joe_> |
disabled hot threads logging, ganglia on es nodes |
[production] |
10:44 |
<manybubbles> |
started Elasticsearch on all elasticsearch nodes |
[production] |
10:38 |
<manybubbles> |
stopping all elasticsearch servers - going for a full cluster restart. |
[production] |
10:11 |
<manybubbles> |
restarting elasticsearch on elasticsearch1021 - that one is in a gc death spiral |
[production] |
09:26 |
<oblivian> |
Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s) |
[production] |
09:12 |
<oblivian> |
Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s) |
[production] |
07:35 |
<_joe_> |
attempting a fast restart of elastic1020 |
[production] |
07:21 |
<ori> |
Synchronized php-1.26wmf9/extensions/CirrusSearch/includes/Util.php: I504dac0c3: Add missing 'use \Status;' to includes/Util.php (duration: 00m 13s) |
[production] |
04:56 |
<LocalisationUpdate> |
ResourceLoader cache refresh completed at Mon Jun 15 04:56:39 UTC 2015 (duration 56m 38s) |
[production] |
03:31 |
<springle> |
Synchronized wmf-config/db-eqiad.php: depool db1057 (duration: 00m 12s) |
[production] |
02:23 |
<LocalisationUpdate> |
completed (1.26wmf9) at 2015-06-15 02:22:56+00:00 |
[production] |
02:19 |
<l10nupdate> |
Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 46s) |
[production] |
2015-06-13
|
19:30 |
<bblack> |
repooled cp1071, cp3040 |
[production] |
18:53 |
<bblack> |
rebooting cp1071, cp3040 to look at BIOS-level things (depooled, icinga-downed) |
[production] |
17:08 |
<krinkle> |
Synchronized php-1.26wmf9/extensions/WikimediaEvents: T101806 (duration: 00m 12s) |
[production] |
15:47 |
<paravoid> |
labstore1001: stopping manage-nfs-volumes daemon |
[production] |
04:42 |
<LocalisationUpdate> |
ResourceLoader cache refresh completed at Sat Jun 13 04:41:57 UTC 2015 (duration 41m 56s) |
[production] |
03:51 |
<Krinkle> |
Running deleteEqualMessages.php for sawiki (T45917) |
[production] |
03:49 |
<Krinkle> |
Running deleteEqualMessages.php for cewiki (T45917) |
[production] |
02:21 |
<LocalisationUpdate> |
completed (1.26wmf9) at 2015-06-13 02:20:58+00:00 |
[production] |
02:18 |
<l10nupdate> |
Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 19s) |
[production] |