2016-04-21
|
14:00 |
<jynus> |
disabled all db lag alerts |
[production] |
13:55 |
<volans> |
[switchover #1/#6] Switch pt-heartbeat from active site (codfw) to new site (eqiad) masters |
[production] |
13:54 |
<ori> |
[switchover #1/2] stopping jobrunners in codfw |
[production] |
13:54 |
<_joe_> |
[switchover #1/3] stopping crons on wasat |
[production] |
13:52 |
<volans> |
[switchover #1/#5] Set final $master status for databases in advance |
[production] |
13:50 |
<volans> |
[switchover #1/#4] Disable puppet on all eqiad and codfw databases masters |
[production] |
13:50 |
<paravoid> |
commencing codfw->eqiad datacenter switchover |
[production] |
13:39 |
<ori@tin> |
Synchronized wmf-config/InitialiseSettings.php: I2171f6b1: Enable MessageCacheError log channel (duration: 00m 25s) |
[production] |
13:37 |
<bblack> |
[traffic codfw switch revert #3] - DNS TTL done, bulk of end-user traffic rebalanced, graphs starting to level off at new normals, as done as it gets from our end |
[production] |
13:31 |
<bblack> |
[traffic codfw switch revert #4] - done & confirmed |
[production] |
13:28 |
<bblack> |
[traffic codfw switch revert #4] - merge -> start salted puppet |
[production] |
13:27 |
<bblack> |
[traffic codfw switch revert #2] - done & confirmed |
[production] |
13:25 |
<bblack> |
[traffic codfw switch revert #3] - merge -> authdns-update |
[production] |
13:24 |
<bblack> |
[traffic codfw switch revert #2] - merge -> start salted puppet |
[production] |
13:23 |
<bblack> |
[traffic codfw switch revert #1] - done & confirmed |
[production] |
13:23 |
<bblack> |
[traffic codfw switch revert #1] - merge -> start salted puppet (@13:20, late log) |
[production] |
13:21 |
<ori@tin> |
Synchronized php-1.27.0-wmf.21/includes: Ie9799f5ea: Make MessageCache handle lock timeouts better (duration: 01m 18s) |
[production] |
13:12 |
<jynus@tin> |
Synchronized wmf-config/db-eqiad.php: Temporarily increase es1* master weight to add connection capacity (duration: 00m 37s) |
[production] |
09:57 |
<elukey> |
removed apache2 logrotate config manually from argon as temp patch to remove cronspam from root@ (T132896) |
[production] |
08:36 |
<jynus> |
restarting db1031 to apply new mysql config |
[production] |
02:31 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Thu Apr 21 02:31:04 UTC 2016 (duration 8m 37s) |
[production] |
02:22 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.21) (duration: 09m 48s) |
[production] |
01:49 |
<mutante> |
git pull on strontium, ops/puppet |
[production] |
01:48 |
<mutante> |
belated log: restarted slapd on seaborgium |
[production] |
01:29 |
<ori> |
installed python-progressbar on terbium for warmup script, will be puppetized later |
[production] |
2016-04-20
|
22:18 |
<mutante> |
creating ganeti VM install1001 on eqiad cluster |
[production] |
19:03 |
<AaronSchulz> |
Cleared out 'enqueue' job queues to see if corruption comes back |
[production] |
18:17 |
<jynus@tin> |
Synchronized wmf-config/db-eqiad.php: Promote db1031 as the new x1 eqiad local master (duration: 00m 28s) |
[production] |
18:16 |
<ori@tin> |
Synchronized php-1.27.0-wmf.21/extensions/Translate/messagegroups/WikiPageMessageGroup.php: I331bd93b: Avoid more master queries on page views (duration: 00m 31s) |
[production] |
18:16 |
<ori@tin> |
Synchronized php-1.27.0-wmf.21/includes/jobqueue/JobQueueGroup.php: Ie9799f5ea: Catch errors in pushLazyJobs() and log them (duration: 00m 36s) |
[production] |
17:59 |
<jynus> |
changing database topology to set db1031 as the master of x1 on eqiad |
[production] |
17:58 |
<volans> |
Upgrading db1065 and fixing overheating problems T132515 |
[production] |
17:30 |
<volans> |
Upgrading db1070 and fixing overheating problems T132515 |
[production] |
17:19 |
<aaron@tin> |
Synchronized php-1.27.0-wmf.21/includes/jobqueue/JobQueueRedis.php: 86d185a4bbf52d (duration: 00m 39s) |
[production] |
17:15 |
<volans> |
Upgrading db1071 and fixing overheating problems T132515 |
[production] |
17:03 |
<akosiaris> |
aptitude purge php5-xhprof on uranium |
[production] |
16:54 |
<elukey> |
replaced "#" with ";" manually in uranium's /etc/php5/cli/conf.d/20-xhprof.ini and /etc/php5/apache2/php.ini to avoid cronspam (didn't find puppet/package trails) |
[production] |
15:43 |
<ebernhardson> |
delete apifeatureusage-2016.01.20 from codfw elasticsearch cluster. Index should never have existed in this cluster (and is beyond retention). |
[production] |
15:42 |
<ebernhardson> |
delete apifeatureusage-2016-01-(02,09,10) from eqiad elasticsearch cluster. We only keep 30 days of apifeatureusage logs |
[production] |
15:37 |
<jynus@tin> |
Synchronized wmf-config/db-codfw.php: Tweak DB weights for better latency, avoiding peaks on QPS (duration: 00m 32s) |
[production] |
15:18 |
<ottomata> |
enabling puppet on analytics1015 |
[production] |
15:17 |
<andrewbogott> |
re-imaging labtestvirt2001 and labtestneutron2001 |
[production] |
14:56 |
<volans@tin> |
Synchronized wmf-config/db-eqiad.php: Change eqiad masters for s1,s3-s7 - T105135 (duration: 00m 28s) |
[production] |
14:55 |
<ottomata> |
started puppet on analytics1003 |
[production] |
14:52 |
<jynus@tin> |
Synchronized wmf-config/db-codfw.php: Repool es2019 (duration: 00m 38s) |
[production] |
14:37 |
<ottomata> |
stopping puppet on analytics1015 and analytics1003 in prep for migration |
[production] |
13:54 |
<elukey> |
puppet disabled on analytics1027 to stop Camus |
[production] |
13:50 |
<_joe_> |
rolling restart of ocg servers |
[production] |
13:21 |
<moritzm> |
rebooting rdb1002,rdb1003,rdb1004,rdb1006,rdb1007,rdb1008 for upgrade to Linux 4.4 |
[production] |
13:17 |
<jynus> |
[switchover-maintenance] Changing DB slave topology for shard s1 on eqiad T111654 |
[production] |