2010-11-05
17:43 <RobH> srv266 unresponsive to remote console, rebooting and updating [production]
17:42 <RobH> srv206 fixed, pushed back into lvs [production]
17:25 <RobH> working on srv206, disregard any errors it throws [production]
16:40 <RobH> issue with the new api servers is fixed and they are now back in service [production]
16:04 <RobH> some new api servers are not working right, depooled until they are fixed [production]
15:58 <mark> Removed ibis IPs from Squid ACLs; invalid requests issue has been resolved [production]
15:57 <mark> Fixed NFS mounts on apaches that had them missing since the wikimedia-task-appserver upgrade [production]
15:26 <RobH> working on sq57, disregard flapping [production]
15:24 <RobH> new api apaches srv290-srv301 are online, except srv298, which needs drac correction before installation [production]
15:22 <RobH> dropping old entry for tenwiki in apache config and resyncing/restarting apaches to eliminate error message [production]
15:18 <RobH> pushing srv291-srv301 into lvs [production]
15:11 <RobH> doing puppet runs on srv292-srv301 before pushing them into service [production]
14:57 <mark> Hacked out the 'remotemount' lines in /var/lib/dpkg/info/wikimedia-task-appserver.postrm files to prevent apaches from being without NFS mounts during/between puppet runs and package upgrades [production]
14:23 <mark> Deploying new package wikimedia-task-appserver 1.46 across the cluster, which removes configuration files (now handled by Puppet) [production]
11:59 <catrope> synchronized php-1.5/includes/api/ApiLogin.php 'Revert r76078' [production]
11:49 <catrope> synchronized php-1.5/includes/api/ApiLogin.php 'r76078' [production]
05:57 <apergos> failure booting into be3 on ms4, had to back out. So, no progress; we are back to where we were before the reboots. [production]
05:40 <apergos> cleared up luactivate error, shut down ms4 again, trying to boot into alt boot environment [production]
05:16 <apergos> used shutdown on ms4, be3 showed as "active on reboot" but it booted into be0 (old boot environment) nonetheless. *grumble* [production]
05:06 <apergos> rebooted ms4 into alt boot environment with current patches applied [production]
00:18 <RobH> new api servers are not copying down the data correctly and not reflecting config changes in puppet, so they fail; srv290+ not online yet [production]
2010-11-04
23:06 <RobH> running puppet across the new api servers srv290-srv301, then will push them into service later when I figure out why they are not doing what I want ;P [production]
20:13 <RobH> sq51 hates me [production]
20:11 <RobH> new api servers srv290-301 are installed and showing in ganglia; having issues getting the first couple to pool into lvs before I push the rest into service [production]
20:09 <RobH> fixed sq51 [production]
19:29 <RoanKattouw> Strike that, have backed out changes [production]
19:06 <RoanKattouw> Until Mark's made sure they're good, that is [production]
19:06 <RoanKattouw> Changing some files in wmf-deployment/includes/media. DO NOT RUN SCAP or otherwise deploy these changes! [production]
18:36 <RobH> added dns entries for payments [production]
17:59 <RobH> doing puppet runs and final setup for srv290-srv301 [production]
16:56 <rfaulk> Added numpy Python package to grosley.wikimedia.org with apt-get ... For use in the 2010/11 fundraiser to facilitate stats gathering by providing scientific computing functionality in Python [production]
16:43 <rfaulk> Added MySQLdb Python package to grosley.wikimedia.org with apt-get ... This package will be used to access fundraising databases to facilitate the gathering and synthesis of relevant statistics for the 2010/11 Wikimedia fundraiser [production]
16:23 <mark> Set storage1 (varnish) as upload backend on sq41-50, instead of ms4 [production]
16:14 <RobH> sq59 is being bitchy and won't clean the cache, possible hdd issue? will investigate later [production]
15:42 <RobH> sq35 back in rotation [production]
15:34 <mark> Added storage1 (varnish->ms4) as an HTTP backend to sq45's squid config [production]
15:34 <RobH> commenting out sq35, trying to make it work again in pybal [production]
15:16 <RobH> poking at sq59 [production]
15:06 <RobH> sq35 back online, pushed into lvs, partially up; may need to wait up to 5 for idleconnect timer [production]
14:46 <RobH> pushed dns updates for new payments boxes and corrected owadb1/2 to db31/32 [production]
14:28 <RobH> sq35 set to false in pybal until I determine what's wrong with it [production]
14:09 <mark> Reduced CARP weight of sq41-50 from 10 to 5 [production]
13:37 <RobH> sq35 may flag, disregard [production]
13:30 <RoanKattouw> Removed uploadwizard test wiki on prototype, gonna set it up on the Commons prototype instead [production]
04:17 <atglenn> ganglia 3.1 now running on ms4 and ms5 [production]
01:44 <RobH> srv217 back in cluster [production]
00:36 <RobH> torrus back online [production]
00:29 <RobH> fixing torrus deadlock, no touchy [production]
00:18 <tomaszf> upped open fds on loudon to 4096 [production]
00:17 <RobH> kicking srv217 for reinstall [production]