2010-11-03
21:22 <RobH> updated puppet to properly remove memcached from memcached::false entries and removed the host memcached check for servers no longer running memcached; HUP'd Nagios to pick up the change [production]
21:21 <atglenn> rebooting ms5 after OS update. Note that we were unable to get some of the more recent patches; they are probably from after the Sun->Oracle transition [production]
21:02 <nimishg> synchronized php-1.5/extensions/LandingCheck/LandingCheck.i18n.php 'r75890' [production]
21:02 <nimishg> synchronized php-1.5/extensions/LandingCheck/LandingCheck.alias.php 'r75890' [production]
21:01 <nimishg> synchronized php-1.5/extensions/LandingCheck/SpecialLandingCheck.php 'r75890' [production]
21:01 <nimishg> synchronized php-1.5/extensions/LandingCheck/LandingCheck.php 'r75890' [production]
20:31 <atglenn> removed about 1.5T of stuff from /export on ms4 (old backups, Solaris ISOs, etc.) [production]
19:41 <catrope> synchronized php-1.5/README 'Dummy sync so I can document what the errors look like' [production]
19:32 <tfinc> synchronized php-1.5/wmf-config/CommonSettings.php 'Backing out config change for stats fix' [production]
19:31 <RobH> srv281 still down; setting it to false in PyBal just so it doesn't keep trying to use it [production]
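Depooling a host in PyBal (as in the 19:31 entry) is normally a one-line edit to the service's pool file, where each host is a Python-style dict entry. A minimal sketch; the file path, hostnames, and weights here are illustrative, not taken from the log:

```
# Hypothetical excerpt of a PyBal pool file (e.g. for the apache pool).
# Each line describes one backend; flipping 'enabled' to False depools
# the host without deleting its entry, so PyBal stops routing to it.
{ 'host': 'srv280.pmtpa.wmnet', 'weight': 10, 'enabled': True }
{ 'host': 'srv281.pmtpa.wmnet', 'weight': 10, 'enabled': False }  # still down
```

Keeping the entry in place (rather than removing it) makes repooling after the reinstall a one-character change.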
18:31 <RobH> reinstalling srv281, tired of looking at it in red [production]
17:18 <mark> Upgraded storage1 to Lucid [production]
16:42 <mark> Removing 2010-03 snapshots on ms4 [production]
16:01 <mark> Fixed sshd on ms4 [production]
15:46 <mark> Removing 2010-02 snapshots on ms4 [production]
15:45 <mark> Disabled gmetric cron jobs on ms4 [production]
15:43 <mark> Disabled daily snapshot generation on ms4 [production]
15:27 <mark> Restarted gmond on ms4 [production]
15:24 <mark> Upgraded puppet on ms4 [production]
15:13 <mark> Powercycled knsq2 [production]
14:52 <mark> Removing daily snapshots for 2010-10 on ms4 [production]
14:24 <mark> Restored the /etc/sudoers file on DB machines butchered by old versions of wikimedia-raid-utils [production]
05:34 <tstarling> synchronized php-1.5/includes/Math.php 'r75909' [production]
04:52 <apergos> oh, by the way: I notice that when / on the squids fills, we don't see it in Ganglia; it must report an aggregate or something. It would sure be nice to get notified. [production]
04:18 <apergos> lather, rinse, repeat for sq47; I hope that's all of 'em [production]
03:46 <apergos> repeated on sq45... [production]
03:13 <apergos> same old story on sq46: restarted syslog, reloaded squid, got back some space on / [production]
02:41 <apergos> er... and deleted the log file :-P [production]
02:38 <apergos> moved a ginormous cache.log out of the way on sq48 and reloaded squid over there, since it wasn't done earlier [production]
02:32 <apergos> cleaned up / on sq41, restarted syslog, reloaded squid [production]
00:59 <nimishg> synchronized php-1.5/wmf-config/InitialiseSettings.php [production]
00:53 <nimishg> synchronizing Wikimedia installation... Revision: 75891 [production]
00:33 <apergos1> also 44 and 43 [production]
00:30 <apergos1> cleaning up space on other squids with a full /: sq42 [production]
2010-11-02
23:22 <apergos> same story on sq50: cleared out some space, tried upping max-conn to 300, but started seeing "TCP connection to 208.80.152.156 (208.80.152.156:80) failed" in the logs, so backed off to 200 [production]
23:13 <apergos> trying adjusting max-conn on sq49 for connections to ms4... tried 200, it maxed out; trying 300 now... [production]
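The max-conn tuning in the two entries above is a squid.conf `cache_peer` option: it caps how many concurrent connections a frontend squid will open to a parent. A hedged sketch; the hostname and the other peer options shown are assumptions for illustration:

```
# Hypothetical squid.conf excerpt on a frontend squid (e.g. sq49/sq50).
# max-conn= caps simultaneous connections to the media backend; this is
# the value being tuned between 200 and 300 in the entries above.
cache_peer ms4.wikimedia.org parent 80 0 no-query originserver max-conn=200
```

Too low a cap and the peer maxes out under load; too high and the backend starts refusing TCP connections, which matches the failure mode logged at 23:22.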
23:08 <apergos> HUP'd squid on sq49 and restarted syslog; / was full from "Failed to select source" errors, cleared out some space [production]
23:08 <tfinc> synchronized php-1.5/wmf-config/CommonSettings.php 'Updating sidebar links' [production]
22:40 <apergos> added amssq47 through amssq62 to /etc/squid/cachemgr.conf on fenari [production]
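For context on the 22:40 entry: squid's cachemgr.conf lists the cache hosts that the cachemgr.cgi frontend on a machine (here fenari) is allowed to query, one `host[:port]` per line. A minimal sketch; the domain and port are illustrative assumptions:

```
# Hypothetical /etc/squid/cachemgr.conf excerpt on fenari: each new
# amssq squid gets one host:port line so cachemgr.cgi can reach it.
amssq47.esams.wikimedia.org:3128
amssq48.esams.wikimedia.org:3128
amssq62.esams.wikimedia.org:3128
```

Without these entries, newly provisioned squids are invisible to the cache manager UI even though they are serving traffic.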
19:48 <RobH> torrus back online [production]
19:44 <RobH> following the procedure on wikitech to fix torrus [production]
16:46 <RobH> sq42 & sq44 behaving normally now; cleaning cache on sq48 and killing squid for restart, as it is flapping at high load due to the earlier NFS issue [production]
16:38 <RobH> restarting and cleaning the backend squid on sq44 and sq42, which were complaining in LVS [production]