2010-11-03
17:18 |
<mark> |
Upgraded storage1 to Lucid |
[production] |
16:42 |
<mark> |
Removing 2010-03 snapshots on ms4 |
[production] |
16:01 |
<mark> |
Fixed sshd on ms4 |
[production] |
15:46 |
<mark> |
Removing 2010-02 snapshots on ms4 |
[production] |
15:45 |
<mark> |
Disabled gmetric cron jobs on ms4 |
[production] |
15:43 |
<mark> |
Disabled daily snapshot generation on ms4 |
[production] |
15:27 |
<mark> |
Restarted gmond on ms4 |
[production] |
15:24 |
<mark> |
Upgraded puppet on ms4 |
[production] |
15:13 |
<mark> |
Powercycled knsq2 |
[production] |
14:52 |
<mark> |
Removing daily snapshots for 2010-10 on ms4 |
[production] |
14:24 |
<mark> |
Restored /etc/sudoers file on DB machines butchered by old versions of wikimedia-raid-utils |
[production] |
05:34 |
<tstarling> |
synchronized php-1.5/includes/Math.php 'r75909' |
[production] |
04:52 |
<apergos> |
oh btw, I notice that when / on the squids fills up, we don't see it in ganglia; it must report an aggregate or something. It would sure be nice to get notified. |
[production] |
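A minimal sketch of the per-host notification wished for above, assuming a small script run on each squid (e.g. from cron). This is not the check Wikimedia actually deployed. The function names and the 90% threshold are hypothetical, and wiring the result into Ganglia or a pager is left out.

```python
import shutil


def root_usage_percent(path: str = "/") -> float:
    """Percent of the filesystem at `path` in use, computed like df's Use%:
    used / (used + available to unprivileged users)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / (usage.used + usage.free)


def needs_alert(used_percent: float, threshold: float = 90.0) -> bool:
    """Hypothetical policy: alert once usage reaches `threshold` percent."""
    return used_percent >= threshold


if __name__ == "__main__":
    pct = root_usage_percent("/")
    status = "ALERT" if needs_alert(pct) else "ok"
    print(f"{status}: / is {pct:.1f}% full")
```

Checking the root filesystem on each host directly sidesteps the problem noted above, where a cluster-wide aggregate hides a single full /.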
04:18 |
<apergos> |
lather rinse repeat for sq47, I hope that's all of 'em |
[production] |
03:46 |
<apergos> |
repeated on sq45... |
[production] |
03:13 |
<apergos> |
same old story on sq46... restarted syslog, reloaded squid, got back some space on / |
[production] |
02:41 |
<apergos> |
er... and deleted the log file :-P |
[production] |
02:38 |
<apergos> |
moved the ginormous cache.log out of the way on sq48 and reloaded squid there, since that hadn't been done earlier |
[production] |
02:32 |
<apergos> |
cleaned up / on sq41, restarted syslog, reloaded squid |
[production] |
00:59 |
<nimishg> |
synchronized php-1.5/wmf-config/InitialiseSettings.php |
[production] |
00:53 |
<nimishg> |
synchronizing Wikimedia installation... Revision: 75891 |
[production] |
00:33 |
<apergos1> |
also sq44 and sq43 |
[production] |
00:30 |
<apergos1> |
cleaning up space on other squids whose / is full: sq42 |
[production] |
2010-11-02
23:22 |
<apergos> |
same story on sq50: cleared out some space, tried raising max-conn to 300 but started seeing "TCP connection to 208.80.152.156 (208.80.152.156:80) failed" in the logs, so backed off to 200 |
[production] |
23:13 |
<apergos> |
trying to adjust max-conn on sq49 for connections to ms4... tried 200, it maxed out. Trying 300 now... |
[production] |
23:08 |
<apergos> |
hupped squid on sq49, restarted syslog, / was full from "Failed to select source" errors, cleared out some space |
[production] |
23:08 |
<tfinc> |
synchronized php-1.5/wmf-config/CommonSettings.php 'Updating sidebar links' |
[production] |
22:40 |
<apergos> |
added amssq47 through amssq62 to /etc/squid/cachemgr.conf on fenari |
[production] |
19:48 |
<RobH> |
torrus back online |
[production] |
19:44 |
<RobH> |
following procedure on wikitech to fix torrus |
[production] |
16:46 |
<RobH> |
sq42 & sq44 behaving normally now; cleaning cache on sq48 and killing squid for a restart, as it is flapping and at high load due to the earlier nfs issue |
[production] |
16:38 |
<RobH> |
restarting and cleaning backend squid on sq44 and sq42 which were complaining in lvs |
[production] |
16:35 |
<RobH> |
sq43 was flapping since the nfs mount on ms4 was borked. restarted it |
[production] |
16:07 |
<apergos> |
NFSD_SERVERS=2048 in /etc/default on ms4 |
[production] |
16:06 |
<apergos> |
note that the variable rpcmod:cotsmaxdupreqs has been changed to 2048 in /etc/system, and |
[production] |
15:54 |
<apergos> |
hard reset on ms4, reboot was not getting the job done |
[production] |
15:47 |
<apergos> |
rebooting ms4; nfsd hung and couldn't be restarted or killed. |
[production] |
14:04 |
<RobH> |
restarted pdns on linne due to a crash from the authdns update |
[production] |
14:02 |
<RobH> |
updated dns with new mgmt entries for payments, owasrvs, and owadbs |
[production] |
03:45 |
<domas> |
added srv193 back to apaches pool on lvs |
[production] |