2008-12-04
|
22:50 |
<domas> |
job runners are no longer blue on ganglia CPU graphs :((((((( |
[production] |
22:45 |
<domas> |
fc4 maintenance, reniced job runners to 20 (10 behind apaches), installed apc3.0.19 (APC3.0.13 seems to have hit severe lock contention/busylooping at overloads) |
[production] |
22:04 |
<RobH> |
re-enabled sq38 in pybal. all is well |
[production] |
22:02 |
<RobH> |
fired sq37-sq39 back up |
[production] |
21:58 |
<RobH> |
shutdown sq37-sq39, cuz I need to balance the power distribution a bit better. |
[production] |
21:40 |
<RobH> |
sq38 is trying to break my spirit, so i reinstalled it to show it who is boss (me!) |
[production] |
21:02 |
<RobH> |
setup asw-a4-sdtpa and asw-a5-sdtpa on scs-a1-sdtpa |
[production] |
20:52 |
<mark> |
Increased TCP buffers on srv88 (a Fedora), matching the Ubuntus - Fedora Apaches appear to get stuck/deadlocked on writes to Squids |
[production] |
19:39 |
<RobH> |
pulled sq38 back out, as it is giving me issues. need to fix the msw-a3-sdtpa before i can fix sq38. |
[production] |
19:35 |
<RobH> |
added sq38, sq39 back into pybal |
[production] |
19:25 |
<RobH> |
added sq36, sq37 back into pybal |
[production] |
18:14 |
<RobH> |
I need to stop forgetting about lunch and stop working through it, oh well. |
[production] |
18:13 |
<RobH> |
depooled sq36-sq39 for move from pmtpa to sdtpa. |
[production] |
18:12 |
<RobH> |
some tinkering with lvs4 and idleconnection timer was fixed by mark. |
[production] |
17:46 |
<RobH> |
racked sq21-sq35 in sdtpa-a3. added back to pybal. |
[production] |
16:31 |
<RobH> |
depooled sq31-sq35 from lvs4 to move from pmtpa to sdtpa |
[production] |
15:15 |
<RobH> |
reinstalled storage1 to ubuntu 8.04, left data partition intact and untouched. |
[production] |
2008-12-03
|
23:46 |
<JeLuF> |
performing importImage.php imports to commons for Duesentrieb |
[production] |
19:13 |
<RobH> |
tested i/o on db17, issue where it pauses disk access is gone. |
[production] |
19:02 |
<mark> |
Shutdown TeliaSonera (AS1299) BGP session, the link is flaky, resulting in unidirectional traffic only for most of the day |
[production] |
19:02 |
<RobH> |
replaced hardware in db17, reinstalled. |
[production] |
18:58 |
<mark> |
Prepared search10, search11 and search12 as search servers |
[production] |
17:26 |
<brion> |
investigating ploticus config breakage [[bugzilla:16085]] |
[production] |
17:18 |
<brion> |
ploticus seems to be missing from most new apaches |
[production] |
17:12 |
<RobH_DC> |
search10, search11, search12 racked and installed. |
[production] |
14:29 |
<RobH_DC> |
srv136 was unresponsive, rebooted, synced, back in rotation. |
[production] |
2008-12-02
|
23:33 |
<brion> |
scapping to update ContributionReporting ext |
[production] |
23:11 |
<Tim> |
db7 wasn't deleting its relay logs for some reason, since August 21. Disk critical. Did a reset slave. |
[production] |
20:03 |
<brion> |
rebuilt public_reporting with fixed encoding |
[production] |
19:54 |
<brion> |
fudged charsets in triggers for donation db update, let's see if that helps |
[production] |
12:11 |
<Tim> |
started squid (backend instance) on sq40, stopped for 13 days for no apparent reason |
[production] |
12:08 |
<Tim> |
restarted apache on srv161, srv122, srv137, attempted on srv123 but it is waiting for dead NFS mount |
[production] |
11:44 |
<Tim> |
took srv183 out of memcached rotation |
[production] |
10:50 |
<Tim> |
purged binlogs on ixia and db1 (both critical) |
[production] |
2008-12-01
|
23:49 |
<brion> |
sync-common-all'ing to add a wikispecies little icon for sul shared session login, since people keep asking for it :) |
[production] |
20:31 |
<RobH> |
synced and restarted apache on srv89 |
[production] |
19:33 |
<RobH> |
manually setup apache-check for pybal on srv138, synced, enabled. |
[production] |
19:29 |
<RobH> |
manually setup the apache_check stuff for srv126 and pybal. |
[production] |
17:19 |
<RobH> |
synced and restarted apache on srv176 & srv176 |
[production] |
17:18 |
<RobH> |
did the sync and restart thing for apache on srv162 |
[production] |
17:16 |
<RobH> |
synced and restarted apache on srv145 |
[production] |
17:13 |
<RobH> |
synced and restarted apache on srv121 and srv125 |
[production] |
17:00 |
<RobH> |
apache wasn't working on srv102 and srv106, restarted them after syncing |
[production] |
15:10 |
<mark> |
Restarted stuck pdns_server on bayle, lots of stale selective_answer.py processes |
[production] |
14:44 |
<domas> |
restored Roma article on itwiki, had orphaned revision entries after deleting it, manually inserted page entry |
[production] |