2009-02-23
17:55 | <Rob> | reinstalling srv32 to ubuntu | [production]
17:38 | <Rob> | resynced and restarted apache on srv32, srv33, srv34 | [production]
17:32 | <Rob> | srv31 powered back up | [production]
17:25 | <Rob> | found a breaker flip in the DC, affects srv31-srv34 | [production]
13:40 | <domas> | oh, btw folks, kudos on perfect web2.0 engineering, now morebots complains when message is longer than 140 bytes, and we end up without our microblogging syndication | [production]
13:39 | <domas> | added "su -m 'www-data' -c 'find /opt/mwlib/var/cache/ -mindepth 3 -mtime +1 -delete'" to pdf1 crontab, does anyone actually look after this service? | [production]
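Note on the pdf1 cron entry (13:39): a minimal dry-run sketch of what that `find` expression selects, assuming GNU find. The helper name `list_stale_cache` is hypothetical and not part of the crontab; it prints matches instead of deleting them. `-mindepth 3` skips the cache root and the first two directory levels, and `-mtime +1` matches entries whose age, truncated to whole 24-hour periods, is greater than one, i.e. last modified at least two full days ago.

```shell
# Hypothetical helper (not in the original crontab): list what the
# cron job's find expression would match, without deleting anything.
list_stale_cache() {
    # $1: cache root, e.g. /opt/mwlib/var/cache/
    # -mindepth 3 : ignore the root (depth 0) and entries at depth 1 and 2
    # -mtime +1   : age in whole days > 1, so at least 48 hours old
    # -print      : swap for -delete only after reviewing the output
    find "$1" -mindepth 3 -mtime +1 -print
}
```

Running `list_stale_cache /opt/mwlib/var/cache/` as the cache owner before switching to `-delete` shows exactly which entries the cron job would remove.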
12:57 | <Tim> | deployed r47704, now command line scripts don't access /home anymore | [production]
11:37 | <Tim> | switched archive directory over to /mnt/upload5, starting another rsync. Some files will be missing until the rsync is done | [production]
10:07 | <Tim> | moved all job runners from the previous ad hoc script to the new wikimedia-job-runner package | [production]
06:25 | <Tim> | moved the nagios plugins for fedora from /home/nagios to /h/w/common/nagios-fedora-plugins | [production]
05:21 | <Tim> | started udp2log on db20, MW UDP logs were dead | [production]
05:19 | <Tim> | killed errant jobs loop scripts still running on fedora servers | [production]
04:36 | <Tim> | fixed the log directory for /etc/cron.d/mw-central-notice, killed the process that was in a tight loop trying to write to a stale NFS file handle | [production]
04:28 | <Tim> | finished moving ExtensionDistributor working copy | [production]
04:14 | <Tim> | moving ExtensionDistributor working directory from /home to /mnt/upload5 | [production]
04:00 | <Tim> | private/archive/wikipedia was in fact not migrated, but an initial rsync was done. I will do a second rsync now. | [production]
03:42 | <Tim> | rsync done, uploads re-enabled, b/c symlinks set up | [production]
03:37 | <Tim> | doing rsync | [production]
03:31 | <Tim> | temporarily disabled file uploads on all private wikis, for migration to ms1 | [production]
02:50 | <Tim> | same for commons ForeignDBViaLBRepo directory, ScanSet directory, CentralNotice directory | [production]
02:44 | <Tim> | fixed CommonSettings.php location of deleted images, upload3 -> upload5, appears to have been moved already | [production]
2009-02-21
19:49 | <mark> | Installed gmond on eiximenis | [production]
19:02 | <domas> | db26 lacks 8g of ram :) | [production]
19:00 | <mark> | Restarted stuck apache on srv217 | [production]
17:26 | <mark> | Started apache on srv218-221 | [production]
17:24 | <mark> | Restarted stuck apache on srv217 | [production]
17:07 | <mark> | Squid/kernel upgrade complete | [production]
16:46 | <mark> | Increased max-connections per upload squid to ms1 to 100 | [production]
15:58 | <mark> | Running automated upgrade/reboot of squid and kernel on sq43-47 | [production]
15:58 | <mark> | Upgraded squid and kernel on sq41-42, sq48-50, and rebooted | [production]
15:44 | <mark> | Upgraded squid and kernel on sq36-40, and rebooted | [production]
12:55 | <river> | fixed reverse dns entries for ms3/ms4, which had got swapped somehow | [production]
11:55 | <Tim> | re-enabled ExtensionDistributor | [production]
11:16 | <Tim> | removed syslog.0 and messages.0 on srv170 and srv176, they had critical disk free on / | [production]
03:25 | <Tim> | started apache on the image scaling servers | [production]
02:51 | <brion> | ran sync-common on srv199 while i'm at it | [production]
02:48 | <brion> | zeroing out stupid giant syslog files on srv199 | [production]
02:46 | <brion> | srv199 is out of disk space | [production]
02:46 | <brion> | copying hacked-up copies of InitialiseSettings/CommonSettings back to /home so the changes aren't lost this time | [production]
02:23 | <mark> | db20 back up, for reals | [production]
02:19 | <mark> | Rebooting db20 with upgraded RAID controller firmware | [production]
02:13 | <domas> | flashing BIOS helped | [production]
02:13 | <mark> | db20 up! | [production]
02:04 | <brion> | services on bart (secure, planet) are temporarily offline while server is poked at | [production]
01:50 | <brion> | seeing pages, yay | [production]
01:49 | <brion> | running apache2ctl start or apachectl start for various apaches | [production]
01:47 | <domas> | I FOUND HOW TO REVIVE APACHES | [production]
01:46 | <brion> | think i killed em, now trying to restart apache procs | [production]
01:43 | <brion> | poking to see if we can restart apaches... | [production]
01:42 | <brion> | syncing fixed InitialiseSettings/CommonSettings to apaches | [production]