2009-02-23
|
22:56 |
<RobH_> |
srv136 reinstalled and redeployed as apache |
[production] |
22:33 |
<mark> |
Installed puppet (test install) and thereby automatically gmond as aggregator on srv33 |
[production] |
21:16 |
<domas> |
adler has a disk with media errors (ID:5, 6th disk in array): http://p.defau.lt/?3_7_6aIatj3DeNBw_jjtBg - needs samuel cannibalized, disk replacement, and ubuntu install on raid10 |
[production] |
19:04 |
<Rob> |
srv136 back from repairs, reinstalling as apache server |
[production] |
18:44 |
<Rob> |
srv217 not running apache, synced and restarted |
[production] |
18:29 |
<Rob> |
srv33 reinstalled to ubuntu and deployed as apache server |
[production] |
18:24 |
<Rob> |
srv32 reinstalled to ubuntu and deployed as apache server |
[production] |
17:55 |
<Rob> |
reinstalling srv32 to ubuntu |
[production] |
17:38 |
<Rob> |
resynced and restarted apache on srv32, srv33, srv34 |
[production] |
17:32 |
<Rob> |
srv31 powered back up |
[production] |
17:25 |
<Rob> |
found a breaker flip in the DC, affects srv31-srv34 |
[production] |
13:40 |
<domas> |
oh, btw folks, kudos on perfect web2.0 engineering, now morebots complains when message is longer than 140 bytes, and we end up without our microblogging syndication |
[production] |
13:39 |
<domas> |
added "su -m 'www-data' -c 'find /opt/mwlib/var/cache/ -mindepth 3 -mtime +1 -delete'" to pdf1 crontab, does anyone actually look after this service? |
[production] |
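For anyone picking up the pdf1 cache cleanup above, here is a rough dry-run sketch of what that `find` expression selects, written in Python so it can be tried safely before anything is deleted. The function name `stale_cache_paths` and its parameters are made up for illustration and are not part of mwlib or the crontab entry. It assumes GNU find semantics: `-mindepth 3` skips anything fewer than three levels below the start directory, and `-mtime +1` matches entries whose age in whole days (fraction dropped) is greater than 1, i.e. at least two days old.

```python
import os
import time

SECONDS_PER_DAY = 86400


def stale_cache_paths(root, min_depth=3, older_than_days=1, now=None):
    """List paths that `find ROOT -mindepth N -mtime +D` would match.

    Hypothetical helper for illustration only: it reports candidates
    and never deletes anything.
    """
    now = time.time() if now is None else now
    root = os.path.abspath(root)
    matches = []
    # topdown=False visits children before their parents, which mirrors
    # the depth-first order that find's -delete implies.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        rel = os.path.relpath(dirpath, root)
        dir_depth = 0 if rel == "." else rel.count(os.sep) + 1
        for name in filenames + dirnames:
            # Entries inside dirpath sit one level deeper than dirpath.
            if dir_depth + 1 < min_depth:
                continue
            path = os.path.join(dirpath, name)
            # Age in whole days, fractional part dropped (assumed to
            # match how find rounds for -mtime).
            age_days = int((now - os.lstat(path).st_mtime) // SECONDS_PER_DAY)
            if age_days > older_than_days:
                matches.append(path)
    return matches
```

Calling `stale_cache_paths("/opt/mwlib/var/cache/")` on pdf1 would list the candidates; the real `find ... -delete` additionally removes them, and only removes directories that are already empty.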
12:57 |
<Tim> |
deployed r47704, now command line scripts don't access /home anymore |
[production] |
11:37 |
<Tim> |
switched archive directory over to /mnt/upload5, starting another rsync. Some files will be missing until the rsync is done |
[production] |
10:07 |
<Tim> |
moved all job runners from the previous ad hoc script to the new wikimedia-job-runner package |
[production] |
06:25 |
<Tim> |
moved the nagios plugins for fedora from /home/nagios to /h/w/common/nagios-fedora-plugins |
[production] |
05:21 |
<Tim> |
started udp2log on db20, MW UDP logs were dead |
[production] |
05:19 |
<Tim> |
killed errant jobs loop scripts still running on fedora servers |
[production] |
04:36 |
<Tim> |
fixed the log directory for /etc/cron.d/mw-central-notice, killed the process that was in a tight loop trying to write to a stale NFS file handle |
[production] |
04:28 |
<Tim> |
finished moving ExtensionDistributor working copy |
[production] |
04:14 |
<Tim> |
moving ExtensionDistributor working directory from /home to /mnt/upload5 |
[production] |
04:00 |
<Tim> |
private/archive/wikipedia was in fact not migrated, but an initial rsync was done. I will do a second rsync now. |
[production] |
03:42 |
<Tim> |
rsync done, uploads re-enabled, backwards-compatibility symlinks set up |
[production] |
03:37 |
<Tim> |
doing rsync |
[production] |
03:31 |
<Tim> |
temporarily disabled file uploads on all private wikis, for migration to ms1 |
[production] |
02:50 |
<Tim> |
same for commons ForeignDBViaLBRepo directory, ScanSet directory, and CentralNotice directory |
[production] |
02:44 |
<Tim> |
fixed location of deleted images in CommonSettings.php (upload3 -> upload5); the directory appears to have been moved already |
[production] |
2009-02-21
|
19:49 |
<mark> |
Installed gmond on eiximenis |
[production] |
19:02 |
<domas> |
db26 lacks 8g of ram :) |
[production] |
19:00 |
<mark> |
Restarted stuck apache on srv217 |
[production] |
17:26 |
<mark> |
Started apache on srv218-221 |
[production] |
17:24 |
<mark> |
Restarted stuck apache on srv217 |
[production] |
17:07 |
<mark> |
Squid/kernel upgrade complete |
[production] |
16:46 |
<mark> |
Increased max-connections per upload squid to ms1 to 100 |
[production] |
15:58 |
<mark> |
Running automated upgrade/reboot of squid and kernel on sq43-47 |
[production] |
15:58 |
<mark> |
Upgraded squid and kernel on sq41-42, sq48-50, and rebooted |
[production] |
15:44 |
<mark> |
Upgraded squid and kernel on sq36-40, and rebooted |
[production] |
12:55 |
<river> |
fixed reverse dns entries for ms3/ms4, which had got swapped somehow |
[production] |
11:55 |
<Tim> |
re-enabled ExtensionDistributor |
[production] |
11:16 |
<Tim> |
removed syslog.0 and messages.0 on srv170 and srv176; free disk space on / was critical |
[production] |
03:25 |
<Tim> |
started apache on the image scaling servers |
[production] |
02:51 |
<brion> |
ran sync-common on srv199 while i'm at it |
[production] |
02:48 |
<brion> |
zeroing out stupid giant syslog files on srv199 |
[production] |
02:46 |
<brion> |
srv199 is out of disk space |
[production] |