2009-02-23 §
21:16 <domas> adler has disk with media errors (ID:5, 6th disk in array): http://p.defau.lt/?3_7_6aIatj3DeNBw_jjtBg - needs cannibalized samuel, disk replacement, and ubuntu install on raid10 [production]
19:04 <Rob> srv136 back from repairs, reinstalling as apache server [production]
18:44 <Rob> srv217 not running apache, synced and restarted [production]
18:29 <Rob> srv33 reinstalled to ubuntu and deployed as apache server [production]
18:24 <Rob> srv32 reinstalled to ubuntu and deployed as apache server [production]
17:55 <Rob> reinstalling srv32 to ubuntu [production]
17:38 <Rob> resynced and restarted apache on srv32, srv33, srv34 [production]
17:32 <Rob> srv31 powered back up [production]
17:25 <Rob> found a breaker flip in the DC, affects srv31-srv34 [production]
13:40 <domas> oh, btw folks, kudos on perfect web2.0 engineering, now morebots complains when message is longer than 140 bytes, and we end up without our microblogging syndication [production]
13:39 <domas> added "su -m 'www-data' -c 'find /opt/mwlib/var/cache/ -mindepth 3 -mtime +1 -delete'" to pdf1 crontab, does anyone actually look after this service? [production]
12:57 <Tim> deployed r47704, now command line scripts don't access /home anymore [production]
11:37 <Tim> switched archive directory over to /mnt/upload5, starting another rsync. Some files will be missing until the rsync is done [production]
10:07 <Tim> moved all job runners from the previous ad hoc script to the new wikimedia-job-runner package [production]
06:25 <Tim> moved the nagios plugins for fedora from /home/nagios to /h/w/common/nagios-fedora-plugins [production]
05:21 <Tim> started udp2log on db20, MW UDP logs were dead [production]
05:19 <Tim> killed errant jobs loop scripts still running on fedora servers [production]
04:36 <Tim> fixed the log directory for /etc/cron.d/mw-central-notice, killed the process that was in a tight loop trying to write to a stale NFS file handle [production]
04:28 <Tim> finished moving ExtensionDistributor working copy [production]
04:14 <Tim> moving ExtensionDistributor working directory from /home to /mnt/upload5 [production]
04:00 <Tim> private/archive/wikipedia was in fact not migrated, but an initial rsync was done. I will do a second rsync now. [production]
03:42 <Tim> rsync done, uploads re-enabled, b/c symlinks set up [production]
03:37 <Tim> doing rsync [production]
03:31 <Tim> temporarily disabled file uploads on all private wikis, for migration to ms1 [production]
02:50 <Tim> same for commons ForeignDBViaLBRepo directory, ScanSet directory, CentralNotice directory [production]
02:44 <Tim> fixed CommonSettings.php location of deleted images, upload3 -> upload5, appears to have been moved already [production]
2009-02-21 §
19:49 <mark> Installed gmond on eiximenis [production]
19:02 <domas> db26 lacks 8g of ram :) [production]
19:00 <mark> Restarted stuck apache on srv217 [production]
17:26 <mark> Started apache on srv218-221 [production]
17:24 <mark> Restarted stuck apache on srv217 [production]
17:07 <mark> Squid/kernel upgrade complete [production]
16:46 <mark> Increased max-connections per upload squid to ms1 to 100 [production]
15:58 <mark> Running automated upgrade/reboot of squid and kernel on sq43-47 [production]
15:58 <mark> Upgraded squid and kernel on sq41-42, sq48-50, and rebooted [production]
15:44 <mark> Upgraded squid and kernel on sq36-40, and rebooted [production]
12:55 <river> fixed reverse dns entries for ms3/ms4, which had got swapped somehow [production]
11:55 <Tim> re-enabled ExtensionDistributor [production]
11:16 <Tim> removed syslog.0 and messages.0 on srv170 and srv176, they had critical disk free on / [production]
03:25 <Tim> started apache on the image scaling servers [production]
02:51 <brion> ran sync-common on srv199 while i'm at it [production]
02:48 <brion> zeroing out stupid giant syslog files on srv199 [production]
02:46 <brion> srv199 is out of disk space [production]
02:46 <brion> copying hacked-up copies of InitialiseSettings/CommonSettings back to /home so the changes aren't lost this time [production]
02:23 <mark> db20 back up, for reals [production]
02:19 <mark> Rebooting db20 with upgraded RAID controller firmware [production]
02:13 <domas> flashing BIOS helped [production]
02:13 <mark> db20 up! [production]
02:04 <brion> services on bart (secure, planet) are temporarily offline while server is poked at [production]
01:50 <brion> seeing pages, yay [production]