2009-02-23 §
21:16 <domas> adler has a disk with media errors (ID:5, 6th disk in the array): http://p.defau.lt/?3_7_6aIatj3DeNBw_jjtBg - needs samuel cannibalized, a disk replacement, and an ubuntu install on raid10 [production]
19:04 <Rob> srv136 back from repairs, reinstalling as apache server [production]
18:44 <Rob> srv217 not running apache, synced and restarted [production]
18:29 <Rob> srv33 reinstalled to ubuntu and deployed as apache server [production]
18:24 <Rob> srv32 reinstalled to ubuntu and deployed as apache server [production]
17:55 <Rob> reinstalling srv32 to ubuntu [production]
17:38 <Rob> resynced and restarted apache on srv32, srv33, srv34 [production]
17:32 <Rob> srv31 powered back up [production]
17:25 <Rob> found a tripped breaker in the DC, affecting srv31-srv34 [production]
13:40 <domas> oh, btw folks, kudos on perfect web2.0 engineering, now morebots complains when a message is longer than 140 bytes, and we end up without our microblogging syndication [production]
13:39 <domas> added "su -m 'www-data' -c 'find /opt/mwlib/var/cache/ -mindepth 3 -mtime +1 -delete'" to pdf1 crontab, does anyone actually look after this service? [production]
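As a sketch, that line would sit in pdf1's crontab roughly as follows; the schedule is assumed, since only the command was logged:
    # prune mwlib cache entries older than one day (schedule assumed; command as logged)
    30 3 * * *  su -m 'www-data' -c 'find /opt/mwlib/var/cache/ -mindepth 3 -mtime +1 -delete'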
12:57 <Tim> deployed r47704, now command line scripts don't access /home anymore [production]
11:37 <Tim> switched archive directory over to /mnt/upload5, starting another rsync. Some files will be missing until the rsync is done [production]
10:07 <Tim> moved all job runners from the previous ad hoc script to the new wikimedia-job-runner package [production]
06:25 <Tim> moved the nagios plugins for fedora from /home/nagios to /h/w/common/nagios-fedora-plugins [production]
05:21 <Tim> started udp2log on db20, MW UDP logs were dead [production]
05:19 <Tim> killed errant jobs loop scripts still running on fedora servers [production]
04:36 <Tim> fixed the log directory for /etc/cron.d/mw-central-notice, killed the process that was in a tight loop trying to write to a stale NFS file handle [production]
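A sketch of how such a spinning process is typically identified and stopped; the pid is illustrative, and write() calls failing with ESTALE ("stale NFS file handle") are the telltale sign:
    top -b -n1 | head -20               # the tight loop shows up as sustained 100% CPU
    strace -p 12345 -e trace=write      # writes failing with ESTALE confirm the stale NFS handle
    kill 12345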
04:28 <Tim> finished moving ExtensionDistributor working copy [production]
04:14 <Tim> moving ExtensionDistributor working directory from /home to /mnt/upload5 [production]
04:00 <Tim> private/archive/wikipedia was in fact not migrated, but an initial rsync was done. I will do a second rsync now. [production]
03:42 <Tim> rsync done, uploads re-enabled, b/c symlinks set up [production]
03:37 <Tim> doing rsync [production]
03:31 <Tim> temporarily disabled file uploads on all private wikis, for migration to ms1 [production]
02:50 <Tim> same for the commons ForeignDBViaLBRepo directory, the ScanSet directory, and the CentralNotice directory [production]
02:44 <Tim> fixed the location of deleted images in CommonSettings.php (upload3 -> upload5); the directory appears to have been moved already [production]
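The upload5/ms1 migration steps above (disable writes, rsync, re-point config, back-compat symlink) boil down to a pattern like the sketch below; the source and destination paths are illustrative, since the exact directories aren't all named in the log:
    # copy while writes are disabled, then swap in a symlink so old paths keep resolving
    rsync -a /home/wikipedia/private-images/ /mnt/upload5/private-images/
    mv /home/wikipedia/private-images /home/wikipedia/private-images.old
    ln -s /mnt/upload5/private-images /home/wikipedia/private-images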
2009-02-21 §
19:49 <mark> Installed gmond on eiximenis [production]
19:02 <domas> db26 lacks 8G of RAM :) [production]
19:00 <mark> Restarted stuck apache on srv217 [production]
17:26 <mark> Started apache on srv218-221 [production]
17:24 <mark> Restarted stuck apache on srv217 [production]
17:07 <mark> Squid/kernel upgrade complete [production]
16:46 <mark> Increased max-connections from each upload squid to ms1 to 100 [production]
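In squid.conf this is normally the max-conn option on the cache_peer line pointing at ms1; a sketch, with the hostname, ports and remaining options assumed:
    # upload squids -> ms1 origin, capped at 100 parallel backend connections
    cache_peer ms1.wikimedia.org parent 80 0 no-query originserver max-conn=100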
15:58 <mark> Running automated upgrade/reboot of squid and kernel on sq43-47 [production]
15:58 <mark> Upgraded squid and kernel on sq41-42, sq48-50, and rebooted [production]
15:44 <mark> Upgraded squid and kernel on sq36-40, and rebooted [production]
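A sketch of what one automated per-host pass amounts to; the host list and package names are assumed, and the real run was presumably serialized so only a few squids were out of rotation at a time:
    for h in sq43 sq44 sq45 sq46 sq47; do
        ssh root@$h 'apt-get -q update && apt-get -qy install squid linux-image-server && reboot'
    done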
12:55 <river> fixed reverse dns entries for ms3/ms4, which had somehow been swapped [production]
11:55 <Tim> re-enabled ExtensionDistributor [production]
11:16 <Tim> removed syslog.0 and messages.0 on srv170 and srv176, both critically low on free disk space on / [production]
03:25 <Tim> started apache on the image scaling servers [production]
02:51 <brion> ran sync-common on srv199 while I'm at it [production]
02:48 <brion> zeroing out stupid giant syslog files on srv199 [production]
02:46 <brion> srv199 is out of disk space [production]
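The srv170/srv176 and srv199 log cleanups above come down to the same pattern: rotated copies can be deleted outright, while the live log is truncated in place so syslogd keeps a valid file handle:
    rm /var/log/syslog.0 /var/log/messages.0    # rotated copies, safe to remove
    : > /var/log/syslog                         # truncate the live log without disturbing the open fd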
02:46 <brion> copying hacked-up copies of InitialiseSettings/CommonSettings back to /home so the changes aren't lost this time [production]
02:23 <mark> db20 back up, for reals [production]
02:19 <mark> Rebooting db20 with upgraded RAID controller firmware [production]
02:13 <domas> flashing BIOS helped [production]
02:13 <mark> db20 up! [production]
02:04 <brion> services on bart (secure, planet) are temporarily offline while the server is poked at [production]
01:50 <brion> seeing pages, yay [production]