2009-12-06
01:20 <mark> Disabled xinetd and extdist crontab on zwinger [production]
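For reference, disabling a service and a root cron entry of this kind on a Debian-era host is typically a two-step affair; a minimal sketch, with the init-script name and the crontab-editing method being assumptions rather than details from the log:

    # stop xinetd now and keep it out of the boot sequence
    /etc/init.d/xinetd stop
    update-rc.d -f xinetd remove
    # comment out the extdist line in root's crontab
    crontab -l | sed 's/^\(.*extdist.*\)$/#\1/' | crontab -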
00:40 <mark> synchronized php-1.5/wmf-config/CommonSettings.php 'Moved svn-invoker (ExtensionDistributor) from zwinger to fenari' [production]
00:27 <mark> sq27 is flooding syslog; placed temporary firewall entry for syslog packets on nfs1 [production]
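A temporary block of syslog traffic from one noisy host is usually a single iptables rule; a sketch in which the source address (a placeholder below) and the UDP/514 syslog port are assumptions:

    SQ27_IP=192.0.2.27   # placeholder; sq27's real address is not in the log
    # drop syslog packets from sq27 at the top of the INPUT chain on nfs1
    iptables -I INPUT -s "$SQ27_IP" -p udp --dport 514 -j DROP
    # remove the same rule once sq27 is quiet again
    iptables -D INPUT -s "$SQ27_IP" -p udp --dport 514 -j DROP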
2009-12-05
03:26 <tfinc> synchronized php-1.5/extensions/ContributionReporting/ContributionStatistics_body.php 'picking up bugfix from r59753' [production]
00:46 <tfinc> synchronized php-1.5/wmf-config/CommonSettings.php 'adding CN Notice 22' [production]
00:44 <atglenn> started transfer of incremental via zfs send (~600GB?) from ms1 to a file on ms4, in prep for nc to ms7 later; running in screen as root on ms1 [production]
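The zfs-send-to-file step mentioned above follows a standard pattern; in this sketch the dataset, snapshot names, and target path on ms4 are all assumptions:

    # incremental stream between two snapshots, written to a file on ms4 over ssh
    zfs send -i export/upload@snap-prev export/upload@snap-today | \
        ssh ms4 'cat > /export/ms1-incremental.zfs'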
00:14 <Fred> synchronized php-1.5/wmf-config/InitialiseSettings.php 'changed logo for usabilitywiki.' [production]
00:11 <Fred> synchronized php-1.5/wmf-config/InitialiseSettings.php 'changed logo for usabilitywiki.' [production]
2009-12-04
23:30 <atglenn> started netcat of the bulk of the data from ms5 to ms7. running in screen as root on both hosts. [production]
23:21 <atglenn> started ncat of (small piece of) image data from ms5 to ms7, running in screen as root on both hosts [production]
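The usual netcat pattern for a bulk copy like the two entries above, with the port, directory paths, and tar framing all being assumptions (the log only says nc/ncat in a screen session was used):

    # on ms7 (receiver), inside screen; -p is traditional-netcat syntax
    nc -l -p 9000 | tar -C /export/upload -xpf -
    # on ms5 (sender), inside screen
    tar -C /export/upload -cpf - . | nc ms7 9000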
20:47 <Rob> which doesn't work, damn. [production]
20:47 <Rob> got sick of racktables.wikimedia.org not redirecting correctly, put in a rewrite for non-SSL connections to SSL [production]
20:24 <Fred> fixed nrpe on db20 and db7 [production]
20:13 <root> ran sync-common-all [production]
20:12 <Rob> running sync-common-all to update configuration for support of flaggedrevs on plwiktionary [production]
19:20 <Rob> srv144 removed from node groups & pybal, nagios resynced. [production]
19:19 <Rob> srv144 is out of warranty and rebooting randomly, decommissioning. [production]
19:05 <Fred> finished setup of srv245. [production]
19:02 <Rob> srv126 removed from node groups and lvs. nagios restarted to exclude it. [production]
19:01 <Rob> srv126 refuses to even POST when benched; out of warranty, slating for immediate decommissioning [production]
19:00 <Rob> srv144 reinstalling with a single hard disk, no more raid1 [production]
18:50 <Rob> swapped primary srv144 drive with old decommissioned spare. reinstalling OS, will reinstall packages and get online later. [production]
18:45 <Rob> sq22 back online, all drives nominal, rebuilding cache and ensuring it is in rotation [production]
18:41 <Rob> rebooted sq22 [production]
18:38 <Rob> rebooted srv144 and srv126 [production]
18:36 <Rob> srv245 package install failed. I do not have time to tinker with it while in the DC, I have other things that require my physical access to the machines. Leaving it alone for now to work on remotely. [production]
18:28 <Rob> srv245 OS installed, setting up wikimedia-task-appserver [production]
18:06 <Rob> srv245 was sitting idle with no OS, depooled from apaches. reinstalling system. [production]
17:57 <Rob> rebooted srv83 per fred [production]
17:35 <Fred> removed srv83 from the nodelist since it was causing ddsh to never finish executing. [production]
17:26 <Fred> fixed broken apache. Seems like there is a machine down that is preventing normal sync-file from finishing... Looking into it. [production]
16:50 <rainman-sr> stopped logging of search queries on searchidx1 until someone sets up proper log archiving to a different machine [production]
16:48 <rainman-sr> searchidx1 had a full disk; freed some 100GB of space by deleting logs and other stuff lying around [production]
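Freeing space in a hurry is usually a du/find exercise; a sketch in which the search log directory, file name pattern, and 30-day retention are assumptions:

    # largest consumers under the search tree, in KB
    du -xsk /a/search/* 2>/dev/null | sort -rn | head -20
    # delete old query logs
    find /a/search/log -name '*.log*' -mtime +30 -delete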
16:14 <Rob> srv245 down and unresponsive, rebooting [production]
16:12 <Rob> sq43's replacement disk is also bad (talk about bad luck), placing an RMA with Dell. System will remain powered down for now. [production]
15:55 <Rob> sq43 isn't seeing a replaced disk, rebooting and troubleshooting [production]
15:33 <domas> 'arcconf setcache 1 logicaldrive 0 roff ' - disabling any read caching on db11-db30 RAIDs [production]
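Applied across the whole range, that presumably amounts to a loop over ssh; a sketch assuming root ssh access and the same controller number (1) on every db host:

    for i in $(seq 11 30); do
        # 'roff' turns read caching off for logical drive 0 on controller 1
        ssh root@db$i 'arcconf setcache 1 logicaldrive 0 roff'
    done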
15:13 <Rob> after tinkering with it with domas, it appears rebuild is indeed automatic. db21 rebuilding raid array [production]
15:09 <Rob> db21 bad disk swapped out, rebuild should be automatic [production]
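Rebuild progress on these Adaptec controllers can be checked with arcconf; controller number 1 is an assumption carried over from the setcache command logged above:

    # logical-drive state should go from Degraded to Optimal once the rebuild finishes
    arcconf getconfig 1 ld
    # shows any running task (e.g. rebuild) and its percentage
    arcconf getstatus 1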
14:57 <Rob> sq14 back up, rebuilding its cache [production]
14:54 <Rob> sq13 primary disk dead, out of warranty [production]
14:53 <Rob> swapping sdc in sq13 and sq14 to bring sq14 back online [production]
14:53 <Rob> sq14 disk sdc dead, out of warranty. [production]
05:18 <Tim> on fenari: running all pending renameUser jobs from enwiki [production]
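Pending jobs of a single type are normally flushed with MediaWiki's runJobs.php maintenance script; a sketch in which the checkout path on fenari, the wiki-selection flag, and the 'renameUser' job type name are assumptions:

    cd /home/wikipedia/common/php-1.5
    php maintenance/runJobs.php --wiki=enwiki --type=renameUser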
03:37 <Tim> Around 03:12, accidentally renamed enwiki's job table, then renamed it back a second later. This caused all slaves to stop due to a replication bug. Fixed now. [production]
03:25 <Tim> testing fixJobQueueExplosion.php on commonswiki [production]
02:46 <Tim> srv156 not responding to ssh, trying reboot [production]
01:13 <Tim> restarting job runners [production]
01:13 <tstarling> synchronized php-1.5/includes/HTMLCacheUpdate.php 'patching out all category backlink updates, major bug causing job queue to stall' [production]
00:12 <Tim> granted access to root@fenari on all servers in the mysql node group [production]