production SAL

2012-01-06
20:57	<binasher>	started slaving db51 off of db31	[production]
20:21	<RobH>	rt2226 - redeploy db22 for asher	[production]
20:19	<RobH>	db22 reinstalled and booting into OS. No puppet runs yet, now it's Asher's problem ;]	[production]
20:04	<RobH>	db22 reinstalling	[production]
19:24	<binasher>	started innodb hot backup of db1038 to db51	[production]
18:43	<maplebed>	s4 database rotation complete. outage duration 36 minutes.	[production]
18:37	<maplebed>	pushed out new db.php setting s4 to read-write	[production]
18:37	<ben>	synchronized wmf-config/db.php	[production]
18:35	<maplebed>	db31 made read-write as the new master for s4	[production]
18:31	<maplebed>	old master for s4 log file db22-bin.000106 log pos 631618956	[production]
18:30	<maplebed>	new master for s4: db31, log file db31-bin.000213 log pos is 205612709	[production]
18:24	<asher>	synchronized wmf-config/db.php 'setting s4 to read only, preparing to make db31 master'	[production]
18:22	<Reedy>	Commons having db issues, db22 (s4 master) has a disk issue	[production]
16:02	<apergos>	restarted lighttpd on dataset2	[production]
16:01	<Reedy>	HTTP server (lighttpd?) seems to be down on dataset2	[production]
15:46	<RoanKattouw>	Removing gs_* files in /tmp on srv220 that are >30 min old	[production]
15:44	<reedy>	synchronized wmf-config/InitialiseSettings.php 'Bug 33556 - ArticleFeedback settings on Chinese wikipedia'	[production]
15:43	<RoanKattouw>	Removed /tmp/mw-cache-1.17 and /tmp/mw-cache-1.17-test on srv220	[production]
15:41	<Reedy>	srv220 / is at 100% usage	[production]
15:41	<reedy>	synchronized wmf-config/InitialiseSettings.php 'Bug 33556 - ArticleFeedback settings on Chinese wikipedia'	[production]
14:34	<mutante>	saw the log about cp1043/44 being deliberately left broken, but requirement in varnish.pp also broke others, fixed on sq67,68,69 (gerrit change 1802)	[production]
02:01	<LocalisationUpdate>	completed (1.18) at Fri Jan 6 02:05:01 UTC 2012	[production]
01:25	<binasher>	puppet is being deliberately left broken on cp1043 and 1044 until tomorrow	[production]
01:23	<binasher>	backend varnish instance on cp1042 running 3.0.2 is in production for 1/3 of mobile requests	[production]
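
Note on the 15:46 entry above (removing gs_* files in /tmp that are more than 30 minutes old): the log does not record how the cleanup was done. The following is only a minimal sketch of one way to do an age-based cleanup; the function name `remove_stale_files` and its parameters are hypothetical, and only the `gs_*` pattern and the 30-minute threshold come from the entry.

```python
import time
from pathlib import Path


def remove_stale_files(directory, pattern="gs_*", max_age_seconds=30 * 60, now=None):
    """Delete regular files in `directory` that match `pattern` and whose
    modification time is older than `max_age_seconds`.

    Hypothetical helper: pattern and threshold mirror the log entry,
    everything else is an assumption. Returns the removed file names.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        # Skip directories and anything modified at or after the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Called as `remove_stale_files("/tmp")`, it would only touch top-level files matching `gs_*`; it does not recurse into subdirectories.
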
2012-01-05
22:15	<preilly>	small fix for iPhone vary support	[production]
22:15	<preilly>	synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php	[production]
21:39	<Ryan_Lane>	rebooting virt1	[production]
21:01	<reedy>	synchronized wmf-config/CommonSettings.php 'wmgShortUrlPrefix'	[production]
21:01	<reedy>	synchronized wmf-config/InitialiseSettings.php 'wmgShortUrlPrefix'	[production]
20:08	<Reedy>	Created ShortUrl tables on testwiki	[production]
20:07	<reedy>	synchronizing Wikimedia installation... : Update extensionmessages	[production]
20:05	<reedy>	synchronized wmf-config/CommonSettings.php 'wmgUseShortUrl'	[production]
20:04	<reedy>	synchronized wmf-config/InitialiseSettings.php 'wmgUseShortUrl'	[production]
20:02	<reedy>	synchronized php-1.18/extensions/ShortUrl 'Pushing ShortUrl files out'	[production]
19:08	<notpeter>	restarting dhcpd on brewster	[production]
18:45	<preilly>	pushing fix for js error on production	[production]
18:45	<preilly>	synchronized php-1.18/extensions/MobileFrontend/ApplicationTemplate.php	[production]
18:45	<preilly>	synchronized php-1.18/extensions/MobileFrontend/javascripts/application.js	[production]
18:00	<mutante>	tarin - added "#includedir /etc/sudoers.d" to sudo config, needs to read /etc/sudoers.d/nrpe for Nagios RAID check	[production]
17:49	<logmsgbot_>	hashar: gallium: cleaned /tmp . Our test suites leak a large amount of files :D	[production]
17:49	<^demon>	removed chuck norris plugin from jenkins, restarted	[production]
16:48	<mutante>	payments4 - 25 running nginx procs cause a warning - but normal and just raise limit?	[production]
16:15	<mutante>	people claim it was "completely resolved" with "2.6.38-10 backport from PPA" (add-apt-repository ppa:kernel-ppa/ppa ...). wanna try that? (or just reboot ms1002 pls)	[production]
15:45	<mutante>	ms1002 - kswapd 100% CPU - but no swap used and free memory left - this looks like https://bugs.launchpad.net/ubuntu/+bug/721896 again	[production]
15:39	<mutante>	Nagios check_ntp does stuff like: overall average offset: 0 -> NTP OK: Offset unknown| -> NTP CRITICAL: Offset unknown (even though this bug was supposed to be fixed in a version before the one we use)..sigh	[production]
15:14	<mutante>	lvs1004 - puppet hadn't run for 12 hours, looked stuck, "already in progress" on every run. rm /var/lib/puppet/state/puppetdlock, restart puppet agent, finished fine in a few seconds. maybe puppet bug 2888, 5246 or related	[production]
14:57	<mutante>	magnesium - memcached runs on default port 11211, but we run all the others on 11000, this causes Nagios CRIT. Is it supposed to run here? (was also on -l 127.0.0.1 only, but init script starts it on all)	[production]
14:55	<Jeff_Green>	searchidx1 /a reached 100%, did the "space issues" maintenance procedure from wikitech search documentation	[production]
14:39	<mutante>	same on srv193	[production]
14:35	<mutante>	srv290 - before restart memcached was running with -m 64 and -l 127.0.0.1 for some reason, causing Nagios CRIT, now it looks like others and recovered	[production]
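
Note on the memcached entries above (magnesium, srv290, srv193): the mismatches were found via Nagios, and the log does not show the check itself. The following is a rough sketch of how a memcached command line could be compared against the expected settings. The function name `memcached_arg_problems` is hypothetical; the expected port 11000 is taken from the 14:57 entry, and `-p`, `-l` and `-m` are memcached's port, listen-address and memory options.

```python
import shlex

DEFAULT_PORT = 11211  # memcached's default TCP port when -p is not given


def memcached_arg_problems(cmdline, expected_port=11000):
    """Return a list of human-readable mismatches for a memcached command line.

    Hypothetical helper: it only looks at -p, -l and -m given as separate
    tokens, and flags a non-standard port or a loopback-only listener.
    """
    tokens = shlex.split(cmdline)
    opts = {}
    i = 0
    while i < len(tokens):
        if tokens[i] in ("-p", "-l", "-m") and i + 1 < len(tokens):
            opts[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            i += 1

    problems = []
    port = int(opts.get("-p", DEFAULT_PORT))
    if port != expected_port:
        problems.append(f"port {port}, expected {expected_port}")
    if opts.get("-l") in ("127.0.0.1", "localhost"):
        problems.append("listening on loopback only")
    return problems
```

For a command line such as `memcached -m 64 -l 127.0.0.1`, the sketch reports both the default port and the loopback-only listener; the memory value is parsed but not judged, since the log gives no expected size.
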