production SAL

2013-12-02 §
02:08	<LocalisationUpdate>	completed (1.23wmf4) at Mon Dec 2 02:08:15 UTC 2013	[production]
01:16	<springle>	restarting labsdb1002 mysqld processes with 25% smaller buffer pools. kernel OOM killer striking. needs investigation	[production]
2013-12-01 §
06:23	<apergos>	sq37 hardware errors, probably controller, rt #6418	[production]
06:04	<apergos>	powercycled sq37, was giving 'i/o error' to all commands	[production]
02:36	<LocalisationUpdate>	ResourceLoader cache refresh completed at Sun Dec 1 02:36:06 UTC 2013	[production]
02:13	<LocalisationUpdate>	completed (1.23wmf5) at Sun Dec 1 02:13:36 UTC 2013	[production]
02:08	<LocalisationUpdate>	completed (1.23wmf4) at Sun Dec 1 02:07:54 UTC 2013	[production]
2013-11-30 §
05:54	<paravoid>	rebooting cp1052 with the same kernel (control)	[production]
02:47	<LocalisationUpdate>	ResourceLoader cache refresh completed at Sat Nov 30 02:47:22 UTC 2013	[production]
02:18	<LocalisationUpdate>	completed (1.23wmf5) at Sat Nov 30 02:18:36 UTC 2013	[production]
02:10	<LocalisationUpdate>	completed (1.23wmf4) at Sat Nov 30 02:10:13 UTC 2013	[production]
2013-11-29 §
21:09	<ori>	synchronized wmf-config/InitialiseSettings.php 'Id9c7321b8: Add a MassMessage-related user group on Meta'	[production]
21:08	<ori>	updated /a/common to {{Gerrit|Id9c7321b8}}: Add a MassMessage-related user group on Meta	[production]
20:18	<paravoid>	rebooting cp1065 with new kernel	[production]
19:26	<paravoid>	"swapoff -a" on all cache_text to deal with strange kernel issue with kswapd dropping the whole page cache on memory pressure	[production]
14:49	<paravoid>	restarted gmond on ms-fe1001/2, both were stuck 6h ago and we lost all swift eqiad's metrics for that period	[production]
11:12	<Reedy>	Created EducationProgram tables on arwiki	[production]
05:51	<Tim>	on cp1052 and cp1053: tweaked /proc/sys/net/core/rmem_default to see if that fixes the observed massive gmond packet loss	[production]
02:08	<LocalisationUpdate>	ResourceLoader cache refresh completed at Fri Nov 29 02:07:55 UTC 2013	[production]
02:02	<LocalisationUpdate>	completed (1.23wmf5) at Fri Nov 29 02:02:25 UTC 2013	[production]
02:01	<LocalisationUpdate>	completed (1.23wmf4) at Fri Nov 29 02:01:44 UTC 2013	[production]
01:22	<springle>	synchronized wmf-config/db-eqiad.php 'repool pc1001 after upgrade, max_connections lowered during warm up'	[production]
00:25	<springle>	synchronized wmf-config/db-eqiad.php 'depool pc1001 for package upgrade'	[production]
2013-11-28 §
10:49	<apergos>	turned off logging for parsoid ( https://gerrit.wikimedia.org/r/#/c/98082/ ), old logs remain in place for folks to examine	[production]
10:06	<apergos>	stack traces filling up parsoid nohup.out logs (several gigs in only a few minutes once the parsoid gets into that state), sample on wtp1010 in /var/lib/parsoid/nohup.out.errors	[production]
08:34	<apergos>	and wtp1023	[production]
08:29	<apergos>	/var/lib/parsoid/nohup.out on wtp 1005,11,12 was 6gb or more, causing / on these boxes to fill; moved it, restarted parsoid, removed it	[production]
07:16	<apergos>	powercycled sq80	[production]
05:41	<ori>	synchronized wmf-config/CommonSettings.php 'Icdaa4c1b5: Configure parser cache databases in db-$realm file (3/3)'	[production]
05:41	<ori>	synchronized wmf-config/db-pmtpa.php 'Icdaa4c1b5: Configure parser cache databases in db-$realm file (2/3)'	[production]
05:40	<ori>	synchronized wmf-config/db-eqiad.php 'Icdaa4c1b5: Configure parser cache databases in db-$realm file (1/3)'	[production]
05:37	<ori>	updated /a/common to {{Gerrit|Icdaa4c1b5}}: Configure parser cache databases in db-$realm file	[production]
03:37	<springle>	synchronized wmf-config/db-eqiad.php 'repool slaves after package upgrade, (lvm snapshot boxes only, LB=0)'	[production]
03:16	<springle>	synchronized wmf-config/db-eqiad.php 'depool slaves for package upgrade'	[production]
02:43	<LocalisationUpdate>	ResourceLoader cache refresh completed at Thu Nov 28 02:42:58 UTC 2013	[production]
02:29	<springle>	synchronized wmf-config/db-eqiad.php 'slaves to full steam after package upgrade'	[production]
02:15	<LocalisationUpdate>	completed (1.23wmf5) at Thu Nov 28 02:15:36 UTC 2013	[production]
02:09	<LocalisationUpdate>	completed (1.23wmf4) at Thu Nov 28 02:09:38 UTC 2013	[production]
01:17	<springle>	synchronized wmf-config/db-eqiad.php 'warm up slaves after package upgrade'	[production]
01:02	<ori-l>	started rsync of graphite data (~400gb) from professor.pmtpa to tungsten.eqiad	[production]
00:40	<springle>	synchronized wmf-config/db-eqiad.php 'depool slaves for package upgrade'	[production]
2013-11-27 §
19:50	<demon>	synchronized wmf-config/InitialiseSettings.php 'Fixes for Flow config, no-op in prod'	[production]
19:49	<demon>	synchronized wmf-config/CommonSettings.php 'Fixes for Flow config, no-op in prod'	[production]
18:12	<paravoid>	kill -9 gdb on cp3012, attached to varnish frontend	[production]
11:28	<ori-l>	faidon switched gdash.wm.o from professor.pmtpa -> tungsten.eqiad behind misc-varnish & rebooted ssl1 in tampa	[production]
11:11	<apergos>	ssl1 rebooted itself about 15 mins ago, no idea why	[production]
10:20	<ariel>	synchronized wmf-config/db-eqiad.php 'db1019 (s3) back to full weight in the pool'	[production]
10:19	<ariel>	updated /a/common to {{Gerrit|If5ebd6194}}: db1019 (s3) back to full weight in pool	[production]
10:08	<apergos>	shot some old puppet processes hogging memory on db9 (from march and earlier)	[production]
09:49	<apergos>	there was no mount /srv/pagecounts on labstore4, so rsync to /exp/pagecounts wrote to and filled /; did the mkdir and now things seem ok	[production]