2014-10-27
19:39 <manybubbles> after restarting elasticsearch we expected to get memory errors again. no such luck so far.... [production]
18:57 <manybubbles> completed restarting elasticsearch cluster. now it'll make a useful file on out of memory errors. raised the recovery throttling so it'll recover fast enough to cause oom errors [production]
18:48 <maxsem> Synchronized php-1.25wmf4/extensions/GeoData: live hack to disable geosearch (duration: 00m 04s) [production]
18:37 <manybubbles> note that this is a restart without waiting for the cluster to go green after each restart. I expect lots of whining from icinga. This will cause us to lose some updates but should otherwise be safe. [production]
18:34 <manybubbles> restarting elasticsearch servers to pick up new gc logging and to reset them into a "working" state so they can have their gc problem again and we can log it properly this time. [production]
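For context, GC logging on a HotSpot JVM of that era was typically enabled with flags along these lines. The exact flags deployed to the cluster are not recorded in this log, so the paths and options below are assumptions, not the production configuration:

```shell
# Hypothetical sketch: GC-logging and heap-dump flags for Elasticsearch 1.x
# on a Java 7 HotSpot JVM, e.g. appended to ES_JAVA_OPTS in
# /etc/default/elasticsearch. Paths are assumptions.
ES_JAVA_OPTS="$ES_JAVA_OPTS \
  -Xloggc:/var/log/elasticsearch/gc.log \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/elasticsearch/"
```

With `-XX:+HeapDumpOnOutOfMemoryError` set, the JVM writes a heap dump at the moment of the OOM, avoiding the problem noted below where nodes froze while a dump was being taken manually.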
18:15 <aaron> Synchronized wmf-config/CommonSettings.php: Remove obsolete flags (all of them) from $wgAntiLockFlags (duration: 00m 07s) [production]
17:53 <cmjohnson> replacing disk /dev/sdl slot 11 ms-be1013 [production]
17:37 <_joe_> uploaded a version of jemalloc for trusty with --enable-prof [production]
16:31 <^d> elasticsearch: temporarily raised node_concurrent_recoveries from 3 to 5. [production]
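A change like the one above would normally go through the Elasticsearch 1.x cluster settings API; this is a sketch of what that call might have looked like, with host and port as assumptions:

```shell
# Hypothetical sketch: raise per-node concurrent recoveries from the default
# via a transient cluster setting (reverts on full cluster restart).
# localhost:9200 is an assumption, not the production endpoint.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 5
  }
}'
```

A transient (rather than persistent) setting fits a temporary bump like this one, since it does not survive a full cluster restart.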
15:32 <demon> Synchronized wmf-config/InitialiseSettings.php: Enable Cirrus as secondary everywhere, brings back GeoData (duration: 00m 04s) [production]
15:08 <manybubbles> It's unclear how much of the master going haywire is something that'll be fixed in elasticsearch 1.4. They've done a lot of work there on the cluster state communication. [production]
15:03 <manybubbles> restarting gmond on all elasticsearch systems because stats aren't updating properly in ganglia and usually that helps [production]
15:02 <manybubbles> restarted a bunch of the elasticsearch nodes that had their heap full. wasn't able to get a heap dump on any of them because they all froze while trying to get the heap dump. [production]
14:32 <^d> elasticsearch: disabling replica allocation, fewer things moving about if we restart cluster [production]
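Disabling replica allocation ahead of a rolling restart keeps the cluster from rebalancing shards every time a node drops out. The log does not record the exact setting used; in Elasticsearch 1.x this was commonly done like so (endpoint is an assumption):

```shell
# Hypothetical sketch: allow allocation of primaries only while nodes are
# being restarted (Elasticsearch 1.x cluster settings API).
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "primaries" }
}'

# ...and restore normal allocation once the cluster is back:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```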
13:47 <manybubbles> Synchronized wmf-config/InitialiseSettings.php: fall back to lsearchd for a bit (duration: 00m 05s) [production]
13:41 <manybubbles> Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s) [production]
13:29 <manybubbles> restarted elasticsearch on elastic1017 - memory was totally full there [production]
13:21 <manybubbles> elastic1008 is logging gc issues. restarting it because that might help it [production]
05:04 <springle> forced logrotate ocg1001 [production]
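Forcing a rotation bypasses logrotate's size and age thresholds; which config was rotated on ocg1001 is not recorded, so the path below is an assumption:

```shell
# Hypothetical sketch: force immediate rotation regardless of the
# size/age conditions in the config. The config path is an assumption.
logrotate --force --verbose /etc/logrotate.conf
```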
03:36 <LocalisationUpdate> ResourceLoader cache refresh completed at Mon Oct 27 03:36:39 UTC 2014 (duration 36m 38s) [production]
02:27 <LocalisationUpdate> completed (1.25wmf5) at 2014-10-27 02:27:45+00:00 [production]
02:17 <LocalisationUpdate> completed (1.25wmf4) at 2014-10-27 02:17:08+00:00 [production]
2014-10-26
23:46 <Krinkle> Force restarted Zuul [production]
15:14 <Guest19240> Jenkins/Zuul is stuck as of 20 hours ago [production]
15:06 <_joe_> restarted hhvm on mw1114, memory nearly exhausted [production]
03:36 <LocalisationUpdate> ResourceLoader cache refresh completed at Sun Oct 26 03:36:20 UTC 2014 (duration 36m 19s) [production]
02:25 <LocalisationUpdate> completed (1.25wmf5) at 2014-10-26 02:25:47+00:00 [production]
02:15 <LocalisationUpdate> completed (1.25wmf4) at 2014-10-26 02:15:12+00:00 [production]
2014-10-25
22:49 <paravoid> upgrading JunOS on cr1-ulsfo [production]
22:32 <paravoid> scheduling downtime for all ulsfo -lb- & cr1/2-ulsfo [production]
21:30 <ori> Synchronized php-1.25wmf5/extensions/CentralNotice/CentralNotice.hooks.php: Iee2072ac7: Make sure we declare globals before using them (duration: 00m 06s) [production]
21:30 <ori> Synchronized php-1.25wmf4/extensions/CentralNotice/CentralNotice.hooks.php: Iee2072ac7: Make sure we declare globals before using them (duration: 00m 06s) [production]
20:41 <bd808> updated logstash-* labs instances to salt minion 2014.1.11 (thanks for the ping apergos) [production]
03:46 <LocalisationUpdate> ResourceLoader cache refresh completed at Sat Oct 25 03:46:48 UTC 2014 (duration 46m 47s) [production]
02:29 <LocalisationUpdate> completed (1.25wmf5) at 2014-10-25 02:29:29+00:00 [production]
02:18 <LocalisationUpdate> completed (1.25wmf4) at 2014-10-25 02:18:14+00:00 [production]
00:27 <awight> updated DjangoBannerStats from cf5a875d49f4c4cf229d7f864a73d4c2f588ebf9 to a3038f133d64c737d3987bd1c37a987fd3003dd6 [production]
2014-10-24
22:40 <akosiaris> puppet disabled on uranium, do not enable [production]
20:52 <andrewbogott> revived virt1006 on a probationary basis. It's running compute but is disabled so new instances won't be scheduled there. I've moved a few test instances there to see how it behaves. [production]
20:36 <andrew> Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 04s) [production]
20:29 <Reedy> sync-common on mw1088 [production]
20:23 <mutante> mw1088 - gzipping core dump files, disabled core dumps, restarted apache [production]
20:15 <mutante> mw1088 - gzip other_vhosts_access.log.1 - Avail. 38G [production]
20:15 <Reedy> / full on mw1088 due to apache core dumps [production]
20:09 <Reedy> running sync-common on mw1041 [production]
20:04 <mutante> powercycled mw1041 [production]
20:03 <reedy> Synchronized php-1.25wmf5/extensions/SemanticForms/: noop for prod (duration: 00m 17s) [production]
20:01 <Reedy> mw1041 is down [production]
20:01 <Reedy> mw1088 has a full / [production]
20:00 <reedy> Synchronized php-1.25wmf4/extensions/SemanticForms/: noop for prod (duration: 00m 16s) [production]