2015-09-01
13:09 <moritzm> enabled ferm on labsdb100[467] [production]
12:01 <YuviPanda> disable puppet on labsdb1006 [production]
08:58 <moritzm> enabled ferm on labsdb1001 [production]
08:58 <godog> fixup current graphite retention for metrics under "servers" hierarchy T96662 [production]
08:51 <moritzm> enabled ferm on labsdb1002 [production]
08:31 <moritzm> enabled ferm on labsdb1003 [production]
08:29 <godog> repool mw1125 mw1142 after nutcracker failures [production]
07:45 <jynus> cloning mysql data from es1010 to es1017 [ETA: 6h] [production]
07:23 <jynus@tin> Synchronized wmf-config/db-eqiad.php: Depool es1010 (duration: 00m 12s) [production]
07:13 <jynus@tin> Synchronized wmf-config/db-eqiad.php: Repool es1007, pool es1013 (duration: 00m 13s) [production]
06:36 <mutante> uploaded survey2012 to dumps/dataset1001; ownership as it is for survey2011; - T110746 in time for midnight PST [production]
06:23 <valhallasw`cloud> seems to have worked. SGE :( [tools]
06:17 <valhallasw`cloud> going to restart sge_qmaster, hoping this solves the issue :/ [tools]
06:07 <valhallasw`cloud> e.g. "queue instance "task@tools-exec-1211.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=1.820000 (= 0.070000 + 0.50 * 14.000000 with nproc=4) >= 1.75" but the actual load is only 0.3?! [tools]
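The queue-drop message above can be decoded: gridengine compares a load-adjusted per-processor load average against the queue's threshold (1.75 here), inflating the measured load with an adjustment for recently started jobs. A minimal sketch of the apparent arithmetic, assuming the per-job adjustment is scaled by a decay factor (0.25 reproduces the logged 1.82; the function name and decay handling are illustrative, not SGE's actual implementation):

```python
def np_load_avg(base_load, per_job_adjust, recent_jobs, decay=0.25):
    """Per-processor load as SGE appears to report it: the measured
    normalized load plus a decayed adjustment for recently started jobs."""
    return base_load + per_job_adjust * recent_jobs * decay

# Values from the logged message: 0.070000 + 0.50 * 14.000000, decayed.
load = np_load_avg(0.07, 0.50, 14)
print(round(load, 2))   # 1.82
print(load >= 1.75)     # True -> queue instance dropped as overloaded
```

This is consistent with the complaint in the log: the actual load was only 0.3, but the adjustment term for the 14 recently scheduled jobs dominated the reported value and pushed every queue over its threshold.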
06:06 <valhallasw`cloud> test job does not get submitted because all queues are overloaded?! [tools]
06:06 <valhallasw`cloud> investigating SGE issues reported on irc/email [tools]
05:18 <l10nupdate@tin> ResourceLoader cache refresh completed at Tue Sep 1 05:18:09 UTC 2015 (duration 18m 8s) [production]
02:28 <l10nupdate@tin> LocalisationUpdate completed (1.26wmf20) at 2015-09-01 02:28:30+00:00 [production]
02:25 <l10nupdate@tin> Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 06m 00s) [production]
01:12 <James_F> Re-restarting grrrit-wm rolled back to 2f5de55ff75c3c268decfda7442dcdd62df0a42d [tools.lolrrit-wm]
01:12 <James_F> Re-restarting grrrit-wm rolled back to 2f5de55ff75c3c268decfda7442dcdd62df0a42d [releng]
00:54 <James_F> Restarted grrrit-wm with I7eb67e3482 as well as I48ed549dc2b. [releng]
00:32 <James_F> Didn't work, rolled back grrrit-wm to 2f5de55ff75c3c268decfda7442dcdd62df0a42d. [releng]
00:32 <James_F> Didn't work, r [releng]
00:29 <James_F> Restarted grrrit-wm for I48ed549dc2b. [releng]
2015-08-31
23:56 <krenair@tin> Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/233665/ (duration: 00m 11s) [production]
23:49 <ebernhardson@tin> Synchronized wmf-config/InitialiseSettings.php: reenable config changes for cirrus experimental completion api (duration: 00m 12s) [production]
23:40 <ori@tin> Synchronized php-1.26wmf20/extensions/EducationProgram: 97ab82eab2: Updated mediawiki/core Project: mediawiki/extensions/EducationProgram 85a7d3932c1a4ad28f1a8dd05704f4e524152349 (duration: 00m 14s) [production]
23:27 <ebernhardson@tin> Synchronized php-1.26wmf20/extensions/CirrusSearch/: (no message) (duration: 00m 12s) [production]
23:25 <ebernhardson@tin> Synchronized wmf-config/InitialiseSettings.php: revert update for cirrussearch experimental suggestions api (duration: 00m 12s) [production]
23:21 <ebernhardson@tin> Synchronized wmf-config/InitialiseSettings.php: update config of cirrussearch experimental suggestions api (duration: 00m 12s) [production]
22:45 <chasemp> disabled puppet on elastic hosts temporarily to safely roll out fw change. elastic seems to have not taken it well and I'm holding for green cluster state. [production]
21:21 <valhallasw`cloud> webservice: error: argument server: invalid choice: 'generic' (choose from 'lighttpd', 'tomcat', 'uwsgi-python', 'nodejs', 'uwsgi-plain') (for tools.javatest) [tools]
21:20 <mutante> installing package upgrades on argon [production]
21:20 <valhallasw`cloud> restarted webservicemonitor [tools]
21:19 <valhallasw`cloud> seems to have some errors in restarting: subprocess.CalledProcessError: Command '['/usr/bin/sudo', '-i', '-u', 'tools.javatest', '/usr/local/bin/webservice', '--release', 'trusty', 'generic', 'restart']' returned non-zero exit status 2 [tools]
21:18 <valhallasw`cloud> running puppet agent -tv on tools-services-02 to make sure webservicemonitor is running [tools]
21:15 <valhallasw`cloud> several webservices seem to actually have not gotten back online?! what on earth is going on. [tools]
21:10 <valhallasw`cloud> some jobs still died (including tools.admin). I'm assuming service.manifest will make sure they start again [tools]
20:58 <ori> imported pybal_1.08_amd64.changes to jessie-wikimedia [production]
20:44 <chasemp> ferm for elastic100[4-7] and adjust ferm to include wikitech source [production]
20:29 <valhallasw`cloud> |sort is not so spread out in terms of affected hosts because a lot of jobs were started on lighttpd-1409 and -1410 around the same time. [tools]
20:25 <valhallasw`cloud> ca 500 jobs @ 5s/job = approx 40 minutes [tools]
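The back-of-the-envelope ETA in the entry above checks out: a serial run over ~500 jobs with a 5-second pause per job takes about 42 minutes. A trivial sketch (the helper is hypothetical, not a tool used on the cluster):

```python
def restart_eta_minutes(jobs, seconds_per_job):
    """Wall-clock time for a serial, throttled restart run."""
    return jobs * seconds_per_job / 60

print(restart_eta_minutes(500, 5))  # ~41.7, i.e. "approx 40 minutes"
```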
20:23 <valhallasw`cloud> doh. accidentally used the wrong file, causing restarts for another few uwsgi hosts. Three more jobs dead *sigh* [tools]
20:21 <valhallasw`cloud> now doing more rescheduling, with 5 sec intervals, on a sorted list to spread load between queues [tools]
20:21 <subbu> deployed parsoid version c3e4df5e [production]
19:36 <valhallasw`cloud> last restarted job is 1423661, rest of them are still in /home/valhallaw/webgrid_jobs [tools]
19:35 <valhallasw`cloud> one per second still seems to make SGE unhappy; there's a whole set of jobs dying, mostly uwsgi? [tools]
19:31 <valhallasw`cloud> https://phabricator.wikimedia.org/T110861 : rescheduling 521 webgrid jobs, at a rate of one per second, while watching the accounting log for issues [tools]
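The throttled rescheduling described in the entries above (a list of webgrid job IDs, force-rescheduled one at a time with a pause in between, from a sorted list once one-per-second proved too aggressive) could look roughly like this. This is an illustrative sketch, not the script actually used; `qmod -rj` is gridengine's force-reschedule command, and the interval and sorting are taken from the log:

```python
import subprocess
import time

def reschedule_jobs(job_ids, interval=5.0, runner=subprocess.run):
    """Force-reschedule each gridengine job with `qmod -rj`, sleeping
    between calls so the qmaster isn't flooded (1s proved too fast)."""
    for job_id in sorted(job_ids):  # sorted list spreads load across queues
        runner(["qmod", "-rj", str(job_id)], check=False)
        if interval:
            time.sleep(interval)

# Usage (IDs would come from a file such as ~/webgrid_jobs):
# reschedule_jobs([1423660, 1423661], interval=5)
```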
16:22 <godog> depool mw1125 + mw1142 from api, nutcracker client connections exceeded [production]