production SAL


2016-11-27
09:35	<elukey>	removed all unused files in /tmp on stat1002 after a follow-up with the owner	[production]
06:20	<ori@tin>	Synchronized php-1.29.0-wmf.3/api.php: Bandaid: make API reqs fail fast if User-Agent ~= Parsoid and Host ~= eu.wikipedia.org (duration: 00m 50s)	[production]
05:36	<ori>	Commented out live-hack on mw1290; if we see memory growth now, Parsoid would be strongly implicated.	[production]
05:33	<ori>	With Parsoid requests hacked to fail fast, mw1290 is not showing the kind of aggressive growth in memory usage we're seeing on other API servers	[production]
05:30	<godog>	roll restarting hhvm across api_cluster when hhvm uses more than 40% of memory	[production]
05:21	<ori>	Live-hacked api.php on mw1290 to die if request user-agent contains 'Parsoid'; restarted HHVM.	[production]
05:17	<godog>	roll restarting hhvm across api_cluster when hhvm uses more than 40% of memory	[production]
04:57	<godog>	roll-restart hhvm on api_appcluster on machines with hhvm leaking memory	[production]
03:22	<godog>	roll-restart hhvm across api_appserver	[production]
02:41	<godog>	dumping hhvm backtraces and roll-restart on affected api machines	[production]
02:00	<l10nupdate@tin>	LocalisationUpdate failed: git pull of core failed	[production]
2016-11-26
15:35	<elukey>	deleted tmp files on stat1002's /tmp partition because of disk space consumption. Will follow up with the owner.	[production]
13:36	<Krenair>	ran refreshLinks on angwiki for T151584, it ran into issues with the EventBus extension at the links tables step	[production]
12:29	<volans>	manually fixed the checkout of mediawiki core on stat1002 and stat1003 that was causing Puppet to fail	[production]
02:22	<l10nupdate@tin>	ResourceLoader cache refresh completed at Sat Nov 26 02:22:26 UTC 2016 (duration 4m 18s)	[production]
02:18	<l10nupdate@tin>	scap sync-l10n completed (1.29.0-wmf.3) (duration: 06m 28s)	[production]
2016-11-25
20:09	<Krinkle>	mwscript deleteEqualMessages.php --wiki angwiki (T45917)	[production]
17:15	<jynus>	drop database vewikimedia (deleted wiki) from sanitarium and its slaves	[production]
14:22	<Reedy>	delete oathauth row on wikitech for user Liuxinyu970226 per T144805	[production]
14:16	<Reedy>	delete oathauth row on wikitech for user Shoichi per T144805	[production]
11:05	<ema>	uploaded libvmod-{netmapper,tbf,vslp} to carbon main component (T150660)	[production]
10:20	<_joe_>	upgrading HHVM across codfw	[production]
09:23	<_joe_>	upgraded hhvm on the debug hosts	[production]
08:58	<_joe_>	uploading hhvm_3.12.7+dfsg-1+wmf4 to apt	[production]
08:53	<volans>	restarting zotero on sca1003, almost out of RAM, puppet failing	[production]
08:52	<elukey>	restarting Yarn and HDFS masters on analytics100[12] (Hadoop cluster) to complete the openjdk update	[production]
07:51	<marostegui>	Stopping replication db1052 for maintenance - T151607	[production]
02:22	<l10nupdate@tin>	ResourceLoader cache refresh completed at Fri Nov 25 02:22:40 UTC 2016 (duration 4m 20s)	[production]
02:18	<l10nupdate@tin>	scap sync-l10n completed (1.29.0-wmf.3) (duration: 06m 48s)	[production]
2016-11-24
17:25	<_joe_>	turned off additional workers for htmlcacheupdate on commonswiki as the queue has reduced to acceptable sizes (T151196)	[production]
15:03	<ema>	uploaded varnish 4.1.3-1wm4 to carbon main component, replacing version 3.0.6plus-wm9 (T150660)	[production]
14:47	<ema>	uploaded varnishkafka 1.0.12-1 to carbon main component, replacing version 1.0.7-1 (T150660)	[production]
13:31	<akosiaris>	balance the load between thumbor1001 and thumbor1002 evenly	[production]
13:31	<akosiaris@puppetmaster1001>	conftool action : set/weight=10; selector: thumbor1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=thumbor', 'service=thumbor'])	[production]
13:20	<akosiaris@puppetmaster1001>	conftool action : set/weight=5; selector: thumbor1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=thumbor', 'service=thumbor'])	[production]
13:04	<akosiaris@puppetmaster1001>	conftool action : set/weight=20; selector: thumbor1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=thumbor', 'service=thumbor'])	[production]
12:54	<gilles>	restarting thumbor on thumbor1001	[production]
12:49	<akosiaris>	lower thumbor1001 load by 50% to ease debugging	[production]
12:48	<gilles>	restarting thumbor on thumbor1001	[production]
12:48	<akosiaris@puppetmaster1001>	conftool action : set/weight=5; selector: thumbor1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=thumbor', 'service=thumbor'])	[production]
12:36	<elukey>	launched preferred-replica-election to re-add kafka1022 among the Topic partition leader brokers of the Analytics Kafka cluster (all metrics look good)	[production]
11:41	<hoo>	Killed the Wikidata JSON dump creation on snapshot1007: Won't succeed before Monday, due to T151356	[production]
10:13	<_joe_>	running commonswiki htmlCacheUpdate jobs on terbium to catch up with the backlog, monitoring caches for vhtcpd queue overflows T151196	[production]
09:38	<marostegui>	Stopping replication db1052 (depooled) for maintenance - T150960	[production]
08:59	<marostegui>	Deploy alter table S5 - dewiki.revision on db1092 (depooled) - T148967	[production]
08:15	<_joe_>	uploaded calico-cni 1.5.1 to jessie-wikimedia	[production]
07:32	<marostegui>	Stopping MySQL db2070 for maintenance - https://phabricator.wikimedia.org/T149553	[production]
02:35	<l10nupdate@tin>	ResourceLoader cache refresh completed at Thu Nov 24 02:35:10 UTC 2016 (duration 5m 15s)	[production]
02:29	<l10nupdate@tin>	scap sync-l10n completed (1.29.0-wmf.3) (duration: 10m 39s)	[production]
00:28	<reedy@tin>	Synchronized php-1.29.0-wmf.3/extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php: Some perf related improvements (duration: 00m 45s)	[production]