production SAL

2013-11-12 §
16:30	<cmjohnson1>	swapping ps1-c7-eqiad...one side at a time...notifications pending	[production]
14:17	<springle>	paused externallinks OSC jobs after replication glitch on dewiki. original table and data remain untouched	[production]
11:33	<mark>	Manually set dirty_background_ratio to 5 (from 10) on amssq58	[production]
11:30	<hashar>	synchronized wmf-config/InitialiseSettings.php 'throttle rule for WikiCon {{gerrit|94860}}'	[production]
11:30	<hashar>	synchronized wmf-config/throttle.php 'throttle rule for WikiCon {{gerrit|94860}}'	[production]
09:45	<mark>	rebooting amssq58 with sysrq-trigger	[production]
07:02	<springle>	synchronized wmf-config/db-eqiad.php 'recache jobs on S2 to db1018'	[production]
03:34	<springle>	synchronized wmf-config/db-eqiad.php 'slave balancing'	[production]
03:12	<LocalisationUpdate>	ResourceLoader cache refresh completed at Tue Nov 12 03:12:05 UTC 2013	[production]
02:20	<LocalisationUpdate>	completed (1.23wmf2) at Tue Nov 12 02:20:54 UTC 2013	[production]
02:15	<LocalisationUpdate>	completed (1.23wmf3) at Tue Nov 12 02:15:03 UTC 2013	[production]
01:27	<springle>	restart db1050 mariadb after outage, let repl catch up. new lvm snaps mount ok. leave out of pool for now	[production]
2013-11-11 §
21:32	<paravoid>	rebooting ms-be1003, kernel bug, system CPU & I/O wait through the roof	[production]
20:08	<Nemo_bis>	network level of bits application servers eqiad is back to the pre-deploy 18+8 MB/s	[production]
19:56	<reedy>	synchronized php-1.23wmf3/extensions/Wikibase 'https://gerrit.wikimedia.org/r/94790'	[production]
19:41	<Reedy>	Switching 588 wikis to 1.23wmf3 in one go seems to have upset the bits appserver pool	[production]
19:26	<reedy>	rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.23wmf3	[production]
19:23	<reedy>	updated /a/common to {{Gerrit|Ibab846000}}: Bump wmgMemoryLimit to 210MB	[production]
17:40	<reedy>	synchronized php-1.23wmf3/extensions/FlaggedRevs 'https://gerrit.wikimedia.org/r/94774'	[production]
16:22	<reedy>	synchronized wmf-config/InitialiseSettings.php 'Bump wmgMemoryLimit to 210MB'	[production]
16:21	<pp-pdf1>	restarted all services	[production]
16:20	<pp-pdf1>	- upgrade mwlib to 0.15.12	[production]
16:13	<paravoid>	upgrading image/video scalers	[production]
15:32	<mark>	Moved eqiad wikipedia traffic onto Varnish	[production]
15:06	<mark>	Moved eqiad https/ipv6 text traffic from squid to varnish	[production]
14:31	<mark>	rebooting amssq54, XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)	[production]
08:50	<pp-pdf1>	restarted all services	[production]
08:50	<pp-pdf1>	- update mwlib.rl to 0.14.4.	[production]
06:43	<ori-l>	scap: "sudo: no tty present and no askpass program specified" for snapshot1 & snapshot4	[production]
06:39	<apergos>	probably gratuitous powercycle of sq80, it seems fine now in any case	[production]
06:33	<ori>	Finished syncing Wikimedia installation... :	[production]
06:32	<apergos>	sq48 repeat of these errors and hung again, so rt #6274 opened	[production]
06:28	<apergos>	powercycled hung sq48, took two tries to come up, "NMI received for unknown reason 31 on CPU 0" and "mptbase: ioc0: ERROR - Failed to come READY after reset"	[production]
06:28	<ori>	Started syncing Wikimedia installation... :	[production]
02:40	<LocalisationUpdate>	ResourceLoader cache refresh completed at Mon Nov 11 02:40:33 UTC 2013	[production]
02:29	<ori-l>	Apache logs filled with "SearchPhaseExecutionException[Failed to execute phase [dfs], all shards failed"	[production]
02:15	<ori-l>	Continuing inspection of logs on fluorine. memcached-serious.log is flooded with 'Memcached error for key [...]' errors, problem started in May or June judging by log sizes.	[production]
02:14	<LocalisationUpdate>	completed (1.23wmf3) at Mon Nov 11 02:14:09 UTC 2013	[production]
02:08	<LocalisationUpdate>	completed (1.23wmf2) at Mon Nov 11 02:07:59 UTC 2013	[production]
02:04	<ori-l>	Earlier issue identified by Ryan and Leslie as intermittent packet loss between eqiad and esams, due to capacity issue with provider.	[production]
2013-11-10 §
23:42	<ori-l>	CPU overload in text caches esams	[production]
23:42	<ori-l>	Per Ryan: packet loss from esams to eqiad on xe-4-2-2.cr1-eqiad.wikimedia.org	[production]
23:26	<ori-l>	redis.log: flooded with 'Used automatic re-authentication for Lua script [...]' (68,955 such messages)	[production]
23:24	<ori-l>	fatal log: Fatal error: Allowed memory size of 201326592 bytes exhausted (tried to allocate 72 bytes) at /usr/local/apache/common-local/php-1.23wmf2/extensions/WikibaseDataModel/DataModel/Entity/Entity.php on line 130 (47 such fatals)	[production]
23:21	<ori-l>	poolcounter.log: Pool counter is full (multiple wikis)	[production]
23:17	<ori-l>	exception.log: Exception from line 110 of /usr/local/apache/common-local/php-1.23wmf2/includes/WikiPage.php: Invalid or virtual namespace -1 given. (2 such errors)	[production]
23:14	<ori-l>	exception.log: Exception from line 114 of /usr/local/apache/common-local/php-1.23wmf2/includes/upload/UploadStash.php: UploadStash::getFile No user is logged in, files must belong to users (8 such errors)	[production]
23:14	<ori-l>	exception.log: Exception from line 61 of /usr/local/apache/common-local/php-1.23wmf2/includes/media/ImageHandler.php: No width specified to ImageHandler::makeParamString (194 such errors)	[production]
23:13	<ori-l>	Investigating possible site issue and logging everything that I come across.	[production]
02:52	<LocalisationUpdate>	ResourceLoader cache refresh completed at Sun Nov 10 02:52:22 UTC 2013	[production]