2016-07-04
13:38 <gehel> resuming writes on Cirrus / elasticsearch, this did not speed up cluster recovery [production]
13:18 <godog> bounce redis on rcs1001 [production]
13:16 <gehel> restarting elastic1021 for kernel upgrade (T138811) [production]
13:07 <elukey> Bootstrapping Cassandra again on aqs100[456] (rack awareness + 2.2.6 - testing environment) [production]
13:02 <gehel> pausing writes on Cirrus / elasticsearch for faster cluster restart [production]
12:43 <hashar> Nodepool back up with 10 instances (instead of 20) to accommodate labs capacity T139285 [production]
12:39 <godog> nodetool-b stop -- COMPACTION on restbase1014 [production]
12:29 <moritzm> rolling reboot of rcs* cluster for kernel security update [production]
12:10 <moritzm> rolling reboot of ocg* cluster for kernel security update [production]
11:40 <jynus@tin> Synchronized wmf-config/db-eqiad.php: Failover db1053 to db1072 (duration: 00m 40s) [production]
10:56 <moritzm> rolling reboot of swift frontends in eqiad for kernel security update [production]
10:30 <yuvipanda> stop nodepool on labnodepool1001 and disable puppet to keep it down, to let labs stabilize first [production]
10:28 <yuvipanda> restart rabbitmq-server on labcontrol1001 [production]
10:14 <moritzm> installing chromium security update on osmium [production]
10:07 <moritzm> installing xerces-c security updates on Ubuntu systems (jessie already fixed) [production]
10:01 <_joe_> stopping jobchron and jobrunner on mw1001-10 before decommission [production]
09:50 <godog> reimage ms-be300[234] with jessie [production]
09:44 <hashar> Labs infra can't delete instances anymore (impacts CI as well) T139285 [production]
09:41 <moritzm> installing p7zip security updates [production]
09:38 <hashar> CI is out of Nodepool instances; the pool has drained because instances can no longer be deleted via the OpenStack API [production]
09:25 <elukey> Added new jobrunners in service - mw130[256].eqiad.wmnet (https://etherpad.wikimedia.org/p/jessie-install) [production]
08:16 <moritzm> rolling reboot of swift backends in eqiad for kernel security update [production]
07:49 <jynus@tin> Synchronized wmf-config/db-eqiad.php: Failover db1034 to db1062 (duration: 00m 30s) [production]
02:26 <l10nupdate@tin> ResourceLoader cache refresh completed at Mon Jul 4 02:26:54 UTC 2016 (duration 5m 42s) [production]
02:21 <mwdeploy@tin> scap sync-l10n completed (1.28.0-wmf.8) (duration: 09m 14s) [production]
2016-07-03
19:27 <Reedy> Ran namespaceDupes --fix on gomwiki [production]
14:59 <yuvipanda> restart nova-compute process on labvirt1010 [production]
14:59 <yuvipanda> restart nova-compute process on labvirt10101 [production]
09:06 <jynus> removing old logs from pc2004 [production]
07:42 <legoktm@tin> Synchronized static/images/project-logos/: Put high-res enwiktionary logos in the right place - T139255 (duration: 00m 38s) [production]
02:27 <l10nupdate@tin> ResourceLoader cache refresh completed at Sun Jul 3 02:27:13 UTC 2016 (duration 5m 38s) [production]
02:21 <mwdeploy@tin> scap sync-l10n completed (1.28.0-wmf.8) (duration: 09m 13s) [production]
2016-07-02
19:15 <twentyafterfour> Deployed hotfix to phabricator. Restarted apache2 on iridium [production]
02:29 <l10nupdate@tin> ResourceLoader cache refresh completed at Sat Jul 2 02:29:17 UTC 2016 (duration 5m 40s) [production]
02:23 <mwdeploy@tin> scap sync-l10n completed (1.28.0-wmf.8) (duration: 08m 52s) [production]
2016-07-01
22:23 <krinkle@tin> Synchronized php-1.28.0-wmf.8/extensions/WikimediaEvents/extension.json: T128115 (duration: 00m 37s) [production]
22:22 <krinkle@tin> Synchronized php-1.28.0-wmf.8/extensions/WikimediaEvents/modules/: T128115 (duration: 00m 30s) [production]
21:04 <ori@tin> Synchronized wmf-config/CommonSettings.php: I7a95c0f4: Bump $wgResourceLoaderMaxQueryLength to 5,000 (duration: 00m 32s) [production]
20:08 <ori@tin> Synchronized wmf-config/CommonSettings.php: I6eb0ae67: Bump $wgResourceLoaderMaxQueryLength to 4,000 (duration: 00m 26s) [production]
19:17 <ori> restarted coal on graphite1001; stopped receiving messages from EL 0mq publisher [production]
19:16 <ori> restarted navtiming on hafnium; stopped receiving messages from EL 0mq publisher [production]
18:34 <mutante> mw1259 - powercycling [production]
18:32 <krinkle@tin> Synchronized docroot/default/: (no message) (duration: 00m 31s) [production]
18:31 <krinkle@tin> Synchronized errorpages/: (no message) (duration: 01m 06s) [production]
17:47 <ebernhardson> restart elasticsearch on elastic1017 to attempt to clear up a continuous backlog of relocating shards [production]
15:53 <godog> temporarily run 3x statsdlb instances on graphite1001 to minimise drops - T101141 [production]
14:57 <dcausse> upgraded and restarted elastic on nobelium@eqiad [production]
14:21 <godog> enable another statsdlb instance temporarily on graphite1001 to investigate drops [production]
14:15 <moritzm> rearmed keyholder on mira after reboot [production]
13:55 <moritzm> rebooting codfw poolcounters for kernel update [production]