production SAL

2016-02-18
18:04	<mobrovac>	restbase deploy end of a42976cc82	[production]
18:03	<twentyafterfour>	applied a hotfix from https://secure.phabricator.com/D15306 on iridium to test a fix for https://phabricator.wikimedia.org/T127290	[production]
18:00	<godog>	reenable puppet on restbase1008	[production]
17:49	<mobrovac>	restbase deploy start of a42976cc82	[production]
17:47	<elukey>	manual failover of hadoop master node (analytics1001) to secondary (analytics1002) for maintenance (plus service restarts)	[production]
17:41	<urandom>	upgrading Cassandra to 2.1.13 on cerium.eqiad.wmnet (restbase staging) T126629	[production]
17:28	<mobrovac>	restbase deploying a42976cc82 to restbase1002	[production]
17:27	<urandom>	Cassandra on xenon.eqiad.wmnet killed by kernel after Cassandra package upgrade (coincidence?): [1482254.046078] Out of memory: Kill process 21854 (java) score 595 or sacrifice child : T126629	[production]
17:26	<urandom>	Cassandra on xenon.eqiad.wmnet killed by kernel after Cassandra package upgrade (coincidence): [1482254.046078] Out of memory: Kill process 21854 (java) score 595 or sacrifice child	[production]
17:21	<urandom>	upgrading Cassandra to 2.1.13 on xenon.eqiad.wmnet (restbase staging) T126629	[production]
17:20	<elukey>	disabled puppet on analytics1027 to avoid any Camus job to run	[production]
17:04	<dcausse>	updating completion suggester indices in eqiad	[production]
16:54	<elukey>	restarting hadoop services on analytics105* nodes for security updates	[production]
16:49	<gehel>	removing cirrus maintenance crons from mw1152 (T127322)	[production]
15:52	<dcausse>	creating adywiki indices in codfw	[production]
15:44	<elukey>	restarting hadoop services on analytics104* nodes for security updates	[production]
15:37	<elukey>	restarting hadoop services on analytics102* nodes for security update	[production]
15:33	<moritzm>	restarting apache on silver/wikitech	[production]
15:10	<elukey>	restarting hadoop services on analytics103* hosts for security upgrades	[production]
14:06	<bblack>	restarting apache on gallium (integration)	[production]
13:13	<mark>	decreased raid md2 sync_speed_max to 6000 on restbase1008	[production]
12:55	<elukey>	rebooted kafka1022.eqiad.wmnet for kernel upgrade	[production]
12:51	<godog>	decrease raid min_speed to 8000 on restbase1008	[production]
12:50	<hoo@tin>	Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch for Wikidata (duration: 01m 54s)	[production]
12:41	<elukey>	rebooted kafka1020 for kernel upgrade.	[production]
12:40	<godog>	decrease raid min_speed to 10000 on restbase1008	[production]
12:24	<godog>	increase stripe_cache_size to 32470 on restbase1008	[production]
12:21	<godog>	expand raid0 on restbase1008 to sdd and sde	[production]
11:36	<paravoid>	upgrading mr1-ulsfo to its pre-recovery version and rebooting (T127295)	[production]
11:34	<hashar>	Hard restarting Jenkins T127294	[production]
11:32	<jynus>	logical import of db1021 starting for data consistency check and defragmenting purposes	[production]
11:29	<paravoid>	mr1-ulsfo: "request system snapshot media internal slice alternate" + reboot (T127295)	[production]
11:27	<hashar>	Jenkins web UI busy with 'jenkins.model.RunIdMigrator doMigrate' while it migrates build records. I did a bunch of cleanup yesterday. Jenkins runs jobs in the background just fine though. T127294	[production]
11:12	<hashar>	Jenkins: reloading configuration from disk. Some metadata are corrupted T127294	[production]
10:48	<elukey>	rebooted kafka1018 for maintenance	[production]
10:17	<elukey>	rebooted kafka1014 for maintenance	[production]
10:10	<moritzm>	restarting hhvm on mw1* to put glibc update into effect	[production]
09:49	<godog>	remove old restbase metrics under restbase.* from graphite1001 and graphite2001	[production]
03:13	<twentyafterfour>	running puppet one last time on iridium. Phabricator upgrade successful with just a few minor issues now resolved.	[production]
03:01	<l10nupdate@tin>	ResourceLoader cache refresh completed at Thu Feb 18 03:01:01 UTC 2016 (duration 9m 24s)	[production]
02:51	<mwdeploy@tin>	sync-l10n completed (1.27.0-wmf.14) (duration: 11m 20s)	[production]
02:29	<mwdeploy@tin>	sync-l10n completed (1.27.0-wmf.13) (duration: 13m 55s)	[production]
02:18	<twentyafterfour>	phabricator is back online; sprint extension is broken, I'm investigating	[production]
01:57	<mutante>	powercycled frozen mw1147	[production]
01:51	<twentyafterfour>	phab pre-upgrade: http://pastebin.com/RTmXfDhp	[production]
01:49	<twentyafterfour>	about to bring down phabricator to do the upgrade	[production]
01:49	<twentyafterfour>	ran puppet on iridium for testing	[production]
01:08	<twentyafterfour>	stopped phd and started dumping phabricator's database to /srv/dumps/20160218.phabricator.sql.gz (just in case I need to roll back the update)	[production]
00:34	<catrope@tin>	Synchronized php-1.27.0-wmf.13/extensions/Flow: Trying again (duration: 01m 50s)	[production]
00:28	<RoanKattouw>	00:28:25 64 apaches had sync errors, /usr/bin/sync-common missing	[production]