production SAL

2016-07-05
13:56	<elukey>	pooling new codfw appservers - mw224[12345]	[production]
12:32	<elukey@palladium>	conftool action : set/pooled=yes; selector: mw1024.eqiad.wmnet	[production]
12:12	<elukey@palladium>	conftool action : set/pooled=no; selector: mw1024.eqiad.wmnet	[production]
12:11	<elukey>	depooling/re-pooling mw1024.eqiad.wmnet to temporarily set up trace8 logging (503 investigation - T73487)	[production]
12:08	<jynus>	running schema change on db1019 T73563	[production]
11:15	<jynus@tin>	Synchronized wmf-config/db-eqiad.php: Failover all commons special roles to db1081 (duration: 00m 24s)	[production]
11:00	<jynus@tin>	Synchronized wmf-config/db-eqiad.php: Failover commons recentchanges (duration: 00m 36s)	[production]
10:45	<jynus>	SET GLOBAL read_only=0; on db1040, our new m4-master	[production]
10:38	<jynus@tin>	Synchronized wmf-config/db-eqiad.php: Failover commons master to db1040 (duration: 00m 32s)	[production]
10:23	<jynus>	archiving m3-master phlegal* databases before dropping them	[production]
10:20	<mobrovac>	restbase staging started a no-op dump on cerium to test restbase on node 4.4.6	[production]
10:05	<elukey@palladium>	conftool action : set/weight=30; selector: mw1275.eqiad.wmnet	[production]
10:05	<elukey@palladium>	conftool action : set/weight=30; selector: mw1274.eqiad.wmnet	[production]
10:05	<elukey@palladium>	conftool action : set/weight=30; selector: mw1273.eqiad.wmnet	[production]
09:59	<elukey@palladium>	conftool action : set/weight=30; selector: mw1272.eqiad.wmnet	[production]
09:31	<_joe_>	shutting down mw1009-16 for decommissioning	[production]
09:06	<_joe_>	decommissioning mw1009-16	[production]
08:38	<elukey@palladium>	conftool action : set/pooled=yes; selector: mw1275.eqiad.wmnet	[production]
08:36	<elukey@palladium>	conftool action : set/pooled=yes; selector: mw1274.eqiad.wmnet	[production]
08:32	<gehel>	deleting enwikisource_titlesuggest on elasticsearch codfw (index creation issue during cluster restart)	[production]
08:31	<elukey@palladium>	conftool action : set/pooled=yes; selector: mw1273.eqiad.wmnet	[production]
08:23	<elukey@palladium>	conftool action : set/pooled=yes; selector: mw1272.eqiad.wmnet	[production]
08:21	<elukey>	adding and pooling new appservers - mw127[2345].eqiad	[production]
08:07	<godog>	swift codfw-prod: ms-be202[567] weight 1500	[production]
07:55	<jynus>	dropping etherpad_restore2 database from m1 T138516	[production]
07:40	<akosiaris>	T138516 forcing a puppet run on cache::misc hosts after merging https://gerrit.wikimedia.org/r/297352	[production]
07:29	<akosiaris>	T138516 stop the secondary etherpad instance on etherpad1001. etherpad-restore.wikimedia.org has served its purpose, killing it	[production]
02:44	<l10nupdate@tin>	ResourceLoader cache refresh completed at Tue Jul 5 02:44:09 UTC 2016 (duration 6m 12s)	[production]
02:37	<mwdeploy@tin>	scap sync-l10n completed (1.28.0-wmf.8) (duration: 17m 13s)	[production]
2016-07-04
20:28	<jynus>	removing /tmp/joal/sstables on all analytics10* hosts	[production]
20:22	<jynus>	deleted 21GB worth of temporary files from analytics1050	[production]
19:58	<aaron@tin>	Synchronized wmf-config/filebackend-production.php: Increase redis lockmanager timeout to 2 (duration: 00m 31s)	[production]
19:57	<legoktm@tin>	Synchronized php-1.28.0-wmf.8/extensions/MassMessage/: MassMessage is no longer accepting lists in the MassMessageList content model - T139303 (duration: 00m 39s)	[production]
17:37	<jynus>	testing slave_parallel_threads=5 on db1073	[production]
14:27	<moritzm>	rebooting lithium for kernel update	[production]
14:22	<moritzm>	installing tomcat7/ libservlet3.0-java security update on the kafka brokers	[production]
14:06	<_joe_>	shutting down mw1001-1008 for decommissioning	[production]
14:03	<gehel>	rolling restart of elasticsearch codfw/eqiad for kernel upgrade (T138811)	[production]
13:47	<_joe_>	stopping jobrunner on mw1011-16 as well, before decommissioning	[production]
13:46	<moritzm>	depooling mw1153-mw1160 (trusty image scalers), replaced by mw1291-mw1298 (jessie image scalers)	[production]
13:44	<godog>	ack all mr1-codfw related alerts in librenms	[production]
13:43	<akosiaris>	restart smokeping on netmon1001, temporarily disabled msw1-codfw	[production]
13:38	<gehel>	resuming writes on Cirrus / elasticsearch, this did not speedup cluster recovery	[production]
13:18	<godog>	bounce redis on rcs1001	[production]
13:16	<gehel>	restarting elastic1021 for kernel upgrade (T138811)	[production]
13:07	<elukey>	Bootstrapping again Cassandra on aqs100[456] (rack awareness + 2.2.6 - testing environment)	[production]
13:02	<gehel>	pausing writes on Cirrus / elasticsearch for faster cluster restart	[production]
12:43	<hashar>	Nodepool back up with 10 instances (instead of 20) to accommodate labs capacity T139285	[production]
12:39	<godog>	nodetool-b stop -- COMPACTION on restbase1014	[production]
12:29	<moritzm>	rolling reboot of rcs* cluster for kernel security update	[production]