production SAL

2017-07-05
17:41	<apergos>	re-enabled puppet on stat1003 (last dataset nfs client), manually mounted /mnt/data because puppet run has an unrelated error	[production]
16:33	<jynus>	restart mysql on db2062	[production]
16:04	<ema>	restart pybal on lvs200[12] to make them reconnect to conf2001	[production]
16:03	<ema>	restart pybal on lvs200[45] to make them reconnect to conf2001	[production]
15:54	<jynus>	restart mysql on db2072	[production]
15:30	<apergos>	re-enabled puppet on stat1002, did a manual run, dataset filesystem available again there	[production]
15:09	<apergos>	re-enabled puppet on snapshot6,7, still watching dataset1001 performance	[production]
15:09	<ema>	restart pybal on lvs2003 to make it reconnect to conf2001	[production]
14:45	<ema>	bounce pybal on lvs2006, not synced with etcd information	[production]
14:40	<moritzm>	rebooting restbase1012 for kernel update	[production]
14:19	<moritzm>	rebooting logstash100[4-6] for kernel update	[production]
14:00	<moritzm>	rebooting logstash100[1-3] for kernel update	[production]
13:59	<ema>	cache_misc: upgrade to varnish 4.1.7-1wm1 and reboot for kernel update	[production]
13:48	<apergos>	re-enabling puppet on snapshot1001, 1005 for testing	[production]
13:46	<moritzm>	rebooting restbase1011 for kernel update	[production]
13:44	<zeljkof>	EU SWAT finished!	[production]
13:43	<zfilipin@tin>	Synchronized wmf-config/Wikibase-production.php: SWAT: [[gerrit:362986|Set Wikibase readFullEntityIdColumn setting to false]] (duration: 00m 42s)	[production]
13:35	<zfilipin@tin>	Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:363043|Enable WikiLove for ckbwiki (T169563)]] (duration: 00m 43s)	[production]
13:24	<zfilipin@tin>	Synchronized dblists/closed.dblist: SWAT: [[gerrit:361686|Reopen nlwikinews (T168764)]] (duration: 02m 50s)	[production]
13:21	<jmm@puppetmaster1001>	conftool action : set/pooled=inactive; selector: mw1196.eqiad.wmnet	[production]
13:18	<apergos>	power cycled dataset1001, crashed, unresponsive on mgmt console	[production]
13:18	<zfilipin@tin>	Synchronized dblists/closed.dblist: SWAT: [[gerrit:361686|Reopen nlwikinews (T168764)]] (duration: 02m 50s)	[production]
13:16	<elukey>	reboot conf2001 for kernel updates	[production]
13:09	<moritzm>	rebooting restbase1010 for kernel update	[production]
12:49	<marostegui>	Force BBU relearn on db1016 - T166344	[production]
12:36	<marostegui>	Move labsdb1010 main general replication thread to a named replication thread called db1095 - T153743	[production]
12:33	<marostegui>	Stop all replication threads on db1095 for maintenance - T153743	[production]
12:32	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1085 - T153743 (duration: 02m 49s)	[production]
12:29	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1051 - T168661 (duration: 02m 50s)	[production]
12:16	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1051 - T168661 (duration: 02m 51s)	[production]
12:11	<apergos>	puppet is currently disabled again on snapshots 1,5,6,7 and on dataset1001; we saw the same nfs issue shortly after reboot, with no dump processes going, as snapshots 5,6,7 had not remounted the filesystem	[production]
11:20	<moritzm>	rebooting wtp2* servers for kernel update	[production]
11:14	<moritzm>	rebooting restbase1009 for kernel update	[production]
10:56	<hashar>	restarting Jenkins for plugin upgrades	[production]
10:45	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1072 - T168661 (duration: 02m 59s)	[production]
10:41	<marostegui>	Run redact_sanitarium on s6 databases db1102 - T153743	[production]
10:41	<moritzm>	rebooting wtp1001 for kernel update	[production]
10:37	<moritzm>	rebooting restbase1008 for kernel update	[production]
10:32	<apergos>	rebooting snapshot hosts to clean up hung nfs client processes	[production]
10:30	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1072 - T168661 (duration: 02m 51s)	[production]
10:24	<apergos>	rebooted dataset1001 to unstick nfsd and pick up new kernel, re-enabled puppet	[production]
10:14	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1066 - T168661 (duration: 02m 50s)	[production]
10:11	<moritzm>	rebooting restbase1007 for kernel update	[production]
10:01	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1066 - T168661 (duration: 02m 50s)	[production]
09:57	<marostegui>	Deploy alter table on s1 eqiad hosts - T168661	[production]
09:48	<godog>	move 'instances' graphite hierarchy out of the way, do not delete yet - T143405	[production]
09:27	<marostegui>	Stop MySQL on db1085 for maintenance - T153743	[production]
09:21	<godog>	upload nginx_1.11.10-1+wmf2 to jessie-wikimedia and nginx_1.11.10-1+wmf2~stretch1 to stretch-wikimedia	[production]
09:17	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1085 - T153743 (duration: 02m 50s)	[production]
08:44	<apergos>	puppet disabled and processes accessing dataset1001 exported filesystem shot, on: stat1002,3, snapshot1001,5,6,7, while investigation continues	[production]