production SAL

2017-07-05
12:11	<apergos>	puppet is currently disabled again on snapshots 1,5,6,7 and on dataset1001; we saw the same nfs issue shortly after reboot, with no dump processes going, as snapshots 5,6,7 had not remounted the filesystem	[production]
11:20	<moritzm>	rebooting wtp2* servers for kernel update	[production]
11:14	<moritzm>	rebooting restbase1009 for kernel update	[production]
10:56	<hashar>	restarting Jenkins for plugin upgrades	[production]
10:45	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1072 - T168661 (duration: 02m 59s)	[production]
10:41	<marostegui>	Run redact_sanitarium on s6 databases db1102 - T153743	[production]
10:41	<moritzm>	rebooting wtp1001 for kernel update	[production]
10:37	<moritzm>	rebooting restbase1008 for kernel update	[production]
10:32	<apergos>	rebooting snapshot hosts to clean up hung nfs client processes	[production]
10:30	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1072 - T168661 (duration: 02m 51s)	[production]
10:24	<apergos>	rebooted dataset1001 to unstick nfsd and pick up new kernel, re-enabled puppet	[production]
10:14	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1066 - T168661 (duration: 02m 50s)	[production]
10:11	<moritzm>	rebooting restbase1007 for kernel update	[production]
10:01	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1066 - T168661 (duration: 02m 50s)	[production]
09:57	<marostegui>	Deploy alter table on s1 eqiad hosts - T168661	[production]
09:48	<godog>	move 'instances' graphite hierarchy out of the way, do not delete yet - T143405	[production]
09:27	<marostegui>	Stop MySQL on db1085 for maintenance - T153743	[production]
09:21	<godog>	upload nginx_1.11.10-1+wmf2 to jessie-wikimedia and nginx_1.11.10-1+wmf2~stretch1 to stretch-wikimedia	[production]
09:17	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1085 - T153743 (duration: 02m 50s)	[production]
08:44	<apergos>	puppet disabled and processes accessing dataset1001 exported filesystem shot, on: stat1002,3, snapshot1001,5,6,7, while investigation continues	[production]
07:27	<moritzm>	rebooting restbase-dev* for kernel update	[production]
07:13	<moritzm>	rebooting notebook* hosts	[production]
05:18	<marostegui>	Deploy alter table on s3 master - db1075 - T168661	[production]
05:13	<marostegui>	Deploy alter table on s7 master - db1062 - T168661	[production]
05:08	<marostegui>	Force a relearn on db1046's BBU - T166141	[production]
02:27	<l10nupdate@tin>	scap sync-l10n completed (1.30.0-wmf.7) (duration: 10m 23s)	[production]
2017-07-04
21:40	<volans>	ACK'ed puppet not running on stat100[2-3],snapshot100[1,5-7] due to NFS overloaded on dataset1001 - T169680	[production]
16:54	<jynus>	dropping ukwikimedia from several labsdb hosts	[production]
16:10	<moritzm>	rebooting radium for kernel update	[production]
15:09	<mobrovac@tin>	Finished deploy [citoid/deploy@9d22567]: Fallback to crossRef (T165105) and use MarcXML (T165105) (duration: 02m 52s)	[production]
15:06	<mobrovac@tin>	Started deploy [citoid/deploy@9d22567]: Fallback to crossRef (T165105) and use MarcXML (T165105)	[production]
15:02	<godog>	set operations/debs/nginx as hidden and update description	[production]
14:57	<ema>	pybal 1.13.7 uploaded to apt.w.o, testing it on pybal-test2001 T82747 T154759	[production]
14:31	<godog>	copy nginx from jessie-wikimedia to stretch-wikimedia	[production]
14:15	<paravoid>	reset db2038's iLO	[production]
13:06	<filippo@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet	[production]
11:47	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Remove comments from db1039 status - T166208 (duration: 02m 50s)	[production]
11:25	<joal@tin>	Finished deploy [analytics/refinery@88cbb9e]: Regular weekly deploy (2) - Bug patch (duration: 03m 38s)	[production]
11:21	<joal@tin>	Started deploy [analytics/refinery@88cbb9e]: Regular weekly deploy (2) - Bug patch	[production]
11:15	<elukey>	powercycle elastic1018, host unreachable	[production]
11:02	<joal@tin>	Finished deploy [analytics/refinery@12c5f57]: Regular weekly deploy (duration: 04m 47s)	[production]
11:00	<moritzm>	rebooting kubernetes workers for kernel update	[production]
10:58	<godog>	copy wikimedia-lvs-realserver from jessie-wikimedia to stretch-wikimedia	[production]
10:57	<joal@tin>	Started deploy [analytics/refinery@12c5f57]: Regular weekly deploy	[production]
10:53	<gehel>	killing stuck wmf-reimage on puppetmaster1001 for maps-test2001	[production]
10:40	<marostegui>	Stop replication on db1102 (sanitarium3) on s2 shard for maintenance - T153743	[production]
10:33	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1060 - T153743 (duration: 02m 49s)	[production]
10:23	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1035 - T168661 (duration: 02m 49s)	[production]
10:14	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1035 - T168661 (duration: 02m 50s)	[production]
09:58	<marostegui>	Move labsdb1009 main general replication thread to a named replication thread called db1095 - T153743	[production]