production SAL

3151-3200 of 10000 results (45ms)

2016-10-20 §
15:59	<mutante>	short downtime of ganglia web ui	[production]
15:59	<mutante>	rebooting uranium	[production]
15:36	<apergos>	rolling reboots for jobrunners in codfw: mw2080-2085, mw2153-mw2162, mw2247-2250	[production]
15:14	<apergos>	rolling reboot of image scalers for codfw, eqiad: mw2086-2089, mw2148-2151, mw1293-1298	[production]
15:10	<ottomata>	restarted statsv on hafnium	[production]
14:55	<moritzm>	bounced ntp on mw2196/mw2197 (XFAC state)	[production]
14:34	<moritzm>	rebooting rutherfordium for kernel update	[production]
14:27	<filippo@puppetmaster1001>	conftool action : set/pooled=no; selector: name=prometheus1001.eqiad.wmnet	[production]
14:26	<filippo@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=prometheus1002.eqiad.wmnet	[production]
14:24	<akosiaris>	bounce ntpd on bast4001	[production]
14:20	<moritzm>	rebooting auth* servers	[production]
14:20	<ottomata>	starting rolling restart of analytics-eqiad kafka brokers to apply kernel update	[production]
14:18	<filippo@puppetmaster1001>	conftool action : set/pooled=no; selector: name=prometheus2001.codfw.wmnet	[production]
14:18	<filippo@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=prometheus2002.codfw.wmnet	[production]
14:17	<apergos>	rolling reboot of remaining app servers in codfw: mw2221-2245, and in eqiad: mw1261-1275	[production]
14:11	<jmm@puppetmaster1001>	conftool action : set/pooled=inactive; selector: mw2098.codfw.wmnet	[production]
14:09	<jynus@mira>	Synchronized wmf-config/db-eqiad.php: mariadb: move db1053 from s1 to s4 (duration: 02m 06s)	[production]
13:38	<moritzm>	restarting mx1001 for kernel update	[production]
13:22	<moritzm>	restarting francium for kernel update	[production]
13:15	<godog>	rolling reboot of prometheus machines for kernel update	[production]
13:14	<moritzm>	restarting ms1001 for kernel update	[production]
13:10	<elukey>	force failover from temporary Hadoop Master node (an1002) to its stanby (an1001) to restore the standard configuration	[production]
13:05	<elukey>	correction: force failover for Hadoop Master node (an1001) to its stanby (an1002) and rebooting an1001 for kernel upgrades	[production]
12:59	<elukey>	force failover for Hadoop Master node (an1002) to its stanby (an1002) and rebooting an1001 for kernel upgrades	[production]
12:59	<moritzm>	ferm on baham (failed to start due to failing DNS resolution in early boot)	[production]
12:52	<moritzm>	restarting mx2001 for kernel update	[production]
12:48	<moritzm>	bounced ntp on mw2116 (XFAC state)	[production]
12:39	<elukey>	restarting an1003 for kernel upgrades (oozie/hive master)	[production]
12:35	<moritzm>	bounced ntp on baham (was stick in INIT phase)	[production]
12:31	<apergos>	more app server rolling restarts for codfw: mw2163-2199	[production]
12:29	<apergos>	more API server rolling restarts for eqiad: mw1221-1235, 1276-1290	[production]
12:27	<apergos>	more APP server rolling restarts for eqiad: mw1209-1216, 128-1220, 1236-38, 1240-1258	[production]
12:12	<moritzm>	restarting bast2001 for kernel update	[production]
12:11	<apergos>	retaction. those are app servers, not starting them yet	[production]
12:10	<apergos>	more api server rolling restarts for eqiad: mw1209-1216, 128-1220, 1236-38, 1240-1258	[production]
12:08	<moritzm>	bounced ntp on mw2206 (XFAC state)	[production]
12:05	<bblack>	correction: rebooting baham / ns1.wikimedia.org for kernel	[production]
12:04	<bblack>	rebooting baham / ns2.wikimedia.org for kernel	[production]
11:53	<elukey>	rebooting an1027 (camus job launcher) for kernel upgrades	[production]
11:48	<moritzm>	bounced ntp on mw2101 and mw2147 (XFAC state)	[production]
11:48	<bblack>	depool cp1047 (cache_maps eqiad)	[production]
11:23	<apergos>	rolling restarts of more api servers in codfw: mw2200 - 2220	[production]
11:17	<elukey>	rebooting all the Analytics Hadoop nodes for kernel upgrades	[production]
11:07	<mobrovac>	change-prop restarting in codfw after kafka kernel upgrade	[production]
10:58	<apergos>	rolling reboots for first batch of app servers in eqiad: mw1170-1188	[production]
10:50	<elukey>	rebooting kafka200[12] for kernel upgrades (Kafka main-codfw non live cluster)	[production]
10:38	<apergos>	rolling restarts on first batch of api servers in eqiad: mw1189-1208	[production]
10:21	<apergos>	while the first batch of codfw api servers trundle along, starting rolling reboots for appservers in codfw starting with mw2090-2098, 2100-2119	[production]
10:20	<moritzm>	removing a few older kernels on analytics1036, was short of disk space in /boot partition	[production]
10:05	<elukey>	rebooting the Analytics Hadoop cluster for kernel upgrades	[production]