production SAL

2018-04-24 §
12:28	<elukey>	cleanup /home/elukey/zookeeper backup files taken before the 3.4.9 migration	[production]
12:10	<elukey>	reimage analytics106[34] to Debian Stretch	[production]
10:50	<elukey>	reimage analytics106[56] to Debian Stretch	[production]
08:14	<elukey>	upload druid_0.10.0-3~jessie1 (collection of druid packages) to jessie-wikimedia - T164008	[production]
06:56	<elukey>	restart zookeeper on conf200[123] for openjdk upgrades	[production]
2018-04-23 §
19:34	<elukey@tin>	Finished deploy [analytics/pivot/deploy@cb9ddee]: Fix 0.10.0 compatibility - T164008 (duration: 00m 17s)	[production]
19:34	<elukey@tin>	Started deploy [analytics/pivot/deploy@cb9ddee]: Fix 0.10.0 compatibility - T164008	[production]
13:55	<elukey>	reimage analytics1067 to Debian Stretch - T192557	[production]
07:35	<elukey>	reboot ms-be2034 - stuck in com2 console with "sd 0:1:0:1: rejecting I/O to offline device", not responsive to ssh	[production]
2018-04-20 §
11:27	<elukey>	reimage analytics1068 to Debian Stretch - T192557	[production]
09:12	<elukey>	restart of mw apis showing ~50% cpu utilization as precaution before the weekend - mw[1224,1225,1228,1230,1231,1233-1235,1276-1283,1286,1312,1313,1315,1316,1341,1343,1344,1347,1348]*	[production]
08:39	<elukey>	restart hhvm on mw[1226,1232].eqiad.wmnet - high load	[production]
2018-04-19 §
09:59	<elukey>	complete migration of zookeeper on conf100[123]	[production]
09:33	<elukey>	upgrade zookeeper on conf100[123] from 3.4.5 to 3.4.9 - T182924	[production]
2018-04-18 §
14:00	<elukey>	restart kafka on kafka1001 and kafka2001 (jobqueues,eventbus) for openjdk-7 upgrades	[production]
08:44	<elukey>	execute cumin 'analytics10[28-69]' 'rm /etc/apt/preferences.d/r_ && apt-get update' to clear jessie backports apt config - T192348	[production]
2018-04-17 §
16:37	<elukey>	incremental rollout of the new zookeeper jmx config to druid1* and conf*	[production]
13:25	<elukey>	completed migration of zookeeper on conf200[123]	[production]
13:00	<elukey>	upgrade zookeeper on conf200[123] to 3.4.9~jessie - T182924	[production]
08:19	<elukey>	restart nrpe-server on kafka2001 (kafka check not defined)	[production]
2018-04-16 §
08:04	<elukey>	restart hhvm on mw[1228,1234,1281-1287,1289,1290,1312-1314,1317,1339,1343,1345,1346,1348] - more than 50% cpu usage, prevention scheme for current high load	[production]
05:59	<elukey>	restart hhvm on mw[1221,1233,1280,1347] - high load	[production]
05:55	<elukey>	repool mw1341 after investigation	[production]
05:48	<elukey>	restart hhvm on mw1225, 1315, 1316, 1340, 1341, 1342, 1347 - high load	[production]
05:36	<elukey>	restart hhvm on mw1226,27,32,88 - high load	[production]
2018-04-15 §
21:42	<elukey>	restart hhvm on mw1286,1317,1339 - high load	[production]
20:52	<elukey>	restart hhvm on mw13[43,45,46,48] - high load	[production]
20:48	<elukey>	restart hhvm on mw13[12-14] - high load	[production]
20:45	<elukey>	restart hhvm on mw[1285,1287,1289-1290] - high load	[production]
20:38	<elukey>	restart hhvm on mw12[22,79,82] - high load	[production]
20:32	<elukey>	restart hhvm on mw12[32-35] - high load	[production]
20:24	<elukey>	restart hhvm on mw1229-31 - high load	[production]
20:17	<elukey>	restart hhvm on mw122[6-8] - high load	[production]
20:05	<elukey>	restart hhvm on mw122[3,4] - high load	[production]
13:42	<elukey>	restart hhvm on mw1227 due to high load (hhvm dump debug in /tmp/hhvm.44071.bt)	[production]
10:53	<elukey>	powercycle mw1272 - not responsive to ssh, mgmt com2 console showing "[OK" and no tty	[production]
2018-04-13 §
13:52	<elukey>	roll restart druid + zookeeper daemons on druid100[123] for openjdk-7 updates	[production]
13:32	<elukey>	restart druid and zookeeper daemons on druid100[456] for openjdk-7 updates	[production]
2018-04-12 §
06:08	<elukey>	force kill of fuse_dfs (handling /mnt/hdfs) on stat1004, apparently causing a huge load	[production]
06:05	<elukey>	force kill of fuse_dfs (handling /mnt/hdfs) on stat1005, apparently causing a huge load	[production]
2018-04-11 §
16:44	<elukey>	restart hadoop hdfs namenodes on analytics100[12] to pick up HDFS Trash settings - T189051	[production]
16:14	<elukey>	reboot notebook1001 for kernel updates	[production]
13:12	<elukey>	restart kafka brokers on kafka1012->23 for openjdk-7 upgrades	[production]
06:49	<elukey>	restart Yarn Resource Manager daemons on analytics100[12] to pick up the new Prometheus configuration file	[production]
2018-04-10 §
13:35	<elukey>	restart kafka on kafka-jumbo1001 for openjdk upgrades	[production]
2018-04-09 §
12:01	<elukey>	upgrading Boost libraries on all mediawiki eqiad API servers with an ICU 57-enabled HHVM build and restart HHVM (T189295)	[production]
10:13	<elukey>	completed upgrade of mw eqiad api appservers to ICU 57-enabled HHVM	[production]
08:45	<elukey>	upgrading eqiad api appservers to ICU 57-enabled HHVM build (T189295)	[production]
07:09	<elukey>	upgrade burrow to 1.0 on kafkamon[12]* - T188719	[production]
06:24	<elukey>	upgrade burrow 1.0.0 to stretch/jessie wikimedia	[production]