production SAL

2018-07-11 §
17:06	<elukey>	restarted kafka on kafka1001 with Xmx 2G and Xms 2G	[production]
16:50	<elukey>	stop topics cleaner script	[production]
16:36	<elukey>	start topic clean procedure on kafka1001 (tmux root session)	[production]
16:19	<elukey>	restart kafka on kafka1003	[production]
15:11	<elukey>	restart again kafka on kafka100[1,2] - failed for OOM	[production]
15:03	<elukey>	restart kafka on kafka1003	[production]
14:57	<elukey>	rolling restart of eventbus on kafka100[1-3]	[production]
14:53	<elukey>	restart kafka on kafka1002	[production]
14:52	<elukey>	restart kafka on kafka1001	[production]
13:14	<elukey>	roll restart of aqs on aqs* to pick up the new Druid config	[production]
07:57	<elukey>	roll restart of aqs on aqs* to rollback the druid config	[production]
2018-07-10 §
10:03	<elukey>	restart analytics100[1,2]'s hadoop resource managers, some I/O socket errors after the ip6 interface change	[production]
09:34	<elukey>	forced umount of /mnt/hdfs on stat1004, several processes hang for it (causing load) and transport not connected	[production]
08:34	<elukey>	rolling restart of AQS to apply the new config	[production]
2018-07-09 §
15:32	<elukey>	enabled snappy compression for varnishkafka eventlogging	[production]
07:18	<elukey>	update filter analytics-in4 on cr1/cr2 eqiad	[production]
06:25	<elukey>	restart hue on thorium to pick up new smtp changes - T196920	[production]
2018-07-08 §
11:27	<elukey>	restart rsyslog on lithium - in:imtcp thread stuck at 99% cpu usage	[production]
2018-07-05 §
15:26	<elukey>	upgrade (without jvm restart) prometheus-jmx-exporter on the analytics nodes listed in debmonitor as still not running the latest version	[production]
07:13	<elukey>	stop mariadb on analytics1003 to apply https://gerrit.wikimedia.org/r/443893 and enable auth via unix socket	[production]
2018-07-04 §
09:15	<elukey>	reimage aqs1009 to Debian Stretch	[production]
08:53	<elukey>	update analytics-in4 filter rules on cr1/cr2 eqiad - T198623	[production]
06:15	<elukey>	reimage aqs1008 to Debian Stretch	[production]
2018-07-03 §
11:39	<elukey>	reimage aqs1007 to Debian Stretch	[production]
09:11	<elukey>	reimage aqs1006 to Debian Stretch	[production]
07:38	<elukey>	reimage aqs1005 to Debian Stretch	[production]
2018-07-02 §
14:29	<elukey>	copy cassandra-tools-wmf 1.0.2-1 from jessie-wikimedia to stretch-wikimedia	[production]
13:34	<elukey>	reimage aqs1004 to Debian Stretch	[production]
09:53	<elukey>	reboot ms-be1039 (bad disk, spike in I/O and load, not reachable via ssh or mgmt console)	[production]
2018-06-29 §
05:40	<elukey>	force umount of dumps labstore nfs mountpoints on stat100[56]/notebook100[34] to reduce load (also too many open files)	[production]
2018-06-28 §
14:46	<elukey>	upgrade piwik 3.2.1 to matomo (new name/package) 3.5.1 - T192298	[production]
13:49	<elukey>	downgrade cassandra and cassandra-tools from 2.2.6-wmf5 to 2.2.6-wmf3 in jessie-wikimedia component/cassandra22 - T197062	[production]
13:01	<elukey>	upload matomo (new Piwik) 3.5.1-1 to jessie-wikimedia	[production]
12:49	<elukey>	stop hadoop daemons on analytics1032 + shutdown to swap BBU - T194234	[production]
08:15	<elukey>	restart-hhvm on mw1227 (some threads stuck in jit-related operations, causing high load)	[production]
07:12	<elukey>	upload piwik 3.2.1 to jessie-wikimedia	[production]
2018-06-27 §
21:50	<elukey>	piwik maintenance on bohrium completed	[production]
13:07	<elukey>	piwik upgraded to 3.2.1 on bohrium + started the db migration procedure (will last 2/3h probably)	[production]
2018-06-26 §
14:43	<elukey>	rm syslog.1.gz puppet.log.1.gz on tegment to fix cronspam	[production]
2018-06-25 §
14:14	<elukey>	merging jmxtrans and kafkatee's submodules to operations/puppet - part 2 (moving them back from environments/production)	[production]
13:53	<elukey>	merging jmxtrans and kafkatee's submodules to operations/puppet - part 1 (moving them under environments/production)	[production]
2018-06-15 §
14:49	<elukey>	restart varnishkafka-eventlogging on cp4028, errors logged	[production]
14:43	<elukey>	restart varnishkafka-eventlogging on cp5012 as an attempt to clear out the errors (not needed but logging it anyway)	[production]
2018-06-14 §
11:16	<elukey>	upgrade cassandra on aqs* to 2.2.6-wmf5	[production]
09:15	<elukey>	add debmonitor term to analytics-in4 on cr1/cr2 eqiad	[production]
08:31	<elukey>	restart hadoop hdfs master nodes to pick up the new journal node settings	[production]
08:07	<elukey>	roll restart of hadoop journal nodes to pick up the new configuration (two more journal nodes added)	[production]
2018-06-13 §
15:55	<elukey>	rolling restart of aqs on aqs100[4-9] to pick up the new config changes	[production]
13:28	<elukey@deploy1001>	Finished deploy [analytics/aqs/deploy@160206f]: (no justification provided) (duration: 04m 11s)	[production]
13:24	<elukey@deploy1001>	Started deploy [analytics/aqs/deploy@160206f]: (no justification provided)	[production]