production SAL

2016-11-09 §
13:58	<elukey>	stopping kafka* daemons on kafka1014 to upgrade its fstab with UUID (T147879)	[production]
13:46	<elukey>	rebooting kafka1012 for kernel and openjdk updates	[production]
13:35	<elukey>	stopping kafka* daemons on kafka1012 to upgrade its fstab with UUID (T147879)	[production]
12:57	<elukey>	rebooting kafka1022 for kernel + openjdk updates	[production]
10:52	<elukey>	restarting kafka* on kafka1013 for openjdk upgrades	[production]
10:33	<elukey>	rebooting kafka1020 for kernel and openjdk upgrades	[production]
09:35	<elukey>	rebooting kafka1018 for kernel + openjdk upgrade	[production]
2016-11-08 §
08:04	<elukey>	rebooting stat1001 for kernel upgrades (will cause brief unavailability of analytics websites)	[production]
2016-11-07 §
15:44	<elukey>	started kafka-mirror-main-eqiad_to_analytics.service on kafka1012	[production]
15:26	<elukey>	rebooting kafka1013 for kernel upgrades	[production]
2016-11-06 §
10:13	<elukey>	removing logstash.log.1 from logstash100[123] to free some space	[production]
2016-11-02 §
08:32	<elukey>	restarted cassandra-metrics-collector on aqs100[456] for jvm upgrades	[production]
2016-10-31 §
19:17	<elukey>	restarted varnishkafka-webrequest on cp2018 and cp3045 (CRITICALs in icinga, librdkafka errors logged for kafka1018.eqiad.wmnet)	[production]
11:00	<elukey>	restarting cassandra on aqs100[456] for OpenJDK upgrades	[production]
07:43	<elukey>	powercycled cp2010 (not reachable via ssh, com2 console showed a frozen screen)	[production]
2016-10-26 §
08:43	<elukey>	increasing the AQS cassandra system_auth keyspace replication from 1 to 6 (and running nodetool-{a,b} repair system_auth on all nodes)	[production]
08:29	<elukey>	downgraded memcached on mc2009 to the Debian Jessie version (was part of a performance experiment)	[production]
2016-10-25 §
14:14	<elukey>	removed logstash filter for Apache (https://logstash.wikimedia.org/app/kibana#/dashboard/apache2log) - T144005	[production]
12:24	<elukey>	rebooting druid100[123] for kernel upgrades	[production]
10:11	<elukey>	reimaging mc103[1-6] to Jessie	[production]
2016-10-24 §
13:20	<elukey>	reimaging mc102[89] and mc1030	[production]
10:47	<elukey>	reimaged mc102[56], currently doing mc1027	[production]
08:53	<elukey>	reimaging mc1024	[production]
08:20	<elukey>	reimaging mc1023.eqiad.wmnet	[production]
07:46	<elukey>	reimaging mc1022.eqiad.wmnet (T137345)	[production]
2016-10-21 §
16:05	<elukey>	reimaging mc1021 with wmf-auto-reimage (T137345)	[production]
15:28	<elukey>	reimaging mc1019 with wmf-auto-reimage (T137345)	[production]
14:50	<elukey>	reimaging mc1020 with wmf-auto-reimage (T137345)	[production]
09:32	<elukey>	rebooting kafka100[12] for kernel upgrades (EventBus hosts)	[production]
07:20	<elukey>	rebooting stat100[234] for kernel upgrades	[production]
06:26	<elukey>	restarting stat1001 for kernel upgrades (will cause a brief outage for some analytics websites like analytics.w.o and pivot.w.o)	[production]
2016-10-20 §
13:10	<elukey>	force failover from temporary Hadoop Master node (an1002) to its standby (an1001) to restore the standard configuration	[production]
13:05	<elukey>	correction: force failover for Hadoop Master node (an1001) to its standby (an1002) and rebooting an1001 for kernel upgrades	[production]
12:59	<elukey>	force failover for Hadoop Master node (an1002) to its standby (an1002) and rebooting an1001 for kernel upgrades	[production]
12:39	<elukey>	restarting an1003 for kernel upgrades (oozie/hive master)	[production]
11:53	<elukey>	rebooting an1027 (camus job launcher) for kernel upgrades	[production]
11:17	<elukey>	rebooting all the Analytics Hadoop nodes for kernel upgrades	[production]
10:50	<elukey>	rebooting kafka200[12] for kernel upgrades (Kafka main-codfw non live cluster)	[production]
10:05	<elukey>	rebooting the Analytics Hadoop cluster for kernel upgrades	[production]
08:57	<elukey>	rebooting eventlog2001 for kernel upgrades (EL spare host)	[production]
08:54	<elukey>	rebooting eventlog1001 for kernel upgrades (Eventlogging host)	[production]
08:32	<elukey>	rebooting aqs100[456] for kernel upgrades (one at a time, de-pool/reboot/pool)	[production]
08:31	<elukey>	rebooting aqs100[123] for kernel upgrades (one at a time, de-pool/reboot/pool)	[production]
2016-10-19 §
17:15	<elukey>	depooled mw1239.eqiad.wmnet to allow hw investigation (T148421) (was done today but wasn't logged properly)	[production]
2016-10-18 §
12:52	<elukey>	mw1169 back in service after reimage (MW Jobrunner)	[production]
11:55	<elukey>	removed /etc/mysql/conf.d/research-client.cnf from stat1002 (root:root perms, not supposed to be there but only on stat1003)	[production]
11:37	<elukey>	reimaging mw1169 to Debian Jessie (MW Jobrunner)	[production]
10:40	<elukey>	mw1168.eqiad.wmnet back in service after reimage (MW Jobrunner)	[production]
09:28	<elukey>	reimaging mw1168 to Debian Jessie (MW Jobrunner)	[production]
09:25	<elukey>	varnishkafka restarting in upload/misc/maps with new settings (https://gerrit.wikimedia.org/r/316306)	[production]