production SAL

2016-10-04 §
08:43	<elukey>	reimaging mw119[89] to jessie	[production]
07:09	<elukey>	rebooting eventlog1001 for kernel upgrades	[production]
07:04	<elukey>	executed salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' (also in codfw) to reduce cronspam	[production]
2016-10-03 §
09:48	<elukey>	lowered build log retention from 90 to 60 days for the puppet compiler (https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/)	[production]
08:46	<elukey>	rebooted compiler02.puppet3-diffs.eqiad.wmflabs (not reachable by Jenkins, pingable from bastions but no ssh available)	[production]
2016-09-29 §
16:34	<elukey>	executed sudo salt -C 'G@cluster:imagescaler and G@site:eqiad' cmd.run 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' to reduce cronspam	[production]
16:33	<elukey>	executed sudo salt -C 'G@cluster:imagescaler and G@site:codfw' cmd.run 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' to reduce cronspam	[production]
2016-09-25 §
15:24	<elukey>	executed https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Fixing_HDFS_mount_at_.2Fmnt.2Fhdfs on stat1002 (fusermount failed to unmount, though)	[production]
2016-09-23 §
09:06	<elukey>	reboot eventlog2001.codfw.wmnet for kernel upgrades	[production]
08:52	<elukey>	upgrading varnishkafka to 1.0.12-1 in cache:misc	[production]
08:30	<elukey>	upgrading varnishkafka to 1.0.12-1 in cache:maps	[production]
07:33	<elukey>	executed 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' on all the api and appservers to remove/prevent cronspam (root:adm files are also related to newly reimaged hosts; rsyslog needs to be configured before hhvm) - T132324	[production]
2016-09-22 §
16:40	<elukey>	forced logrotation for /etc/logrotate.d/upstart on labvirt1014 to investigate cronspam	[production]
12:25	<elukey>	installing varnishkafka 1.0.12 on cache:upload ulsfo and eqiad	[production]
09:02	<elukey>	installing varnishkafka 1.0.12 on cache:upload codfw	[production]
08:43	<elukey>	installing varnishkafka 1.0.12 on cache:upload esams	[production]
08:40	<elukey>	installed varnishkafka 1.0.12 on cp1099	[production]
08:35	<elukey>	restarted varnishkafka on cp1099 (log abandoned)	[production]
08:01	<elukey>	rolling restart of the whole Analytics Hadoop cluster for kernel upgrades (analytics* hosts)	[production]
07:58	<elukey>	uploaded varnishkafka 1.0.12-1 to reprepro	[production]
07:52	<elukey>	rebooted stat100[23] for kernel upgrades	[production]
07:33	<elukey>	rebooting stat1004 for kernel upgrades	[production]
06:45	<elukey>	Puppet disabled on analytics1027 to stop periodic Java daemons (prep step for Hadoop cluster reboots)	[production]
2016-09-21 §
17:40	<elukey>	installed varnishkafka 1.0.12-1 on cp3034.esams (T138747)	[production]
11:03	<elukey>	adding mw1197 back to serving live traffic after the reimage	[production]
10:51	<elukey>	restarted varnishkafka on cp1048 (VSLQ_Dispatch: Varnish Log abandoned or overrun.)	[production]
10:45	<elukey>	adding mw1196 back to serving live traffic after the reimage	[production]
08:30	<elukey>	reimaging mw1196-7 to jessie	[production]
07:19	<elukey>	Moved some hhvm logs (/var/log/hhvm) from root:adm to www-data:www-data on mw127[678] to remove cronspam (T132324)	[production]
06:21	<elukey>	removing aqs100[123] from live traffic - aqs.svc.eqiad.wmnet - T144497	[production]
2016-09-20 §
17:01	<elukey>	adding aqs1006 to live traffic - aqs.svc.eqiad.wmnet - T144497	[production]
16:58	<elukey>	adding aqs1005 to live traffic - aqs.svc.eqiad.wmnet - T144497	[production]
16:32	<elukey>	restarting cassandra on aqs100[56] (started the work earlier today, stopped due to T146130)	[production]
07:36	<elukey>	restart cassandra on aqs100[456] for T130861 - only aqs1004 is taking live traffic	[production]
2016-09-19 §
14:21	<elukey>	adding aqs1004 to live traffic - aqs.svc.eqiad.wmnet - T144497	[production]
12:51	<elukey>	adding mw1191 back to serving traffic after reimage	[production]
07:50	<elukey>	reimaging mw1191.eqiad.wmnet to jessie	[production]
2016-09-16 §
13:56	<elukey>	mw1189 back serving traffic after reimage	[production]
12:37	<elukey>	mw1190 back serving traffic after the reimage	[production]
09:34	<elukey>	reimage mw1189-90 to Jessie (trying Riccardo's script!)	[production]
07:36	<elukey>	forced logrotation with debug of /etc/logrotate.d/graphite-web on graphite1001 to find cronspam source	[production]
2016-09-15 §
14:57	<elukey>	deployed new-aqs-cluster branch (--rev new-aqs-cluster) to aqs100[456] (new AQS cluster not serving live traffic)	[production]
2016-09-14 §
16:05	<elukey>	restarting cassandra on aqs100[23] T130861	[production]
15:56	<elukey>	restarting cassandra on aqs1001 T130861	[production]
2016-09-09 §
13:25	<elukey>	analytics1032 back in service after disk swap	[production]
12:45	<elukey>	running authdns-update on ns0.w.o to pick up the new domain pivot.wikimedia.org (T138262)	[production]
12:27	<elukey>	reimaging mw213[789] and mw2075 to Jessie	[production]
09:03	<elukey>	reimage mw2128->mw2131 to Jessie	[production]
07:17	<elukey>	puppet disabled on analytics1032, Hadoop services stopped - T145170	[production]
2016-09-08 §
06:44	<elukey>	reimaging mw2208->mw2211 to jessie	[production]