2016-10-04
08:43 <elukey> reimaging mw119[89] to jessie [production]
07:09 <elukey> rebooting eventlog1001 for kernel upgrades [production]
07:04 <elukey> executed salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' (also in codfw) to reduce cronspam [production]
2016-10-03
09:48 <elukey> lowered the build log retention from 90 to 60 days for the puppet compiler (https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/) [production]
08:46 <elukey> rebooted compiler02.puppet3-diffs.eqiad.wmflabs (not reachable by Jenkins, pingable from bastions but no SSH access) [production]
2016-09-29
16:34 <elukey> executed sudo salt -C 'G@cluster:imagescaler and G@site:eqiad' cmd.run 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' to reduce cronspam [production]
16:33 <elukey> executed sudo salt -C 'G@cluster:imagescaler and G@site:codfw' cmd.run 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' to reduce cronspam [production]
2016-09-25
15:24 <elukey> executed https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Fixing_HDFS_mount_at_.2Fmnt.2Fhdfs on stat1002 (fusermount failed to unmount it, though) [production]
2016-09-23
09:06 <elukey> rebooting eventlog2001.codfw.wmnet for kernel upgrades [production]
08:52 <elukey> upgrading varnishkafka to 1.0.12-1 in cache:misc [production]
08:30 <elukey> upgrading varnishkafka to 1.0.12-1 in cache:maps [production]
07:33 <elukey> executed 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' on all the api and appservers to remove/prevent cronspam (the root:adm files are also related to newly reimaged hosts; rsyslog needs to be configured before HHVM) - T132324 [production]
2016-09-22
16:40 <elukey> forced log rotation for /etc/logrotate.d/upstart on labvirt1014 to investigate cronspam [production]
12:25 <elukey> installing varnishkafka 1.0.12 on cache:upload ulsfo and eqiad [production]
09:02 <elukey> installing varnishkafka 1.0.12 on cache:upload codfw [production]
08:43 <elukey> installing varnishkafka 1.0.12 on cache:upload esams [production]
08:40 <elukey> installed varnishkafka 1.0.12 on cp1099 [production]
08:35 <elukey> restarted varnishkafka on cp1099 (log abandoned) [production]
08:01 <elukey> rolling restart of the whole Analytics Hadoop cluster for kernel upgrades (analytics* hosts) [production]
07:58 <elukey> uploaded varnishkafka 1.0.12-1 to reprepro [production]
07:52 <elukey> rebooted stat100[23] for kernel upgrades [production]
07:33 <elukey> rebooting stat1004 for kernel upgrades [production]
06:45 <elukey> Puppet disabled on analytics1027 to stop periodic Java daemons (prep step for Hadoop cluster reboots) [production]
2016-09-21
17:40 <elukey> installed varnishkafka 1.0.12-1 on cp3034.esams (T138747) [production]
11:03 <elukey> adding mw1197 back to serving live traffic after the reimage [production]
10:51 <elukey> restarted varnishkafka on cp1048 (VSLQ_Dispatch: Varnish Log abandoned or overrun.) [production]
10:45 <elukey> adding mw1196 back to serving live traffic after the reimage [production]
08:30 <elukey> reimaging mw1196-7 to jessie [production]
07:19 <elukey> Moved some hhvm logs (/var/log/hhvm) from root:adm to www-data:www-data on mw127[678] to remove cronspam (T132324) [production]
06:21 <elukey> removing aqs100[123] from live traffic - aqs.svc.eqiad.wmnet - T144497 [production]
2016-09-20
17:01 <elukey> adding aqs1006 to live traffic - aqs.svc.eqiad.wmnet - T144497 [production]
16:58 <elukey> adding aqs1005 to live traffic - aqs.svc.eqiad.wmnet - T144497 [production]
16:32 <elukey> restarting cassandra on aqs100[56] (work started earlier today, stopped due to T146130) [production]
07:36 <elukey> restarting cassandra on aqs100[456] for T130861 - only aqs1004 is taking live traffic [production]
2016-09-19
14:21 <elukey> adding aqs1004 to live traffic - aqs.svc.eqiad.wmnet - T144497 [production]
12:51 <elukey> adding mw1191 back to serving traffic after reimage [production]
07:50 <elukey> reimaging mw1191.eqiad.wmnet to jessie [production]
2016-09-16
13:56 <elukey> mw1189 back serving traffic after reimage [production]
12:37 <elukey> mw1190 back serving traffic after the reimage [production]
09:34 <elukey> reimage mw1189-90 to Jessie (trying Riccardo's script!) [production]
07:36 <elukey> forced log rotation (in debug mode) of /etc/logrotate.d/graphite-web on graphite1001 to find the cronspam source [production]
2016-09-15
14:57 <elukey> deployed new-aqs-cluster branch (--rev new-aqs-cluster) to aqs100[456] (new AQS cluster not serving live traffic) [production]
2016-09-14
16:05 <elukey> restarting cassandra on aqs100[23] T130861 [production]
15:56 <elukey> restarting cassandra on aqs1001 T130861 [production]
2016-09-09
13:25 <elukey> analytics1032 back in service after disk swap [production]
12:45 <elukey> running authdns-update on ns0.w.o to pick up the new domain pivot.wikimedia.org (T138262) [production]
12:27 <elukey> reimaging mw213[789] and mw2075 to Jessie [production]
09:03 <elukey> reimage mw2128->mw2131 to Jessie [production]
07:17 <elukey> puppet disabled on analytics1032, Hadoop services stopped - T145170 [production]
2016-09-08
06:44 <elukey> reimaging mw2208->mw2211 to jessie [production]