2016-12-18
08:57 <elukey> forced restart of cassandra-c on restbase1011 [production]
08:51 <elukey> forced restart of cassandra-b/c on restbase1013 (b not really needed, my error) [production]
08:49 <elukey> forced restart for cassandra-a on restbase1009 (still OOMs) [production]
08:43 <elukey> forced puppet on restbase1009 to bring up cassandra-a (stopped due to OOM issues) [production]
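A forced restart of one Cassandra instance on a multi-instance restbase host, as in the entries above, might look like the following sketch; the per-instance systemd unit name and the journal check are assumptions:

  # hypothetical: restart instance "a" and check for OOM messages
  sudo systemctl restart cassandra-a.service
  sudo journalctl -u cassandra-a --since '-10min' | grep -i 'OutOfMemory'
  # alternatively, a forced puppet run brings a stopped instance back up:
  sudo puppet agent --test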
2016-12-17
09:38 <elukey> ran apt-get clean and removed some /tmp files on stat1002 to free some space [production]
09:24 <elukey> restarted stuck hhvm on mw1168 (forgot to run hhvm-dump-debug) [production]
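The 09:38 cleanup is the routine low-disk remediation; a minimal sketch of that kind of pass (the exact files removed were not recorded):

  sudo apt-get clean                       # drop cached .deb archives
  sudo find /tmp -type f -mtime +30 -ls    # inspect old /tmp files before deleting
  df -h /                                  # verify the space recovered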
2016-12-16
15:13 <elukey> prometheus apache and hhvm exporters running on the eqiad MW appservers [production]
14:30 <elukey> disabling puppet on the eqiad appservers to gradually roll out the prometheus apache/hhvm exporters [production]
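The two entries above are the standard disable-then-batch pattern for a gradual rollout; a sketch in the salt syntax used elsewhere in this log (the targeting grains and batch size are assumptions):

  # stop puppet fleet-wide, then re-enable host by host in small batches
  sudo salt -C 'G@cluster:appserver and G@site:eqiad' cmd.run \
      'puppet agent --disable "prometheus exporter rollout"'
  sudo salt -C 'G@cluster:appserver and G@site:eqiad' -b 10% cmd.run \
      'puppet agent --enable && puppet agent --test'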
2016-12-15
07:40 <elukey> moved some home files on stat1002 to the data-tank partition to free some space [production]
2016-12-14
23:37 <elukey> sent an email to the owners of the biggest home directories on stat1002 [production]
2016-12-13
20:00 <elukey> uploaded prometheus-apache-exporter 0.3-1 to jessie-wikimedia main [production]
14:47 <elukey> testing prometheus-apache-exporter on mw2198 [production]
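Uploads to jessie-wikimedia, as in the 20:00 entry, normally go through reprepro on the apt host; a sketch with the repository path and .changes filename assumed:

  # hypothetical paths; distribution and version from the log entry
  sudo reprepro -b /srv/wikimedia include jessie-wikimedia \
      prometheus-apache-exporter_0.3-1_amd64.changes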
2016-12-07
17:25 <elukey> puppet run completed on mw1* hosts (10% batch-size) [production]
17:08 <elukey> Apache config changed on mw2*, tests look fine (apachectl -S does not show the vhost, apachectl -t is ok, apache-fast-test from tin is ok). Proceeding with eqiad [production]
16:54 <elukey> force puppet run on mw2* hosts (10% batch-size) [production]
16:47 <elukey> running puppet on some mw codfw appservers to check the new config [production]
16:41 <elukey> disabled puppet on mw1* hosts as prep step [production]
16:39 <elukey> removing bits.w.o VHost from mediawiki apache config (https://gerrit.wikimedia.org/r/#/c/305536) [production]
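The 16:39-17:25 sequence above is a full config rollout: disable puppet, land the change in one DC, validate, then batch the puppet runs. The validation named in the 17:08 entry amounts to:

  apachectl -t                  # syntax check of the merged configuration
  apachectl -S | grep -i bits   # confirm the bits vhost no longer appears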
2016-12-06
08:47 <elukey> restarting hhvm on mw1285 (hhvm debug in /tmp/hhvm.100918.bt) [production]
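hhvm-dump-debug is a local wrapper that grabs a backtrace before the restart destroys the evidence; the invocation below is an assumption about its usage:

  sudo /usr/local/bin/hhvm-dump-debug   # assumed to write /tmp/hhvm.<pid>.bt
  sudo systemctl restart hhvm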
2016-12-05
17:19 <elukey> restarting hhvm on mw1268 (hhvm-debug in /tmp/hhvm.16827.bt) [production]
17:16 <elukey> restarting hhvm on mw1285 (hhvm-debug in /tmp/hhvm.140129.bt) [production]
16:50 <elukey> added nagios process check alarms for varnishkafka-statsv and varnishkafka-eventlogging on cache::text hosts [production]
14:08 <elukey> depooling mw1239 for maintenance (T148421) [production]
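Depooling for maintenance, as in the 14:08 entry, goes through conftool; a sketch, with the selector syntax assumed:

  sudo confctl select 'name=mw1239.eqiad.wmnet' set/pooled=no
  # ... maintenance ...
  sudo confctl select 'name=mw1239.eqiad.wmnet' set/pooled=yes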
2016-12-02
08:37 <elukey> restarting hhvm (/usr/local/bin/restart-hhvm) on G@cluster:api_appserver and G@site:eqiad (batch 10%) [production]
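The entry above names its own targeting; reconstructed as a salt invocation (exact flag spelling assumed):

  sudo salt -C 'G@cluster:api_appserver and G@site:eqiad' -b 10% \
      cmd.run '/usr/local/bin/restart-hhvm'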
2016-12-01
14:46 <elukey> restarting kafka on kafka100[123] (EventBus) for openjdk upgrades [production]
14:19 <elukey> restarting kafka also on kafka2003 [production]
14:17 <elukey> restarting kafka on kafka200[12] for openjdk upgrades [production]
10:25 <elukey> removed the --debug flag from the puppet compiler options [production]
09:54 <elukey> added --debug to the puppet compiler options in Jenkins [production]
07:57 <elukey@tin> Finished deploy [analytics/pivot/deploy@0513a6e]: (no message) (duration: 00m 02s) [production]
07:57 <elukey@tin> Starting deploy [analytics/pivot/deploy@0513a6e]: (no message) [production]
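For the rolling kafka broker restarts above (14:17-14:46), each broker is typically restarted only after the previous one has caught back up; a sketch, with the ZooKeeper connection string a placeholder:

  sudo systemctl restart kafka
  # move to the next broker only once nothing is under-replicated
  kafka-topics.sh --zookeeper zk1.example.org:2181/kafka \
      --describe --under-replicated-partitions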
2016-11-29
12:05 <elukey> complete rolling restart of apache in eqiad [production]
11:48 <elukey> re-enable puppet on mw1* hosts and apply Apache config change (https://gerrit.wikimedia.org/r/#/c/314519) [production]
11:23 <elukey> disabled puppet on mw1* hosts as pre-step for https://gerrit.wikimedia.org/r/#/c/314519 [production]
2016-11-27
09:35 <elukey> removed all the unused files in /tmp on stat1002 after following up with the owner [production]
2016-11-26
15:35 <elukey> deleted tmp files on stat1002's /tmp partition because of disk space consumption. Will follow up with the owner. [production]
2016-11-25
08:52 <elukey> restarting Yarn and HDFS masters on analytics100[12] (Hadoop cluster) to complete the openjdk update [production]
2016-11-24
12:36 <elukey> launched preferred-replica-election to re-add kafka1022 among the topic partition leader brokers of the Analytics Kafka cluster (all metrics look good) [production]
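The election in the 12:36 entry hands partition leadership back to the preferred replicas on kafka1022; with the stock tooling of that Kafka generation (ZooKeeper address again a placeholder):

  kafka-preferred-replica-election.sh --zookeeper zk1.example.org:2181/kafka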
2016-11-21
17:29 <elukey> unmasked kafka* on kafka1022 after disk swap [production]
11:56 <elukey> restarted jobchron/runner on mw208[0-5] since systemd was reporting degradation (broken pipes in the journald logs) [production]
08:50 <elukey> rolling restart of hadoop-related java daemons on analytics* hosts due to openjdk update [production]
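Masking, as undone in the 17:29 entry, is what keeps systemd and puppet from starting a unit while hardware work is in progress:

  sudo systemctl unmask kafka.service
  sudo systemctl start kafka.service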
2016-11-18
08:33 <elukey> kafka1022 up and running with kafka* daemon masked and broken disk removed from fstab (we mount partitions in there using UUIDs) [production]
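Mounting by UUID, as noted above, keeps the surviving data partitions stable when the dead disk is pulled and device names shift; a representative /etc/fstab line (UUID and mount point are placeholders):

  UUID=0a1b2c3d-1111-2222-3333-444455556666  /var/spool/kafka/a  ext4  defaults,noatime  0  2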
2016-11-17
10:22 <elukey> cleanup on analytics1027 - Removed mysql-server-5.5 (not used) and ran apt autoremove (old kernels) [production]
09:19 <elukey> rebooting mc1019->mc1036 (memcached/redis servers, not taking any traffic) for kernel upgrades [production]
2016-11-11
10:51 <elukey> restored mw1284 to its normal settings [production]
10:05 <elukey> increasing apache log level on mw1284 (depooling, applying config manually, re-pooling with lower weight) for a 503 investigation [production]
2016-11-10
15:01 <elukey> restored mw1284 to its normal settings [production]
14:47 <elukey> de-pooling mw1284 to raise mod_proxy_fcgi log level manually (temporary for an ongoing investigation) [production]
09:43 <elukey> restarting druid daemons on druid100[123] for openjdk updates [production]
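The manual log-level bump in the 14:47 entry of 2016-11-10 uses Apache 2.4's per-module LogLevel; the temporary directive would look roughly like:

  # raise only mod_proxy_fcgi verbosity, leave everything else at the default
  LogLevel warn proxy_fcgi:trace2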
2016-11-09
14:13 <elukey> rebooting kafka1014.eqiad.wmnet for kernel and openjdk upgrades [production]