2020-02-26 §
10:03 <elukey> upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias parsoid/deployment-servers/mw-maintenance [production]
09:54 <elukey> upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias all-mw-eqiad [production]
09:37 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers [production]
09:34 <elukey> roll restart the Hadoop Analytics workers for openjdk upgrades [production]
09:32 <elukey> upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias all-mw-codfw [production]
08:51 <elukey> upload prometheus-mcrouter-exporter 0.1.0+git20200225-1 to stretch-wikimedia [production]
08:38 <elukey> upgrade prometheus-mcrouter-exporter on mwdebug1001 to test the new version [production]
2020-02-24 §
09:08 <elukey> update puppet compiler's facts [production]
2020-02-23 §
16:52 <elukey> powercycle mw1372 - no mgmt console, no ssh [production]
2020-02-21 §
15:51 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
15:06 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
11:48 <elukey> restart varnishkafka-webrequest on cp3052 (stuck in timeouts to kafka, analytics alarms raised) [production]
11:47 <elukey> restart varnishkafka-webrequest on cp3056/cp3058/cp3054/cp3064 (stuck in timeouts to kafka, analytics alarms raised) [production]
11:39 <elukey> restart varnishkafka on cp3057 (stuck in timeouts to kafka, analytics alarms raised) [production]
11:14 <elukey> reboot stat1005 - GPU blocked at 100% after issue with tensorflow [production]
2020-02-19 §
16:05 <elukey> Update analytics-in4 filter term eventgate for T245203 on cr1/cr2 eqiad [production]
2020-02-18 §
07:34 <elukey> powercycle analytics1065 (crashed hours ago, no mgmt console available, no ssh) [production]
2020-02-17 §
18:25 <elukey> restart kafka on kafka-jumbo1001 to pick up new openjdk updates [production]
2020-02-12 §
09:13 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
08:55 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
2020-02-10 §
07:09 <elukey> restore mw1347's mcrouter settings to their defaults (proxy threads 10 -> 5) [production]
2020-02-07 §
06:31 <elukey> force a puppet run on all ores[12] nodes [production]
2020-02-06 §
13:34 <elukey> repool mw1347 with mcrouter running with 10 proxy threads (was: 5) [production]
13:30 <elukey> depool mw1347 to test some mcrouter settings [production]
06:46 <elukey> run puppet on all ores[12]* nodes [production]
2020-02-05 §
18:21 <elukey> restart memcached on mc1025 with 8 threads (rollback - revert https://gerrit.wikimedia.org/r/#/c/570370/, run puppet, restart memcached) [production]
16:07 <elukey> update puppet compiler's facts [production]
06:32 <elukey> force a puppet run on ores* hosts [production]
2020-02-04 §
14:03 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
14:00 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
06:48 <elukey> force a puppet run on all ores[12] nodes [production]
2020-02-01 §
16:30 <elukey> powerup analytics1073 (attempt to see if it was only a kernel-related crash) - T244064 [production]
2020-01-26 §
17:28 <elukey> restart varnishkafka-webrequest on cp3064 [production]
17:25 <elukey> restart varnishkafka-webrequest on cp3056 [production]
15:26 <elukey> repool deployed [production]
15:24 <elukey> repool esams [production]
2020-01-20 §
12:25 <elukey@cumin1001> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [production]
12:18 <elukey@cumin1001> START - Cookbook sre.zookeeper.roll-restart-zookeeper [production]
2020-01-19 §
11:20 <elukey> restart-php-fpm on mw2181 to rule out temporary php-related issues in codfw [production]
2020-01-17 §
11:13 <elukey> restart nginx on analytics tool hosts to pick up openssl updates [production]
2020-01-16 §
15:16 <elukey@deploy1001> Finished deploy [analytics/superset/deploy@16a1644]: Upgrade to superset 0.35.2 (duration: 00m 40s) [production]
15:15 <elukey@deploy1001> Started deploy [analytics/superset/deploy@16a1644]: Upgrade to superset 0.35.2 [production]
11:22 <elukey> import packages in stretch-wikimedia's thirdparty/bigtop14 component [production]
2020-01-15 §
16:27 <elukey> import key 0xDBBF9D42B7B4BD70 (Apache BigTop) manually on install1002's gpg [production]
11:36 <elukey> restart all varnishkafka daemons on cp4031 [production]
09:19 <elukey> roll-restart druid brokers on druid100[4-6] - locked up after segments deletion [production]
08:44 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
08:40 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
08:40 <elukey@cumin1001> END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) [production]
08:40 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]