2019-07-23
19:09 <elukey> depool mw1261 for investigation [production]
2019-07-22
18:59 <elukey> repool scb1001 after pdu maintenance [production]
18:00 <elukey> arm keyholder on netmon1002 after power loss [production]
17:35 <elukey> depool scb1001 for PDU work T227140 [production]
09:47 <elukey> failover + restart of Hadoop HDFS namenode on an-master1001 to apply GC settings - T228620 [production]
09:32 <elukey> restart hadoop hdfs namenode on an-master1002 to apply new GC settings - T228620 [production]
07:54 <elukey> sudo -i depool on elastic1046 - broken disk (srv partition not available) - T228606 [production]
07:40 <elukey> systemctl reset-failed restbase on restbase1007->15 (decommed nodes) [production]
06:23 <elukey> restart hadoop-hdfs-namenode on an-master1002 to investigate out-of-the-ordinary GC activity [production]
2019-07-19
07:03 <elukey> restart php-fpm on mw1330 - opcache hit ratio low [production]
07:01 <elukey> depool wdqs2004 from all services (waiting for maintenance) [production]
06:15 <elukey> clear opcache on mwdebug* [production]
2019-07-17
16:40 <elukey> execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm (not used anymore) [production]
06:59 <elukey> apply mcrouter async replication to mw2224 - T225642 [production]
06:25 <elukey> reboot analytics1072 in an attempt to clear the MegaCLI config (and add a new disk) [production]
06:20 <elukey> sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to reset opcache [production]
2019-07-16
15:37 <elukey> reboot analytics1072 in an attempt to force the RAID controller to mark a drive as failed - T226467 [production]
15:12 <elukey> start mariadb on db1107 and re-enable mysql consumers on eventlog1002 and replication on db1108 [production]
14:53 <elukey> stop mariadb on db1107 to allow maintenance [production]
14:53 <elukey> stop eventlogging mysql consumers on eventlog1002 and eventlogging_sync on db1108 to allow db1107 maintenance [production]
09:24 <elukey> apply mcrouter async replication settings to mw1276 - T225642 [production]
09:23 <elukey> pool mw1261 back with mcrouter async replication settings - T225642 [production]
07:45 <elukey> depool mw1261 to test mcrouter changes [production]
2019-07-15
13:55 <elukey> enable profile::base::firewall on notebook100[3,4] [production]
2019-07-12
05:45 <elukey> sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to clear opcache [production]
2019-07-09
13:26 <elukey> enable base::firewall on stat1007 [production]
10:39 <elukey> update buster-wikimedia thirdparty/amd-rocm component with upstream packages - T224723 [production]
09:13 <elukey> enable per-server metrics on all prometheus-mcrouter-exporter(s) via puppet - T225059 [production]
08:49 <elukey> upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-eqiad (cumin alias) via debdeploy - T225059 [production]
08:36 <elukey> upgrade prometheus-mcrouter-exporter to 0.0.0+git20190709-1 on mw-codfw (cumin alias) via debdeploy - T225059 [production]
07:26 <elukey> upload prometheus-mcrouter-exporter 0.0.0+git20190709-1 to stretch-wikimedia - T225059 [production]
2019-07-08
13:52 <elukey> import AMD ROCm's Debian repo key (9386B48A1A693C5C) manually on install1002 - T224723 [production]
09:51 <elukey@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
09:51 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
07:00 <elukey> add base::firewall to stat1004 - T170826 [production]
2019-07-05
13:44 <elukey> roll restart of aqs on aqs100* to pick up new druid settings [production]
2019-07-04
06:42 <elukey> update the puppet compiler's facts [production]
2019-07-03
06:00 <elukey> move the zookeeper puppet submodule into operations/puppet - T226466 [production]
2019-07-02
10:05 <elukey> powercycle analytics1056 (soft lockups logged in the serial console, no ssh, no net connectivity) [production]
2019-07-01
18:35 <elukey@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
17:27 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
15:37 <elukey@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
15:36 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
15:27 <elukey@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
15:27 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
10:04 <elukey> remove burrow-analytics.service from kafkamon1001 (the analytics cluster has been decommed) [production]
09:55 <elukey> reboot kafkamon1001 with 4 GB of dedicated RAM (was 8 GB) - T224988 [production]
09:54 <elukey> reboot kafkamon2001 with 4 GB of dedicated RAM (was 8 GB) - T224988 [production]
08:39 <elukey> restart hadoop-yarn-nodemanager on all hadoop workers to pick up new jvm settings - T225296 [production]
2019-06-28
18:12 <elukey> systemctl reset-failed kafka* units on kafka2001 (in decom phase) [production]