2018-04-06
08:07 <elukey> upgrade prometheus-burrow-exporter on kafkamon1001/2001 - T188719 [production]
08:07 <elukey> upload prometheus-burrow-exporter 0.0.5 to jessie/stretch-wikimedia - T188719 [production]
2018-04-04
15:06 <elukey> delete /srv/deployment/prometheus from restbase* as clean up step for T181728 [production]
14:20 <elukey> apply net.ipv4.tcp_tw_reuse=1 to restbase* via https://gerrit.wikimedia.org/r/#/c/421901 - T190213 [production]
12:02 <elukey> removing /srv/deployment/prometheus from restbase2001/1007 - T181728 [production]
09:16 <elukey> executed systemctl reset-failed kafka-mirror-main-eqiad_to_jumbo-eqiad.service on kafka1020 [production]
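The tcp_tw_reuse and reset-failed entries above correspond roughly to the following commands; hostnames, the sysctl key, and the unit name are from the log, while the exact invocation (the drop-in path in particular) is an assumption:

```shell
# Hedged sketch of the operations logged above, not the exact commands run.

# Apply net.ipv4.tcp_tw_reuse at runtime, as done on restbase* for T190213:
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# Persist it across reboots via a sysctl drop-in (hypothetical file name):
echo 'net.ipv4.tcp_tw_reuse = 1' | sudo tee /etc/sysctl.d/70-tcp-tw-reuse.conf

# Clear the "failed" state of a unit so systemd (and monitoring on top of it)
# stops reporting it, without restarting anything:
sudo systemctl reset-failed kafka-mirror-main-eqiad_to_jumbo-eqiad.service
```

In production the sysctl was rolled out via puppet (the gerrit change linked in the log); the manual `sysctl -w` calls on restbase1007/2001 were the one-host tests that preceded it.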
2018-04-03
17:40 <elukey> manually set net.ipv4.tcp_tw_reuse=1 on restbase1007 as test for T190213 [production]
17:08 <elukey> manually set net.ipv4.tcp_tw_reuse=1 on restbase2001 as test for T190213 [production]
15:39 <elukey> roll restart of zookeeper on conf100[123] to pick up prometheus monitoring [production]
13:18 <elukey> roll restart of zookeeper on conf200[123] to pick up prometheus monitoring settings [production]
08:01 <elukey> restart of druid-(overlord|middlemanager) on druid100[456] as precautionary measure after zk restart [production]
07:50 <elukey> roll restart zookeeper on druid100[456] to enable prometheus monitoring [production]
06:43 <elukey> execute systemctl reset-failed kafka-mirror-main-eqiad_to_jumbo-eqiad.service on kafka102[23] [production]
2018-03-30
10:17 <elukey> roll restart of zookeeper daemons on druid100[123] (Druid analytics cluster) to pick up the new prometheus jmx agent [production]
09:31 <elukey> restart oozie/hive daemons on an1003 for openjdk-8 upgrades [production]
08:38 <elukey> rolling restart of hadoop-hdfs-datanode on all the hadoop worker nodes after https://gerrit.wikimedia.org/r/423000 [production]
07:39 <elukey> rolling restart of yarn-hadoop-nodemanagers on all the hadoop worker nodes after https://gerrit.wikimedia.org/r/423000 [production]
2018-03-29
09:16 <elukey> roll restart aqs on aqs100* for icu/openssl upgrades [production]
08:07 <elukey> roll restart of cassandra on aqs* for openjdk-8 upgrades [production]
2018-03-28
13:51 <elukey> reduced number of jobrunner runners on the videoscalers after the last burst of jobs that maxed out the cluster [production]
2018-03-27
09:44 <elukey> reboot aqs1009 for kernel + cassandra upgrades [production]
09:28 <elukey> reboot aqs1008 for kernel + cassandra upgrades [production]
09:09 <elukey> reboot aqs1007 for kernel + cassandra upgrades [production]
08:33 <elukey> reboot aqs1006 for kernel + openjdk-8 + cassandra upgrade [production]
08:15 <elukey@puppetmaster1001> conftool action : set/pooled=no; selector: name=aqs1005.eqiad.wmnet [production]
08:11 <elukey> reboot aqs1005 for kernel + openjdk-8 + cassandra upgrade [production]
06:59 <elukey> powercycle restbase2007 (no ssh, vsp not available via mgmt console) [production]
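The aqs reboots above follow a depool → drain → reboot → repool cycle; the conftool action for aqs1005 is recorded verbatim in the log. A sketch of that cycle using the `confctl` CLI, where everything beyond the logged depool action is an assumption about the surrounding workflow:

```shell
# Sketch of the depool/reboot/repool cycle for one aqs host; the confctl
# selector mirrors the conftool action logged above, the rest is assumed.
confctl select 'name=aqs1005.eqiad.wmnet' set/pooled=no   # drain client traffic

sudo nodetool drain   # flush Cassandra memtables/commitlog before the reboot
sudo reboot

# ...once the host is back up and Cassandra has rejoined the ring:
confctl select 'name=aqs1005.eqiad.wmnet' set/pooled=yes
```

Draining Cassandra first (as the 2018-03-19 aqs1004 entry below also notes) keeps the restart clean: the node stops accepting writes and flushes to disk, so replay on startup is minimal.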
2018-03-26
07:33 <elukey> stop eventlogging zmq-forwarder on eventlog1001 as part of decom process - T189566 [production]
2018-03-24
15:00 <elukey> rm -rf /srv/mediawiki/core on stat100[456] and force puppet run (git pull returned fatal: protocol error: bad pack header) [production]
2018-03-23
11:09 <elukey> restarting jvm daemons on analytics100[12] (Hadoop Masters) for openjdk-8 upgrade [production]
10:36 <elukey> upload cassandra2.2.6-wmf3 to jessie/stretch-wikimedia -C component/cassandra22 - T189529 [production]
08:19 <elukey> reboot eventlog1001 for kernel upgrades [production]
2018-03-22
14:16 <elukey> rolling restart of the three hadoop hdfs journal nodes (an1028/35/52) for openjdk-8 upgrades [production]
11:20 <elukey> rolling restart of the hadoop hdfs datanode daemons on all the analytics hadoop workers for openjdk-8 upgrade [production]
10:42 <elukey> update puppet compiler's facts [production]
09:55 <elukey> rolling restart of yarn nodemanagers on the analytics hadoop workers for openjdk-8 upgrade [production]
07:58 <elukey> depool cp3010 + powercycle (no ssh access, mgmt console frozen) [production]
2018-03-20
17:29 <elukey> test a depool/repool action for kafka1001 (eventbus/jobqueue) - part of an investigation to figure out where timeouts come from [production]
2018-03-19
15:23 <elukey> reboot kafka1003 for kernel upgrades (jobqueues/eventbus) [production]
14:34 <elukey> reboot kafka1002 (eventbus/jobqueue) for kernel upgrades [production]
09:37 <elukey> restart hadoop daemons on analytics1070 for openjdk upgrades (canary) [production]
08:41 <elukey> reboot thorium for kernel security upgrades (hosts all analytics websites, they will go down temporarily) [production]
08:22 <elukey> revert previous state on aqs1004, the new pkg might need some more work - T189529 [production]
07:58 <elukey> manually installed cassandra-2.2.6-wmf3 on aqs1004 - T189529 [production]
07:47 <elukey> drain cassandra instances and reboot aqs1004 for kernel upgrades [production]
2018-03-17
18:41 <elukey> executed apt-get clean on scb1004 to free some space (root partition disk space warning) [production]
2018-03-16
14:25 <elukey> reboot druid1002 for kernel updates [production]
10:01 <elukey> restart eventlogging_sync on db1108 (eventlogging db slave) as precautions after the change of m4-master.eqiad.wmnet's CNAME [production]
09:57 <elukey> restart eventlogging-consumer@mysql-eventbus on eventlog1002 to force the DNS resolution of m4-master (changed from dbproxy1009 -> dbproxy1004) [production]
09:51 <elukey> restart eventlogging-consumer@mysql-m4 on eventlog1002 to force the DNS resolution of m4-master (changed from dbproxy1009 -> dbproxy1004) [production]