2019-10-15 §
14:42 <elukey> start a root tmux containing a bash script on conf1004 to clean up znodes under /yarn-rmstore/analytics-hadoop/ZKRMStateRoot/RMAppRoot slowly - T217057 [production]
14:34 <elukey> executed 'rmr' in zookeeper on conf1004 for znodes /yarn-leader-election /hadoop-ha /hive_zookeeper_namespace [production]
12:46 <elukey> Hadoop maintenance over [production]
12:17 <elukey> Hadoop maintenance start - migration to the new Zookeeper cluster [production]
12:06 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
12:06 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:06 <elukey> upload new version of memkeys (adding a patch to be merged upstream to avoid segfaults on stretch/buster) to stretch|buster wikimedia apt repos - T223863 [production]
2019-10-14 §
14:28 <elukey> upload matomo 3.11 to stretch-wikimedia and upgrade matomo1001 - T234607 [production]
2019-10-09 §
17:22 <elukey> roll restart aqs on aqs100[4-9] to pick up new Druid config changes [production]
14:38 <elukey> cr1-eqsin: change IPv6 address for BGP peer AS4761 [production]
2019-10-08 §
08:33 <elukey> roll restart druid historicals and brokers on druid100[1-3] to pick up new settings - T234684 [production]
05:44 <elukey> drop PageCreation_7481635 table from the log db on db1107/db1108 - T233892 [production]
05:35 <elukey> drop CitationUsage tables from the log database on db1107/db1108 (the ones listed in the task) - T233893 [production]
2019-10-07 §
13:13 <elukey> upload python-kafka and python3-kafka 1.4.7-1 to buster-wikimedia - T222941 [production]
12:54 <elukey> upload python-kafka and python3-kafka 1.4.7-1 to stretch-wikimedia - T222941 [production]
06:08 <elukey> upgrade python-kafka on eventlog1002 to 1.4.7-1 (manually via dpkg -i) - T222941 [production]
2019-10-06 §
06:47 <elukey> delete old cron entry 'xenon_generate_svgs' (user xenon) on webperf[12]002 to reduce cronspam [production]
2019-10-05 §
06:48 <elukey> force umount/remount of /mnt/hdfs on an-coord1001 - processes stuck in D state, fuser proc consuming a ton of memory [production]
2019-10-04 §
12:23 <elukey> cleaned up old files and apt-cache from an-coord1001 [production]
07:26 <elukey> execute gnt-instance remove kerberos1001 on ganeti1001 - T234600 [production]
07:24 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
07:24 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
2019-10-03 §
13:49 <elukey> roll restart hadoop yarn resource managers for openssl updates on Hadoop workers [production]
10:27 <elukey> killed rsync processes in "D" state on stat1007, force umount/mount of /mnt/hdfs [production]
09:32 <elukey> run apt-get autoremove incrementally on all the hadoop prod workers to remove python2 deps (and verify that they are not used anymore by Hadoop) [production]
2019-10-01 §
15:36 <elukey> powercycle an-conf1001 to test some bios settings [production]
06:29 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) [production]
06:29 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
06:28 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
06:28 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
2019-09-27 §
15:34 <elukey> update pcc facts to add new hosts [production]
12:58 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
12:56 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2019-09-26 §
17:35 <elukey> run apt-get autoremove on stat* and notebook* to clean up old python2 deps [production]
08:07 <elukey> executed 'rmr /yarn-rmstore/analytics-test-hadoop/ZKRMStateRoot' on conf1004's zkCli.sh to clean up znodes - T217057 [production]
2019-09-25 §
07:17 <elukey> allow analytics users to log in to stat1005 [production]
2019-09-24 §
07:26 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
07:24 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
2019-09-23 §
16:53 <elukey@deploy1001> Finished deploy [analytics/refinery@b99647e]: (no justification provided) (duration: 07m 24s) [production]
16:46 <elukey@deploy1001> Started deploy [analytics/refinery@b99647e]: (no justification provided) [production]
08:31 <elukey@deploy1001> Finished deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes (duration: 07m 26s) [production]
08:24 <elukey@deploy1001> Started deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes [production]
2019-09-18 §
14:21 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
14:16 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
13:40 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
13:38 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
13:26 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
13:23 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
07:43 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) [production]
2019-09-17 §
17:08 <elukey@cumin1001> START - Cookbook sre.hadoop.reboot-workers [production]