2021-02-12 §
15:16 <elukey> roll restart druid broker on druid-public to pick up new settings [analytics]
07:54 <elukey> roll restart of druid brokers on druid-public - locked after scheduled datasource deletion [analytics]
07:46 <elukey> force a manual run of refinery-druid-drop-public-snapshots on an-launcher1002 (3d before its natural start) - controlled execution to see how druid + 3xdataset replication reacts [analytics]
2021-02-11 §
14:26 <joal> Restart oozie API job after spark sharelib fix (start: 2021-02-10T18:00) [analytics]
14:20 <joal> Rerun failed clickstream instance 2021-01 after sharelib fix [analytics]
14:16 <joal> Restart oozie after having fixed the spark-2.4.4 sharelib [analytics]
14:12 <joal> Fix oozie sharelib for spark-2.4.4 by copying oozie-sharelib-spark-4.3.0.jar onto the spark folder [analytics]
02:19 <milimetric> deployed again to fix old spelling error :) referererererer [analytics]
00:05 <milimetric> deployed refinery and synced to hdfs, restarting cassandra jobs gently [analytics]
2021-02-10 §
21:46 <razzi> rebalance kafka partitions for eqiad.mediawiki.cirrussearch-request [analytics]
21:10 <razzi> rebalance kafka partitions for codfw.mediawiki.cirrussearch-request [analytics]
19:11 <elukey> drop /user/oozie/share + chmod o+rx -R /user/oozie/share + restart oozie [analytics]
17:56 <razzi> rebalance kafka partitions for eventlogging-client-side [analytics]
01:07 <milimetric> deployed refinery with some fixes after BigTop upgrade, will restart three coordinators right now [analytics]
2021-02-09 §
22:04 <razzi> rebalance kafka partitions for eqiad.resource-purge [analytics]
20:51 <joal> Rerun webrequest-load-coord-[text|upload] for 2021-02-09T07:00 after data was imported to camus [analytics]
20:50 <razzi> rebalance kafka partitions for codfw.resource-purge [analytics]
20:31 <joal> Rerun webrequest-load-coord-[text|upload] for 2021-02-09T06:00 after data was imported to camus [analytics]
16:30 <elukey> restart datanode on an-worker1100 [analytics]
16:14 <ottomata> restart datanode on analytics1059 with 16g heap [analytics]
16:08 <ottomata> restart datanode on an-worker1080 with 16g heap [analytics]
15:58 <ottomata> restart datanode on analytics1058 [analytics]
15:55 <ottomata> restart datanode on an-worker1115 [analytics]
15:38 <elukey> restart namenode on an-master1002 [analytics]
15:01 <elukey> restart an-worker1104 with 16g heap size to allow bootstrap [analytics]
15:01 <elukey> restart an-worker1103 with 16g heap size to allow bootstrap [analytics]
14:57 <elukey> restart an-worker1102 with 16g heap size to allow bootstrap [analytics]
14:54 <elukey> restart an-worker1090 with 16g heap size to allow bootstrap [analytics]
14:50 <elukey> restart analytics1072 with 16g heap size to allow bootstrap [analytics]
14:50 <elukey> restart analytics1069 with 16g heap size to allow bootstrap [analytics]
14:08 <elukey> restart analytics1069's datanode with bigger heap size [analytics]
13:39 <elukey> restart hdfs-datanode on analytics10[65,69] - failed to bootstrap due to issues reading datanode dirs [analytics]
13:38 <elukey> restart hdfs-datanode on an-worker1080 (test canary - not showing up in block report) [analytics]
10:04 <elukey> stop mysql replication an-coord1001 -> an-coord1002, an-coord1001 -> db1108 [analytics]
08:29 <elukey> leave hdfs safemode to let distcp do its job [analytics]
08:25 <elukey> set hdfs safemode on for the Analytics cluster [analytics]
08:19 <elukey> umount /mnt/hdfs from all nodes using it [analytics]
08:16 <joal> Kill flink yarn app [analytics]
08:08 <elukey> stop jupyterhub on stat100x [analytics]
08:07 <elukey> stop hive on an-coord100[1,2] - prep step for bigtop upgrade [analytics]
08:05 <elukey> stop oozie an-coord1001 - prep step for bigtop upgrade [analytics]
08:03 <elukey> stop presto-server on an-presto100x and an-coord1001 - prep step for bigtop upgrade [analytics]
07:28 <elukey> roll out new apt bigtop changes across all hadoop-related nodes [analytics]
07:19 <joal> Killing yarn users applications [analytics]
07:12 <elukey> stop airflow on an-airflow1001 (prep step for bigtop) [analytics]
07:09 <elukey> stop namenode on an-worker1124 (backup cluster), create two new partitions for backup and namenode, restart namenode [analytics]
06:14 <elukey> disable timers on labstore nodes (prep step for bigtop) [analytics]
06:11 <elukey> disable systemd timers on an-launcher1002 (prep step for bigtop) [analytics]
2021-02-08 §
22:29 <elukey> the previous entry was related to the Hadoop backup cluster [analytics]
22:29 <elukey> hdfs master failover an-worker1118 -> an-worker1124, created dedicated partition for /var/lib/hadoop/name (root partition filled up), restarted namenode on 1118 (now recovering edit logs) [analytics]