2021-06-15
15:38 |
<razzi> |
backup /srv/hadoop/name/current to /home/razzi/hdfs-namenode-snapshot-buster-reimage-2021-06-15.tar.gz on an-master1001 |
[analytics] |
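Note on the 15:38 entry: the exact backup command was not logged. A minimal sketch of one way to produce such an archive, assuming plain tar with gzip compression; the function name backup_namenode_dir is hypothetical and not from the log.

```shell
# Hypothetical helper: archive a directory into a gzip-compressed tarball.
# $1 = directory to back up, $2 = destination .tar.gz path.
backup_namenode_dir() {
    src="$1"
    dest="$2"
    # -C switches to the parent directory so the archive holds relative
    # paths (e.g. current/...) rather than absolute ones.
    tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}
```

With the paths from the log entry this could be called as: backup_namenode_dir /srv/hadoop/name/current /home/razzi/hdfs-namenode-snapshot-buster-reimage-2021-06-15.tar.gz (run with enough privileges to read the source directory).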
15:33 |
<razzi> |
sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace |
[analytics] |
15:27 |
<razzi> |
sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter |
[analytics] |
15:25 |
<razzi> |
kill running yarn applications via for loop |
[analytics] |
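Note on the 15:25 entry: the loop itself was not logged. A minimal sketch of what such a loop could look like, assuming the same sudo/kerberos-run-command wrapper used in the other entries and the standard `yarn application -list` and `yarn application -kill` subcommands; the function names extract_app_ids and kill_running_apps are hypothetical.

```shell
# Hypothetical helper: read `yarn application -list` output on stdin and
# print only the application IDs. Summary and header lines are skipped
# because their first field does not look like application_<ts>_<seq>.
extract_app_ids() {
    awk '$1 ~ /^application_[0-9]+_[0-9]+$/ { print $1 }'
}

# Hypothetical helper: kill every application currently in RUNNING state.
# Assumes it is run on a host where the yarn user has a Kerberos keytab.
kill_running_apps() {
    for app_id in $(sudo -u yarn kerberos-run-command yarn \
            yarn application -list -appStates RUNNING 2>/dev/null \
            | extract_app_ids); do
        echo "killing ${app_id}"
        sudo -u yarn kerberos-run-command yarn yarn application -kill "${app_id}"
    done
}
```

Nothing runs until kill_running_apps is called. Piping the list through extract_app_ids first, without the kill step, gives a dry run of which applications would be affected.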
15:11 |
<razzi> |
sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues |
[analytics] |
15:09 |
<razzi> |
disable puppet on an-masters |
[analytics] |
15:08 |
<razzi> |
run puppet on an-masters to update capacity-scheduler.xml |
[analytics] |
15:02 |
<razzi> |
disable puppet on an-masters |
[analytics] |
15:01 |
<razzi> |
sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues to stop queues |
[analytics] |
14:35 |
<razzi> |
disable jobs that use hadoop on an-launcher1002 following https://phabricator.wikimedia.org/T278423#7094641 |
[analytics] |
2021-05-25
18:16 |
<razzi> |
sudo systemctl start all failed units from `systemctl list-units --state=failed` on an-launcher1002 |
[analytics] |
18:14 |
<razzi> |
sudo systemctl start eventlogging_to_druid_navigationtiming_hourly.service |
[analytics] |
18:01 |
<razzi> |
manually edit /etc/hadoop/conf/capacity-scheduler.xml to set the queues to RUNNING, then sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues |
[analytics] |
17:52 |
<razzi> |
sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues on an-master1001 and an-master1002 |
[analytics] |
17:28 |
<razzi> |
sudo systemctl restart refine_eventlogging_legacy |
[analytics] |
17:28 |
<razzi> |
sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues to enable submitting jobs once again |
[analytics] |
17:07 |
<razzi> |
re-enabled puppet on an-masters and an-launcher |
[analytics] |
17:04 |
<razzi> |
sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave |
[analytics] |
17:03 |
<razzi> |
sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet |
[analytics] |