2021-08-11
§
|
19:43 |
<btullis> |
btullis@druid1003:~$ sudo systemctl stop druid-overlord && sudo systemctl disable druid-overlord |
[analytics] |
19:41 |
<btullis> |
btullis@druid1003:~$ sudo systemctl stop druid-historical && sudo systemctl disable druid-historical |
[analytics] |
19:40 |
<btullis> |
btullis@druid1003:~$ sudo systemctl stop druid-coordinator && sudo systemctl disable druid-coordinator |
[analytics] |
19:37 |
<btullis> |
btullis@druid1003:~$ sudo systemctl stop druid-broker && sudo systemctl disable druid-broker |
[analytics] |
19:30 |
<btullis> |
btullis@druid1003:~$ curl -X POST http://druid1003.eqiad.wmnet:8091/druid/worker/v1/disable |
[analytics] |
12:13 |
<btullis> |
migration of zookeeper from druid1002 to an-druid1002 complete, with quorum and two synced followers. Re-enabling puppet on all druid nodes. |
[analytics] |
09:48 |
<btullis> |
suspended the following oozie jobs in hue: webrequest-druid-hourly-coord, pageview-druid-hourly-coord, edit-hourly-druid-coord |
[analytics] |
09:45 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl disable eventlogging_to_druid_editattemptstep_hourly.timer eventlogging_to_druid_navigationtiming_hourly.timer eventlogging_to_druid_netflow_hourly.timer eventlogging_to_druid_prefupdate_hourly.timer |
[analytics] |
09:21 |
<elukey> |
run "sudo find /var/log/airflow -type f -mtime +15 -delete" on an-airflow1001 to free space (root partition almost full) |
[analytics] |
2021-07-20
§
|
20:30 |
<joal> |
rerun webrequest timed-out instances |
[analytics] |
18:58 |
<mforns> |
starting refinery deployment |
[analytics] |
18:40 |
<razzi> |
razzi@an-launcher1002:~$ sudo puppet agent --enable |
[analytics] |
18:39 |
<razzi> |
razzi@an-master1001:/var/log/hadoop-hdfs$ sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues |
[analytics] |
18:37 |
<razzi> |
razzi@an-master1002:~$ sudo -i puppet agent --enable |
[analytics] |
18:34 |
<razzi> |
razzi@an-master1002:~$ sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues |
[analytics] |
18:32 |
<razzi> |
razzi@an-master1002:~$ sudo systemctl start hadoop-yarn-resourcemanager.service |
[analytics] |
18:31 |
<razzi> |
razzi@an-master1002:~$ sudo systemctl stop hadoop-yarn-resourcemanager.service |
[analytics] |
18:22 |
<razzi> |
sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet |
[analytics] |
18:21 |
<razzi> |
re-enable yarn queues by merging puppet patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/705732 |
[analytics] |
17:27 |
<razzi> |
razzi@cumin1001:~$ sudo -i wmf-auto-reimage-host -p T278423 an-master1001.eqiad.wmnet |
[analytics] |
17:17 |
<razzi> |
stop all hadoop processes on an-master1001 |
[analytics] |
16:52 |
<razzi> |
starting hadoop processes on an-master1001 since they didn't failover cleanly |
[analytics] |
16:31 |
<razzi> |
sudo bash gid_script.bash on an-master1001 |
[analytics] |
16:29 |
<razzi> |
razzi@alert1001:~$ sudo icinga-downtime -h an-master1001 -d 7200 -r "an-master1001 debian upgrade" |
[analytics] |
16:25 |
<razzi> |
razzi@an-master1001:~$ sudo systemctl stop hadoop-mapreduce-historyserver |
[analytics] |
16:25 |
<razzi> |
sudo systemctl stop hadoop-hdfs-zkfc.service on an-master1001 again |
[analytics] |
16:25 |
<razzi> |
sudo systemctl stop hadoop-yarn-resourcemanager on an-master1001 again |
[analytics] |