2023-03-08
11:54 <ottomata> Deployed refinery using scap, then deployed onto hdfs [analytics]
10:36 <nfraison> restart namenode in an-master1002 to take into account new quota init threads setting [analytics]
10:25 <nfraison> failover namenode in prod from an-master1002-eqiad-wmnet to an-master1001-eqiad-wmnet [analytics]
09:59 <nfraison> restart namenode in an-master1001 (standby in prod) to take into account new quota init threads setting [analytics]
09:53 <nfraison> restart namenode in an-test-master1002 to take into account new quota init threads setting [analytics]
09:52 <nfraison> failover namenode in test from an-test-master1002-eqiad-wmnet to an-test-master1001-eqiad-wmnet [analytics]
09:47 <nfraison> restart namenode in an-test-master1001 to take into account new quota init threads setting [analytics]
09:36 <nfraison> restart test hiveserver2: T303168 [analytics]
09:13 <nfraison> restart prod resourcemanager to take into account new dedicated exclude file [analytics]
08:58 <nfraison> restart test resourcemanager to take into account new dedicated exclude file [analytics]
07:56 <nfraison> restart prod jobhistory to take into account https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 [analytics]
07:47 <nfraison> restart test jobhistory to take into account https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 [analytics]
2023-03-07
22:03 <mforns> deployed airflow analytics again to try and fix druid_load_edit_hourly [analytics]
16:55 <xcollazo> deployed image-suggestions hotfix to platform_eng Airflow instance. See https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/262. [analytics]
15:23 <btullis> re-enabling ingestion via gobblin. [analytics]
14:59 <nfraison> force startup of nodemanager on analytics_cluster [analytics]
14:58 <btullis> pooled druid1004 [analytics]
14:57 <btullis> pooling aqs1010 and aqs1016 [analytics]
14:56 <btullis> pooling datahubsearch1001 [analytics]
14:53 <btullis> leaving safe mode on hdfs [analytics]
13:59 <btullis> disabled puppet temporarily on an-master100[1-2] to avoid an automatic restart of yarn [analytics]
13:57 <btullis> stopped `hadoop-yarn-resourcemanager.service` on both an-master100[1-2] [analytics]
13:54 <btullis> entering safe mode with `sudo -u hdfs kerberos-run-command hdfs hadoop dfsadmin -safemode enter` on an-master1002 [analytics]
12:57 <btullis> depooled druid1004 for T329073 [analytics]
12:56 <btullis> depooled datahubsearch1001 for T329073 [analytics]
12:51 <btullis> disabled gobblin timers on an-launcher1002 [analytics]
12:46 <btullis> depooling aqs1016 for T329073 [analytics]
12:45 <btullis> depooling aqs1010 for T329073 [analytics]
08:00 <nfraison> Reimage an-conf1003 to upgrade to bullseye T329362 [analytics]