2023-03-13 §
13:24 <btullis> restarting an-worker1140 [analytics]
2023-03-10 §
20:04 <milimetric> deployed refinery with new pageview jobs, patched in a manual copy of static_data/pageview/whitelist/whitelist.tsv because that file was renamed in the most recent version and would have broken jobs otherwise [analytics]
2023-03-09 §
19:47 <btullis> shutting down an-worker1078 for RAID BBU replacement T331544 [analytics]
18:51 <mforns> deployed airflow analytics (2.5) with the T326194_airflow_deb_creation_with_gitlab_ci branch [analytics]
17:55 <joal> Force kill druid indexing task to unlock druid_load_navigationtiming_daily__load_to_druid__20230228 [analytics]
17:46 <btullis> deploying spark-operator once more [analytics]
16:49 <btullis> deploying updated spark-operator to dse-k8s cluster. [analytics]
14:04 <btullis> airflow services were started automatically. airflow db check was successful. [analytics]
14:00 <btullis> running puppet on an-launcher1002 to pull the new package after https://gerrit.wikimedia.org/r/c/operations/puppet/+/896098 is merged. [analytics]
13:06 <steve_munene> upgrading analytics airflow to 2.5.1 on an-launcher1002 [analytics]
2023-03-08 §
11:54 <ottomata> Deployed refinery using scap, then deployed onto hdfs [analytics]
10:36 <nfraison> restart namenode in an-master1002 to take into account new quota init threads setting [analytics]
10:25 <nfraison> failover namenode in prod from an-master1002-eqiad-wmnet to an-master1001-eqiad-wmnet [analytics]
09:59 <nfraison> restart namenode in an-master1001 (standby in prod) to take into account new quota init threads setting [analytics]
09:53 <nfraison> restart namenode in an-test-master1002 to take into account new quota init threads setting [analytics]
09:52 <nfraison> failover namenode in test from an-test-master1002-eqiad-wmnet to an-test-master1001-eqiad-wmnet [analytics]
09:47 <nfraison> restart namenode in an-test-master1001 to take into account new quota init threads setting [analytics]
09:36 <nfraison> restart test hiveserver2: T303168 [analytics]
09:13 <nfraison> restart prod resourcemanager to take into account new dedicated exclude file [analytics]
08:58 <nfraison> restart test resourcemanager to take into account new dedicated exclude file [analytics]
07:56 <nfraison> restart prod jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 [analytics]
07:47 <nfraison> restart test jobhistory to take into account: https://gerrit.wikimedia.org/r/c/operations/puppet/+/894481 [analytics]
2023-03-07 §
22:03 <mforns> deployed airflow analytics again to try and fix druid_load_edit_hourly [analytics]
16:55 <xcollazo> deployed image-suggestions hotfix to platform_eng Airflow instance. See https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/262. [analytics]
15:23 <btullis> re-enabling ingestion via gobblin. [analytics]
14:59 <nfraison> force startup of nodemanager on analytics_cluster [analytics]
14:58 <btullis> pooled druid1004 [analytics]
14:57 <btullis> pooling aqs1010 and aqs1016 [analytics]
14:56 <btullis> pooling datahubsearch1001 [analytics]
14:53 <btullis> leaving safe mode on hdfs [analytics]
13:59 <btullis> disabled puppet temporarily on an-master100[1-2] to avoid an automatic restart of yarn [analytics]
13:57 <btullis> stopped `hadoop-yarn-resourcemanager.service` on both an-master100[1-2] [analytics]
13:54 <btullis> entering safe mode with `sudo -u hdfs kerberos-run-command hdfs hadoop dfsadmin -safemode enter` on an-master1002 [analytics]
12:57 <btullis> depooled druid1004 for T329073 [analytics]
12:56 <btullis> depooled datahubsearch1001 for T329073 [analytics]
12:51 <btullis> disabled gobblin timers on an-launcher1002 [analytics]
12:46 <btullis> depooling aqs1016 for T329073 [analytics]
12:45 <btullis> depooling aqs1010 for T329073 [analytics]
08:00 <nfraison> Reimage an-conf1003 to upgrade to bullseye T329362 [analytics]
2023-03-06 §
23:12 <mforns> deployed airflow analytics to unbreak druid-load-edit-hourly [analytics]
15:26 <mforns> deployed airflow analytics to unbreak druid-load-edit-hourly [analytics]
13:53 <btullis> failing over the production hadoop cluster namenode service to an-master1002 [analytics]
13:17 <btullis> failing over analytics test cluster namenode service to an-test-master1002 T329073 [analytics]
12:26 <nfraison> Reimage an-conf1002 to upgrade to bullseye T329362 [analytics]
10:15 <ottomata> deploy mediawiki_history_reduced_2023_02 snapshot to AQS [analytics]
09:23 <nfraison> Reimage an-conf1001 to upgrade to bullseye T329362 [analytics]
2023-03-03 §
16:48 <xcollazo> Deleted snapshot=2023-02-20 for tables image_suggestions_search_index_full, image_suggestions_search_index_delta, image_suggestions_lead_image_data and image_suggestions_wikidata_data from the analytics_platform_eng schema. This data will be regenerated. See https://phabricator.wikimedia.org/T330688. [analytics]
15:53 <mforns> deployed airflow analytics to unbreak edit_hourly_dag [analytics]
15:44 <xcollazo> Deploying latest image_suggestions DAG on platform_eng Airflow instance [analytics]
07:29 <elukey> truncate /var/log/auth.log.1 on krb1001 to free space (root partition almost filled up) [analytics]