2022-09-27 §
14:56 <mforns> deployed Airflow (fixed) [analytics]
14:23 <mforns> rolled back Airflow [analytics]
14:23 <mforns> deployed Airflow for 3 fixes [analytics]
2022-09-26 §
20:07 <xcollazo> Kill oozie geoeditors jobs for load, public monthly, and yearly after Airflow migration. [analytics]
16:13 <joal> rerunning failed webrequest-text-2022-09-26-15 [analytics]
13:48 <aqu> Deploying airflow-dags on analytics & analytics_test [analytics]
11:03 <btullis> failing back hive to an-coord1001 using DNS https://gerrit.wikimedia.org/r/c/operations/dns/+/832294 [analytics]
09:41 <btullis> rebooted matomo1002 at the VM level to pick up new disk [analytics]
09:40 <btullis> merged the spark3 patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/834500 [analytics]
06:36 <elukey> clean up my old home dir on matomo1002, ran `apt-get clean` + some other clean up steps on matomo1002 to free space on the root partition [analytics]
2022-09-23 §
19:11 <mforns> deployed Airflow analytics for a quick fix [analytics]
2022-09-22 §
22:26 <joal> Kill oozie cassandra monthly loading jobs as we migrate them to airflow [analytics]
22:20 <joal> Deploy airflow for cassandra-loading patch [analytics]
20:53 <joal> Deploy analytics airflow-dags to try to fix cassandra loading jobs [analytics]
2022-09-21 §
19:25 <joal> Kill oozie daily cassandra loading jobs as we move them to airflow [analytics]
19:18 <ottomata> kill aarora process 30421 run_embedding_training.sh on stat1005 [analytics]
19:13 <joal> Deployed refinery for HQL patch (Njideka) [analytics]
19:11 <ottomata> kill aarora process 14584 on stat1005 - using 2500% cpu [analytics]
2022-09-20 §
20:10 <mforns> finished refinery deployment (weekly train) [analytics]
19:55 <mforns> starting refinery deployment (weekly train) [analytics]
15:45 <joal> kill oozie hourly cassandra loading job (1 job) in favor of the airflow one [analytics]
2022-09-19 §
22:28 <milimetric> Wikistats: improved build a little and deployed fix to T312717 [analytics]
2022-09-15 §
08:43 <aqu> about to deploy analytics/refinery [analytics]
05:14 <aqu> sudo -u analytics kerberos-run-command analytics refine_eventlogging_legacy --table_include_regex='wikipediaportal' --since='2022-09-13T23:00:00.000Z' --until='2022-09-15T00:00:00.000Z' [analytics]
2022-09-14 §
17:11 <aqu> Sep 14 15:23:34 UTC sudo systemctl start check_webrequest_partitions.service [analytics]
12:56 <aqu> ~1h ago sudo systemctl start refinery-sqoop-mediawiki-production-daily.service ; sudo systemctl start refinery-import-siteinfo-dumps.service ; sudo systemctl start refinery-import-page-current-dumps.service ; sudo systemctl start refinery-import-page-history-dumps.service [analytics]
11:34 <btullis> remounted all remaining /mnt/hdfs mount points, except stat1005 which is busy [analytics]
11:12 <btullis> remounted /mnt/hdfs on an-coord100[1-2] [analytics]
11:09 <btullis> remounted /mnt/hdfs on an-airflow1001 [analytics]
09:14 <joal> Restart oozie virtualpageview job [analytics]
09:10 <btullis> re-mounted /mnt/hdfs on an-launcher1002. [analytics]
07:11 <joal> restart webrequest oozie bundle [analytics]
2022-09-13 §
17:22 <joal> rerun refine_eventlogging_legacy [analytics]
17:14 <joal> rerun refine_event [analytics]
17:14 <joal> rerun refine_netflow [analytics]
16:53 <joal> Rerun refine_eventlogging_analytics [analytics]
16:45 <joal> Kill-rerun suspended oozie jobs (virtual-pageview and predictions-actor) [analytics]
16:34 <joal> rerun failed webrequest oozie jobs [analytics]
16:30 <btullis> restarting hive-server2 and hive-metastore on an-coord1001 (currently standby) [analytics]
16:29 <btullis> restarting oozie on an-coord1001 [analytics]
16:10 <joal> Rerun failed oozie webrequest jobs [analytics]
15:57 <btullis> rolling out updated hadoop packages to an-airflow1003 [analytics]
15:55 <btullis> rolling out upgraded hadoop client packages to stat servers. [analytics]
15:51 <btullis> restarting eventlogging_to_druid_network_flows_internal_hourly.service eventlogging_to_druid_prefupdate_hourly.service refine_event_sanitized_analytics_immediate.service refine_event_sanitized_main_immediate.service [analytics]
15:49 <btullis> restarting eventlogging_to_druid_navigationtiming_hourly.service on an-launcher1002 [analytics]
15:46 <btullis> restarting eventlogging_to_druid_editattemptstep_hourly.service on an-launcher1002 [analytics]
15:44 <btullis> cancel that last message. Upgrading hadoop packages on an-launcher instead. They were inadvertently omitted last time. [analytics]
15:39 <btullis> Going to downgrade hadoop on all hadoop-worker nodes to 2.10.1 [analytics]
15:21 <btullis> failed over hive to an-coord1002 via DNS https://gerrit.wikimedia.org/r/c/operations/dns/+/831906 [analytics]
15:20 <btullis> restarted yarn service on an-master1002 to make the active host an-master1001 again. [analytics]