2021-10-21 §
14:05 <ottomata> rerun refine_eventlogging_analytics, refine_eventlogging_legacy and refine_event with -ignore-done-flag=true --since=2021-10-21T01:00:00 --until=2021-10-21T04:00:00 for backfill of missing data after gobblin problems [analytics]
13:39 <btullis> btullis@an-launcher1002:~$ sudo systemctl restart gobblin-event_default [analytics]
10:35 <joal> Re-refine netflow data after gobblin pulled data fix [analytics]
08:41 <joal> Rerun webrequest-load jobs for hour 2021-10-21T02:00 [analytics]
2021-10-20 §
18:11 <razzi> Deployed refinery using scap, then deployed onto hdfs [analytics]
16:36 <razzi> deploy refinery change for https://phabricator.wikimedia.org/T287084 [analytics]
07:15 <joal> rerun webrequest-load-wf-upload-2021-10-20-1 after node issue [analytics]
06:27 <elukey> reboot analytics1066 - OS showing CPU soft lockups, tons of defunct processes (including node manager) and high CPU usage [analytics]
2021-10-19 §
07:14 <joal> Rerun cassandra-daily-wf-local_group_default_T_mediarequest_top_files-2021-10-17 [analytics]
2021-10-18 §
19:29 <joal> Rerun cassandra-daily-wf-local_group_default_T_top_pageviews-2021-10-17 [analytics]
18:36 <joal> Rerun cassandra-daily-wf-local_group_default_T_unique_devices-2021-10-17 [analytics]
16:22 <joal> rerun cassandra-daily-wf-local_group_default_T_top_percountry-2021-10-17 [analytics]
16:16 <joal> Rerun cassandra-daily-wf-local_group_default_T_mediarequest_per_referer-2021-10-17 [analytics]
15:17 <joal> Rerun failed instances from cassandra-hourly-coord-local_group_default_T_pageviews_per_project_v2 [analytics]
14:49 <elukey> restart hadoop-yarn-nodemanager on an-worker1119 and an-worker1103 (Java OOM in the logs) [analytics]
12:09 <btullis> root@aqs1013:/srv/cassandra-b/tmp# systemctl restart cassandra-b.service [analytics]
12:09 <btullis> root@aqs1012:/srv/cassandra-b/tmp# systemctl restart cassandra-b.service [analytics]
09:25 <btullis> btullis@cumin1001:~$ sudo transfer.py aqs1013.eqiad.wmnet:/srv/cassandra-b/tmp/local_group_default_T_pageviews_per_article_flat an-presto1001.eqiad.wmnet:/srv/cassandra_migration/aqs1013-b/ [analytics]
09:17 <btullis> btullis@cumin1001:~$ sudo transfer.py aqs1012.eqiad.wmnet:/srv/cassandra-b/tmp/local_group_default_T_pageviews_per_article_flat an-presto1001.eqiad.wmnet:/srv/cassandra_migration/aqs1012-b/ [analytics]
09:16 <btullis> btullis@cumin1001:~$ sudo transfer.py aqs1012.eqiad.wmnet:/srv/cassandra-b/tmp/local_group_default_T_pageviews_per_article_flat an-presto1001.eqiad.wmnet:/srv/cassandra_migration/cassandra_migration/aqs1012-b/ [analytics]
2021-10-15 §
08:33 <btullis> btullis@aqs1007:~$ sudo nodetool-b clearsnapshot [analytics]
2021-10-13 §
19:49 <mforns> re-ran cassandra-daily-coord-local_group_default_T_pageviews_per_article_flat for 2021-10-12 successfully [analytics]
17:58 <ottomata> deleting files on stat1008 in /tmp older than 10 days and larger than 20M: sudo find /tmp -mtime +10 -size +20M | xargs sudo rm -rfv [analytics]
17:54 <ottomata> removed /tmp/spark-* files belonging to aikochou on stat1008 [analytics]
2021-10-12 §
15:43 <btullis> btullis@aqs1008:~$ sudo nodetool-b clearsnapshot [analytics]
13:17 <btullis> btullis@analytics1069:~$ sudo shutdown -h now [analytics]
13:15 <btullis> btullis@analytics1069:~$ sudo systemctl stop hadoop-hdfs-* [analytics]
13:14 <btullis> btullis@analytics1069:~$ sudo systemctl stop hadoop-yarn-nodemanager.service [analytics]
07:26 <joal> Rerun cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2021-10-11 [analytics]
2021-10-11 §
07:37 <joal> rerun refine_event for `event`.`mediawiki_content_translation_event` year=2021/month=10/day=10/hour=16 [analytics]
2021-10-10 §
18:07 <joal> Rerun webrequest-load-wf-text-2021-10-10-10 - failed due to network issue [analytics]
2021-10-06 §
14:30 <elukey> upgrade stat1005 to ROCm 4.2.0 [analytics]
13:20 <btullis> btullis@aqs1004:~$ sudo nodetool-a clearsnapshot [analytics]
10:20 <elukey> upgrade ROCm to 4.2 on stat1008 [analytics]
2021-10-05 §
11:28 <elukey> failover analytics-hive back to an-coord1001 after maintenance [analytics]
2021-10-04 §
16:56 <elukey> restart java daemons on an-coord1001 (standby) [analytics]
13:43 <elukey> failover analytics-hive to an-coord1002 (to restart java daemons on 1001) [analytics]
07:43 <joal> Kill-restart mediawiki-history-reduced job after deploy (more resources) [analytics]
07:32 <joal> Deploy refinery to hdfs [analytics]
07:10 <joal> Deploy refinery for mediawiki-history-reduced hotfix [analytics]
06:56 <joal> Kill-restart pageview-monthly_dump-coord to apply fix for SLA [analytics]
2021-10-01 §
15:11 <btullis> sudo -u analytics kerberos-run-command analytics /usr/local/bin/refine_eventlogging_legacy --ignore_failure_flag=true --table_include_regex='editoractivation' --since='2021-09-29T22:00:00.000Z' --until='2021-09-30T23:00:00.000Z' [analytics]
2021-09-30 §
19:55 <ottomata> not changing stats uid to 499; it already exists as another system user [analytics]
19:54 <ottomata> changing stats uid and gid on an-launcher1002 and stat1005 to 499 [analytics]
09:32 <btullis> btullis@an-launcher1002:~$ sudo -u analytics kerberos-run-command analytics /usr/local/bin/refine_netflow --ignore_failure_flag=true --since=2021-09-28T11:00:00 --until 2021-09-28T12:00:00 [analytics]
2021-09-29 §
09:16 <elukey> restart hive-* units on an-coord1002 for openjdk upgrades (standby node) [analytics]
2021-09-28 §
13:14 <btullis> Deployed refinery using scap, then deployed onto hdfs [analytics]
12:34 <btullis> deploying refinery [analytics]
09:55 <btullis> btullis@cumin1001:~$ sudo cumin --mode async 'aqs100*.eqiad.wmnet' 'nodetool-a snapshot -t T291472 local_group_default_T_pageviews_per_article_flat' 'nodetool-b snapshot -t T291472 local_group_default_T_pageviews_per_article_flat' [analytics]
09:36 <elukey> restart java daemons on an-test-coord1001 to pick up new openjdk [analytics]