2021-10-13 §
19:49 <mforns> re-ran cassandra-daily-coord-local_group_default_T_pageviews_per_article_flat for 2021-10-12 successfully [analytics]
17:58 <ottomata> deleting files on stat1008 in /tmp older than 10 days and larger than 20M: sudo find /tmp -mtime +10 -size +20M | xargs sudo rm -rfv [analytics]
17:54 <ottomata> removed /tmp/spark-* files belonging to aikochou on stat1008 [analytics]
2021-10-12 §
15:43 <btullis> btullis@aqs1008:~$ sudo nodetool-b clearsnapshot [analytics]
13:17 <btullis> btullis@analytics1069:~$ sudo shutdown -h now [analytics]
13:15 <btullis> btullis@analytics1069:~$ sudo systemctl stop hadoop-hdfs-* [analytics]
13:14 <btullis> btullis@analytics1069:~$ sudo systemctl stop hadoop-yarn-nodemanager.service [analytics]
07:26 <joal> Rerun cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2021-10-11 [analytics]
2021-10-11 §
07:37 <joal> rerun refine_event for `event`.`mediawiki_content_translation_event` year=2021/month=10/day=10/hour=16 [analytics]
2021-10-10 §
18:07 <joal> Rerun webrequest-load-wf-text-2021-10-10-10 - failed due to network issue [analytics]
2021-10-06 §
14:30 <elukey> upgrade stat1005 to ROCm 4.2.0 [analytics]
13:20 <btullis> btullis@aqs1004:~$ sudo nodetool-a clearsnapshot [analytics]
10:20 <elukey> upgrade ROCm to 4.2 on stat1008 [analytics]
2021-10-05 §
11:28 <elukey> failover analytics-hive back to an-coord1001 after maintenance [analytics]
2021-10-04 §
16:56 <elukey> restart java daemons on an-coord1001 (standby) [analytics]
13:43 <elukey> failover analytics-hive to an-coord1002 (to restart java daemons on 1001) [analytics]
07:43 <joal> Kill-restart mediawiki-history-reduced job after deploy (more resources) [analytics]
07:32 <joal> Deploy refinery to hdfs [analytics]
07:10 <joal> Deploy refinery for mediawiki-history-reduced hotfix [analytics]
06:56 <joal> Kill-restart pageview-monthly_dump-coord to apply fix for SLA [analytics]
2021-10-01 §
15:11 <btullis> sudo -u analytics kerberos-run-command analytics /usr/local/bin/refine_eventlogging_legacy --ignore_failure_flag=true --table_include_regex='editoractivation' --since='2021-09-29T22:00:00.000Z' --until='2021-09-30T23:00:00.000Z' [analytics]
2021-09-30 §
19:55 <ottomata> not changing stats uid to 499; it already exists as another system user [analytics]
19:54 <ottomata> changing stats uid and gid on an-launcher1002 and stat1005 to 499 [analytics]
09:32 <btullis> btullis@an-launcher1002:~$ sudo -u analytics kerberos-run-command analytics /usr/local/bin/refine_netflow --ignore_failure_flag=true --since=2021-09-28T11:00:00 --until 2021-09-28T12:00:00 [analytics]
2021-09-29 §
09:16 <elukey> restart hive-* units on an-coord1002 for openjdk upgrades (standby node) [analytics]
2021-09-28 §
13:14 <btullis> Deployed refinery using scap, then deployed onto hdfs [analytics]
12:34 <btullis> deploying refinery [analytics]
09:55 <btullis> btullis@cumin1001:~$ sudo cumin --mode async 'aqs100*.eqiad.wmnet' 'nodetool-a snapshot -t T291472 local_group_default_T_pageviews_per_article_flat' 'nodetool-b snapshot -t T291472 local_group_default_T_pageviews_per_article_flat' [analytics]
09:36 <elukey> restart java daemons on an-test-coord1001 to pick up new openjdk [analytics]
2021-09-27 §
11:18 <btullis> btullis@stat1005:~$ sudo apt purge usrmerge [analytics]
11:11 <btullis> btullis@stat1005:~$ sudo apt install usrmerge [analytics]
2021-09-24 §
22:33 <razzi> restart an-test-coord presto coordinator service to experiment with web-ui.authentication.type=fixed [analytics]
15:06 <btullis> btullis@cumin1001:~$ sudo cumin --mode async 'aqs100[4,7].eqiad.wmnet' 'nodetool-a snapshot -t T291469' 'nodetool-b snapshot -t T291469' [analytics]
14:47 <btullis> btullis@aqs1007:~$ sudo nodetool-a repair --full local_group_default_T_mediarequest_per_file data [analytics]
11:02 <btullis> btullis@an-master1001:~$ sudo systemctl restart hadoop-mapreduce-historyserver [analytics]
10:47 <btullis> btullis@an-master1002:~$ sudo systemctl restart hadoop-hdfs-namenode [analytics]
10:47 <btullis> btullis@an-master1002:~$ sudo systemctl restart hadoop-hdfs-zkfc [analytics]
10:35 <btullis> btullis@an-master1001:~$ sudo -u hdfs kerberos-run-command hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
10:07 <btullis> btullis@an-launcher1002:~$ sudo -u analytics kerberos-run-command analytics /usr/local/bin/refine_eventlogging_legacy --ignore_failure_flag=true --table_include_regex='centralnoticeimpression' --since='2021-09-23T04:00:00.000Z' --until='2021-09-24T05:00:00.000Z' [analytics]
2021-09-22 §
17:23 <razzi> razzi@an-test-coord1001:/etc/presto$ sudo systemctl restart presto-server [analytics]
17:05 <joal> Kill-restart oozie jobs after deploy (mediawiki-history-denormalize-coord, mediawiki-history-check_denormalize-coord, mediawiki-history-dumps-coord, mediawiki-history-reduced-coord) [analytics]
11:54 <joal> release refiner-source v0.1.18 to archiva with Jenkins [analytics]
2021-09-20 §
08:12 <elukey> remove old /reportcard (password protected, old files from 2012) httpd settings for stats.wikimedia.org [analytics]
2021-09-18 §
06:48 <joal> Rerun webrequest-load-wf-text-2021-9-18-0 for errors after last night's production issue [analytics]
2021-09-17 §
16:03 <btullis> Cleared all snapshots on aqs100[47] to reclaim space with nodetool-[ab] clearsnapshot (T249755) [analytics]
15:15 <btullis> btullis@aqs1004:~$ sudo nodetool-a repair --full && sudo nodetool-b repair --full (T249755) [analytics]
10:18 <btullis> btullis@an-web1001:~$ sudo find /srv/published-rsynced -user systemd-coredump -exec chown stats {} \; [analytics]
09:47 <milimetric> deployed refinery to sync sanitize allowlist, deleting event_sanitized data per decision in the task [analytics]
08:21 <elukey> disable mod_cgi/mod_cgid on an-web1001 (and remove cgi-perl related httpd configs/settings) [analytics]
2021-09-16 §
19:25 <ottomata> pointing analytics-web cname at new an-web1001, this moves stats and analytics .wm.org from thorium to an-web1001 - T285355 [analytics]