2021-09-22
17:23 <razzi> razzi@an-test-coord1001:/etc/presto$ sudo systemctl restart presto-server [analytics]
17:05 <joal> Kill-restart oozie jobs after deploy (mediawiki-history-denormalize-coord, mediawiki-history-check_denormalize-coord, mediawiki-history-dumps-coord, mediawiki-history-reduced-coord) [analytics]
11:54 <joal> release refiner-source v0.1.18 to archiva with Jenkins [analytics]
2021-09-20
08:12 <elukey> remove old /reportcard (password protected, old files from 2012) httpd settings for stats.wikimedia.org [analytics]
2021-09-18
06:48 <joal> Rerun webrequest-load-wf-text-2021-9-18-0 for errors after last night's production issue [analytics]
2021-09-17
16:03 <btullis> Cleared all snapshots on aqs100[47] to reclaim space with nodetool-[ab] clearsnapshot (T249755) [analytics]
15:15 <btullis> btullis@aqs1004:~$ sudo nodetool-a repair --full && sudo nodetool-b repair --full (T249755) [analytics]
10:18 <btullis> btullis@an-web1001:~$ sudo find /srv/published-rsynced -user systemd-coredump -exec chown stats {} \; [analytics]
09:47 <milimetric> deployed refinery to sync sanitize allowlist, deleting event_sanitized data per decision in the task [analytics]
08:21 <elukey> disable mod_cgi/mod_cgid on an-web1001 (and remove cgi-perl related httpd configs/settings) [analytics]
2021-09-16
19:25 <ottomata> pointing analytics-web cname at new an-web1001, this moves stats and analytics .wm.org from thorium to an-web1001 - T285355 [analytics]
18:30 <joal> Create HDFS home folder for user 'analytics-research' [analytics]
07:03 <elukey> stop jupyter-kaywong-singleuser.service on stat1005 to allow puppet to clean up [analytics]
2021-09-15
16:26 <joal> Deploying refinery [analytics]
2021-09-13
18:25 <razzi> (I stopped replication earlier but forgot to !log) [analytics]
18:24 <razzi> razzi@dbstore1007:~$ for socket in /run/mysqld/*; do sudo mysql --socket=$socket -e "START SLAVE"; done - reenable replication for T290841 [analytics]
18:19 <razzi> razzi@dbstore1007:~$ sudo systemctl restart mariadb@s4.service for T290841 [analytics]
18:13 <razzi> razzi@dbstore1007:~$ sudo systemctl restart mariadb@s3.service for T290841 [analytics]
18:05 <razzi> sudo systemctl restart mariadb@s2.service [analytics]
2021-09-07
11:41 <joal> Restarting cassandra hourly loading job after C2 snapshot taken and C3 tables truncated [analytics]
11:37 <joal> Re-Add test rows in cassandra3 cluster after tables got truncated [analytics]
10:25 <hnowlan> truncating data tables on aqs_next cluster [analytics]
10:12 <joal> Kill cassandra-hourly loading job for cluster-migration first step [analytics]
2021-09-03
11:43 <joal> Deploying refinery to hotfix mediarequest cassandra3 loading jobs (second) [analytics]
09:57 <joal> Deploy AQS on new AQS servers [analytics]
09:45 <joal> Kill-restart mediarequest-top cassandra loading jobs after deploy [analytics]
09:12 <joal> Rerun mediawiki-history-denormalize-wf-2021-08 after failure [analytics]
09:07 <joal> Deploying refinery to hotfix mediarequest cassandra3 loading jobs [analytics]
2021-09-01
16:44 <mforns> finished one-off deployment of refinery to fix cassandra3 loading [analytics]
15:57 <joal> Kill cassandra loading jobs and restart them after deploy [analytics]
15:55 <mforns> starting one-off deployment of refinery to fix cassandra3 loading [analytics]
13:15 <joal> Restart cassandra jobs to load cassandra3 with spark [analytics]
08:21 <joal> Rerun webrequest-load-wf-upload-2021-9-1-0 [analytics]
2021-08-31
23:25 <mforns> finished deployment of refinery (regular weekly train v0.1.17) successfully, only an-test-coord1001.eqiad.wmnet failed [analytics]
22:41 <mforns> starting deployment of refinery (regular weekly train v0.1.17) [analytics]
22:27 <mforns> Deployed refinery-source using jenkins [analytics]
10:30 <hnowlan> sudo cookbook sre.aqs.roll-restart aqs-next [analytics]
2021-08-30
06:53 <elukey> drop an-airflow1001's old airflow logs to free space on the almost full root partition [analytics]
2021-08-26
06:22 <elukey> root@an-launcher1002:/var/lib/puppet/clientbucket# find -type d -empty -delete [analytics]
06:21 <elukey> root@an-launcher1002:/var/lib/puppet/clientbucket# find -type f -delete -mtime +60 [analytics]
2021-08-25
13:40 <joal> Kill-restart pageview-monthly_dump job and 2 backfilling jobs [analytics]
13:34 <joal> Deploy refinery onto HDFS [analytics]
13:09 <joal> Deploying refinery using scap [analytics]
2021-08-24
10:30 <btullis> btullis@an-launcher1002:~$ sudo systemctl start hdfs-balancer.service [analytics]
2021-08-20
08:46 <btullis> btullis@druid1001:~$ sudo systemctl stop druid-broker druid-coordinator druid-historical druid-middlemanager druid-overlord [analytics]
2021-08-19
19:05 <razzi> razzi@deploy1002:/srv/deployment/analytics/aqs/deploy$ scap deploy "Deploy aqs 9c062f2" [analytics]
19:02 <razzi> note that the aqs-deploy repo's commit message DOES NOT include the changes of aqs in its changes list (though it has the correct SHA in the first line) [analytics]
18:26 <razzi> Beginning aqs deploy process [analytics]
17:55 <razzi> razzi@labstore1007:~$ sudo systemctl start analytics-dumps-fetch-geoeditors_dumps.service [analytics]
17:53 <razzi> sudo systemctl start analytics-dumps-fetch-geoeditors_dumps.service on labstore1006 [analytics]