2023-06-12 §
12:39 <btullis> ran apt clean on an-testui1001 to get some free disk space. [analytics]
11:30 <btullis> resuming deployment of eventgate-main [analytics]
09:58 <btullis> deploying eventgate-main [analytics]
08:52 <btullis> restart monitor_refine_netflow service on an-launcher1002 after successful job re-run. [analytics]
08:36 <btullis> re-running the refine_netflow task [analytics]
2023-06-09 §
20:40 <btullis> restarting the aqs service more quickly with: `sudo cumin -b 2 -s 10 A:aqs 'systemctl restart aqs'` [analytics]
20:23 <btullis> btullis@cumin1001:~$ sudo cookbook sre.aqs.roll-restart-reboot --alias aqs restart_daemons --reason aqs_rollback_btullis [analytics]
20:22 <btullis> merged and deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/928927 to revert aqs mediawiki snapshot change [analytics]
2023-06-08 §
17:12 <btullis> running the sre.hadoop.roll-restart-masters cookbook for the analytics cluster, to pick up the new journalnode for T338336 [analytics]
17:01 <btullis> running puppet on an-worker1142 to start the new journalnode [analytics]
06:42 <stevemunene> stop hadoop-hdfs-journalnode on analytics1069 in order to swap the journal node with an-worker1142 T338336 [analytics]
06:10 <elukey> kill remaining processes for `andyrussg` on stat100x nodes to unblock puppet [analytics]
2023-06-07 §
15:38 <btullis> installing presto 0.281 to the test cluster [analytics]
15:23 <elukey> all varnishkafka instances on caching nodes are getting restarted due to https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087 - T337825 [analytics]
14:13 <btullis> running `sudo cumin A:wikireplicas-web 'maintain-views --all-databases --table abuse_filter_history --replace-all'` on A:wikireplicas-web [analytics]
14:04 <btullis> running `maintain-views --all-databases --table abuse_filter_history --replace-all` on A:wikireplicas-analytics [analytics]
11:52 <btullis> running `sudo maintain-views --all-databases --table abuse_filter_history --replace-all` on clouddbd1021 for T315426 [analytics]
08:02 <elukey> set "loadByPeriod(P15D+future), dropForever" for webrequest_sampled_live in druid-analytics - T337460 [analytics]
2023-06-06 §
15:52 <elukey> restart yarn resourcemanager on an-master1002 to restore the Yarn UI (that works only when the active yarn RM is on 1001) [analytics]
15:07 <mforns> deployed airflow analytics to try and fix the edit_hourly DAG again [analytics]
13:09 <ottomata> EventStreamConfig - temporarily disable canary events and hadoop ingestion for development.network.probe stream - T332024 [analytics]
11:29 <stevemunene> service hadoop-yarn-resourcemanager restart for T317861 [analytics]
11:13 <btullis> restart airflow-scheduler service on an-test-client1001 for analytics_test instance [analytics]
11:12 <btullis> restart airflow-scheduler service on an-airflow1006 for product_analytics instance [analytics]
11:12 <btullis> restart airflow-scheduler service on an-airflow1005 for search instance [analytics]
11:08 <btullis> restart airflow-scheduler service on an-airflow1002 for research instance [analytics]
11:07 <btullis> (correction) that should have read an-airflow1004 for platform_eng instance [analytics]
11:06 <btullis> restart airflow-scheduler service on an-launcher1004 for postgresql restart [analytics]
11:05 <btullis> restart airflow-scheduler service on an-launcher1002 for postgresql restart [analytics]
05:41 <stevemunene> hadoop-yarn-resourcemanager restart for T317861 [analytics]
2023-06-05 §
18:20 <btullis> restarted haproxy service on dbproxy1018 for T338172 [analytics]
16:21 <btullis> depooling service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet [analytics]
16:20 <btullis> pooling service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet to allow us to depool the analytics wikireplica servers [analytics]
15:19 <mforns> deployed airflow analytics to fix edit_hourly DAG [analytics]
11:43 <btullis> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
09:52 <btullis> powered up an-worker1125 [analytics]
2023-06-01 §
19:09 <mforns> deploy airflow analytics to bump up cassandra load monthly for top articles [analytics]
17:50 <mforns> deployed airflow analytics to unbreak monthly cassandra loading DAGs [analytics]
14:13 <mforns> deployed airflow analytics to fix anomaly detection OOMs [analytics]
2023-05-31 §
20:41 <mforns> finished refinery deployment [analytics]
20:20 <mforns> starting refinery deployment [analytics]
07:29 <elukey> set "loadByPeriod(P8D+future), dropForever" for webrequest_sampled_live in druid-analytics - T337460 [analytics]
2023-05-30 §
15:52 <xcollazo> created HDFS folder `/wmf/data/wmf_traffic` (T335305 and T337562) [analytics]
2023-05-26 §
06:42 <elukey> `apt-get clean` on stat1008 to clean up some space in the root partition [analytics]
06:36 <elukey> `truncate /var/log/kerberos/krb5kdc.log -s 10g` on krb1001 to keep the root partition from filling up [analytics]
2023-05-25 §
13:42 <joal> rerun webrequest-refine job for 2023-05-20T00 - we're missing data [analytics]
12:31 <elukey> set "loadByPeriod(P3D+future), dropForever" for webrequest_sampled_live in druid-analytics - T337460 [analytics]
08:37 <joal> rerun druid_load_webrequest_sampled_128_daily 2023-05-20 to reload missing hour (T337088) [analytics]
08:37 <joal> rerun druid_load_webrequest_sampled_128_daily [analytics]
2023-05-24 §
16:19 <aqu> Deployed refinery using scap, then deployed onto hdfs [analytics]