2021-11-18 §
17:26 <elukey> varnishkafka-webrequest on cp3050 is running with /etc/ssl/localcerts/wmf_trusted_root_CAs.pem [analytics]
10:03 <elukey> restart prometheus-druid-exporter on Druid Analytics to clear unnecessary metrics [analytics]
07:32 <elukey> restart prometheus-druid-exporter on Druid Public to see metrics difference [analytics]
2021-11-17 §
16:01 <btullis> roll-restarting kafka-test brokers [analytics]
12:12 <btullis> roll-restarting the presto analytics workers [analytics]
11:44 <btullis> btullis@archiva1002:~$ sudo systemctl restart archiva.service [analytics]
07:29 <elukey> `apt-get clean` on an-tool1005 to free space in the root partition [analytics]
07:28 <elukey> `sudo pkill -U jmixter` on stat100[5,8] to allow puppet to run and remove the offboarded user [analytics]
2021-11-16 §
19:40 <joal> Deploying refinery to HDFS [analytics]
19:15 <joal> Deploying refinery with scap [analytics]
18:23 <joal> Releasing refinery-source v0.1.21 [analytics]
11:32 <btullis> btullis@cumin1001:~$ sudo cookbook sre.druid.roll-restart-workers public [analytics]
10:20 <btullis> roll-restarting hadoop masters [analytics]
2021-11-15 §
16:37 <joal> Rerun failed mediawiki-wikitext-history-wf-2021-10 [analytics]
2021-11-11 §
06:56 <elukey> `systemctl start prometheus-mysqld-exporter@analytics_meta` on db1108 [analytics]
2021-11-10 §
18:20 <btullis> btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed.service [analytics]
10:19 <btullis> btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed [analytics]
2021-11-09 §
16:52 <razzi> restart presto server on an-coord1001 to apply change for T292087 [analytics]
16:30 <razzi> set superset presto version to 0.246 in ui [analytics]
16:30 <razzi> set superset presto timeout to 170s: {"connect_args":{"session_props":{"query_max_run_time":"170s"}}} for T294771 [analytics]
12:23 <btullis> btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed [analytics]
07:23 <elukey> `apt-get clean` on stat1006 to free some space (root partition full) [analytics]
2021-11-08 §
19:51 <ottomata> an-coord1002: drop user 'admin'@'localhost'; start slave; to fix broken replication - T284150 [analytics]
19:44 <razzi> create admin user on an-coord1001 for T284150 [analytics]
18:07 <razzi> run `create user 'admin'@'localhost' identified by <password>; grant all privileges on *.* to admin;` to allow milimetric to access mysql on an-coord1002 for T284150 [analytics]
2021-11-04 §
16:39 <razzi> add "can sql json on superset" permission to Alpha role on superset.wikimedia.org [analytics]
16:14 <razzi> drop and restore superset_staging database to test permissions as they are in production [analytics]
2021-11-03 §
17:07 <razzi> razzi@an-tool1010:~$ sudo systemctl stop superset [analytics]
16:57 <razzi> dump mysql in preparation for superset upgrade [analytics]
02:23 <milimetric> deployed refinery with regular train [analytics]
2021-10-29 §
23:04 <btullis> deleted all remaining old cassandra snapshots on aqs100x servers. [analytics]
22:58 <btullis> deleted old snapshots from aqs1006 and aqs1009 [analytics]
17:45 <razzi> set presto_analytics_hive extra parameter engine_params.connect_args.session_props.query_max_run_time to 55s on superset.wikimedia.org [analytics]
10:39 <elukey> roll restart of kafka-test to pick up new truststore (root PKI added) [analytics]
2021-10-28 §
19:13 <ottomata> re-enable hdfs-cleaner for /wmf/gobblin [analytics]
2021-10-26 §
09:01 <btullis> reverted hive services back to an-coord1001. [analytics]
2021-10-25 §
16:03 <btullis> btullis@an-coord1001:~$ sudo systemctl restart hive-server2 hive-metastore [analytics]
13:02 <btullis> btullis@an-coord1002:~$ sudo systemctl restart hive-server2 hive-metastore [analytics]
12:51 <btullis> btullis@aqs1007:~$ sudo nodetool-a clearsnapshot [analytics]
2021-10-21 §
14:05 <ottomata> rerun refine_eventlogging_analytics refine_eventlogging_legacy and refine_event with -ignore-done-flag=true --since=2021-10-21T01:00:00 --until=2021-10-21T04:00:00 for backfill of missing data after gobblin problems [analytics]
13:39 <btullis> btullis@an-launcher1002:~$ sudo systemctl restart gobblin-event_default [analytics]
10:35 <joal> Re-refine netflow data after gobblin pulled data fix [analytics]
08:41 <joal> Rerun webrequest-load jobs for hour 2021-10-21T02:00 [analytics]
2021-10-20 §
18:11 <razzi> Deployed refinery using scap, then deployed onto hdfs [analytics]
16:36 <razzi> deploy refinery change for https://phabricator.wikimedia.org/T287084 [analytics]
07:15 <joal> rerun webrequest-load-wf-upload-2021-10-20-1 after node issue [analytics]
06:27 <elukey> reboot analytics1066 - OS showing CPU soft lockups, tons of defunct processes (including node manager) and high CPU usage [analytics]
2021-10-19 §
07:14 <joal> Rerun cassandra-daily-wf-local_group_default_T_mediarequest_top_files-2021-10-17 [analytics]
2021-10-18 §
19:29 <joal> Rerun cassandra-daily-wf-local_group_default_T_top_pageviews-2021-10-17 [analytics]
18:36 <joal> Rerun cassandra-daily-wf-local_group_default_T_unique_devices-2021-10-17 [analytics]