2021-12-02
17:26 <joal> Kill paragon job to prevent more nodemanagers from OOMing [analytics]
2021-12-01
20:40 <razzi> deploy refinery for T296089 patch https://gerrit.wikimedia.org/r/c/analytics/refinery/+/742672 [analytics]
2021-11-27
09:56 <elukey> powercycle analytics1071, soft lockup stacktraces in the tty [analytics]
2021-11-24
17:30 <mforns> Deployed refinery using scap, then deployed onto HDFS [analytics]
12:31 <btullis> btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed.service [analytics]
07:09 <elukey> drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8 on stat1006 to free space on the root partition [analytics]
2021-11-23
11:56 <btullis> roll-restarting the cassandra services on the aqs cluster (not the aqs_next cluster) [analytics]
11:49 <btullis> btullis@an-coord1001:~$ sudo systemctl restart presto-server.service [analytics]
11:49 <btullis> btullis@an-coord1001:~$ sudo systemctl restart oozie.service [analytics]
2021-11-22
12:18 <btullis> failed back the hive services to an-coord1001 via CNAME change [analytics]
11:36 <btullis> btullis@an-coord1001:~$ sudo systemctl restart hive-server2 hive-metastore [analytics]
10:44 <btullis> deploying DNS change to switch hive to the standby server. [analytics]
10:18 <btullis> btullis@an-coord1002:~$ sudo systemctl restart hive-server2 hive-metastore [analytics]
2021-11-18
17:26 <elukey> varnishkafka-webrequest on cp3050 is running with /etc/ssl/localcerts/wmf_trusted_root_CAs.pem [analytics]
10:03 <elukey> restart prometheus-druid-exporter on Druid Analytics to clear unnecessary metrics [analytics]
07:32 <elukey> restart prometheus-druid-exporter on Druid Public to see metrics difference [analytics]
2021-11-17
16:01 <btullis> roll-restarting kafka-test brokers [analytics]
12:12 <btullis> roll-restarting the presto analytics workers [analytics]
11:44 <btullis> btullis@archiva1002:~$ sudo systemctl restart archiva.service [analytics]
07:29 <elukey> `apt-get clean` on an-tool1005 to free space in the root partition [analytics]
07:28 <elukey> `sudo pkill -U jmixter` on stat100[5,8] to allow puppet to run and remove the offboarded user [analytics]
2021-11-16
19:40 <joal> Deploying refinery to HDFS [analytics]
19:15 <joal> Deploying refinery with scap [analytics]
18:23 <joal> Releasing refinery-source v0.1.21 [analytics]
11:32 <btullis> btullis@cumin1001:~$ sudo cookbook sre.druid.roll-restart-workers public [analytics]
10:20 <btullis> roll-restarting hadoop masters [analytics]
2021-11-15
16:37 <joal> Rerun failed mediawiki-wikitext-history-wf-2021-10 [analytics]
2021-11-11
06:56 <elukey> `systemctl start prometheus-mysqld-exporter@analytics_meta` on db1108 [analytics]
2021-11-10
18:20 <btullis> btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed.service [analytics]
10:19 <btullis> btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed [analytics]
2021-11-09
16:52 <razzi> restart presto server on an-coord1001 to apply change for T292087 [analytics]
16:30 <razzi> set superset presto version to 0.246 in the UI [analytics]
16:30 <razzi> set superset presto timeout to 170s: {"connect_args":{"session_props":{"query_max_run_time":"170s"}}} for T294771 [analytics]
12:23 <btullis> btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed [analytics]
07:23 <elukey> `apt-get clean` on stat1006 to free some space (root partition full) [analytics]
2021-11-08
19:51 <ottomata> an-coord1002: drop user 'admin'@'localhost'; start slave; to fix broken replication - T284150 [analytics]
19:44 <razzi> create admin user on an-coord1001 for T284150 [analytics]
18:07 <razzi> run `create user 'admin'@'localhost' identified by <password>; grant all privileges on *.* to admin;` to allow milimetric to access mysql on an-coord1002 for T284150 [analytics]
2021-11-04
16:39 <razzi> add "can sql json on superset" permission to Alpha role on superset.wikimedia.org [analytics]
16:14 <razzi> drop and restore superset_staging database to test permissions as they are in production [analytics]
2021-11-03
17:07 <razzi> razzi@an-tool1010:~$ sudo systemctl stop superset [analytics]
16:57 <razzi> dump mysql in preparation for superset upgrade [analytics]
02:23 <milimetric> deployed refinery with the regular train [analytics]
2021-10-29
23:04 <btullis> deleted all remaining old cassandra snapshots on aqs100x servers. [analytics]
22:58 <btullis> deleted old snapshots from aqs1006 and aqs1009 [analytics]
17:45 <razzi> set presto_analytics_hive extra parameter engine_params.connect_args.session_props.query_max_run_time to 55s on superset.wikimedia.org [analytics]
10:39 <elukey> roll restart of kafka-test to pick up new truststore (root PKI added) [analytics]
2021-10-28
19:13 <ottomata> re-enable hdfs-cleaner for /wmf/gobblin [analytics]
2021-10-26
09:01 <btullis> reverted hive services back to an-coord1001. [analytics]
2021-10-25
16:03 <btullis> btullis@an-coord1001:~$ sudo systemctl restart hive-server2 hive-metastore [analytics]