2021-12-14
14:25 |
<btullis> |
btullis@aqs1011:$ sudo systemctl start cassandra-b.service |
[analytics] |
12:44 |
<joal> |
Rerun failed cassandra-hourly-wf-local_group_default_T_pageviews_per_project_v2-2021-12-14-10 |
[analytics] |
12:42 |
<joal> |
Kill late spark cassandra loading job |
[analytics] |
2021-12-11
10:06 |
<elukey> |
kill process 2560 on stat1005 to allow puppet to clean up the related user (offboarded) |
[analytics] |
10:04 |
<elukey> |
kill process 2831 on stat1008 to allow puppet to clean up the related user (offboarded) |
[analytics] |
2021-12-09
11:08 |
<btullis> |
roll restarting druid historical daemons on analytics cluster T297148 |
[analytics] |
10:46 |
<btullis> |
roll restarting druid brokers on analytics cluster |
[analytics] |
2021-12-07
20:09 |
<ottomata> |
deploy wikistats2 with doc updates |
[analytics] |
2021-12-03
17:36 |
<razzi> |
restart aqs-next to pick up new mediawiki snapshot: `razzi@cumin1001:~$ sudo cumin A:aqs-next 'systemctl restart aqs'` |
[analytics] |
17:36 |
<razzi> |
restart aqs to pick up new mediawiki snapshot: `razzi@cumin1001:~$ sudo cookbook sre.aqs.roll-restart aqs` |
[analytics] |
07:33 |
<elukey> |
move kafka-test to fixed uid/gid |
[analytics] |
2021-12-02
20:05 |
<ottomata> |
restarting pageview-druid-daily-coord (killing 0062888-210701181527401-oozie-oozi-C) - I can't seem to rerun a particular hour, so just starting again from that hour. |
[analytics] |
17:57 |
<elukey> |
drop "EventLogging MySQL" datasource from Superset (not valid anymore) |
[analytics] |
17:26 |
<joal> |
Kill paragon job to prevent more nodemanagers from OOMing |
[analytics] |
2021-12-01
20:40 |
<razzi> |
deploy refinery for T296089 patch https://gerrit.wikimedia.org/r/c/analytics/refinery/+/742672 |
[analytics] |
2021-11-27
09:56 |
<elukey> |
powercycle analytics1071, soft lockup stacktraces in the tty |
[analytics] |
2021-11-24
17:30 |
<mforns> |
Deployed refinery using scap, then deployed onto hdfs |
[analytics] |
12:31 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed.service |
[analytics] |
07:09 |
<elukey> |
drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8 on stat1006 to free space on the root partition |
[analytics] |
2021-11-23
11:56 |
<btullis> |
roll-restarting the cassandra services on the aqs cluster. (Not the aqs_next cluster) |
[analytics] |
11:49 |
<btullis> |
btullis@an-coord1001:~$ sudo systemctl restart presto-server.service |
[analytics] |
11:49 |
<btullis> |
btullis@an-coord1001:~$ sudo systemctl restart oozie.service |
[analytics] |
2021-11-22
12:18 |
<btullis> |
failed back the hive services to an-coord1001 via CNAME change |
[analytics] |
11:36 |
<btullis> |
btullis@an-coord1001:~$ sudo systemctl restart hive-server2 hive-metastore |
[analytics] |
10:44 |
<btullis> |
deploying DNS change to switch hive to the standby server. |
[analytics] |
10:18 |
<btullis> |
btullis@an-coord1002:~$ sudo systemctl restart hive-server2 hive-metastore |
[analytics] |
2021-11-18
17:26 |
<elukey> |
varnishkafka-webrequest on cp3050 is running with /etc/ssl/localcerts/wmf_trusted_root_CAs.pem |
[analytics] |
10:03 |
<elukey> |
restart prometheus-druid-exporter on Druid Analytics to clear unnecessary metrics |
[analytics] |
07:32 |
<elukey> |
restart prometheus-druid-exporter on Druid Public to see metrics difference |
[analytics] |
2021-11-17
16:01 |
<btullis> |
roll-restarting kafka-test brokers |
[analytics] |
12:12 |
<btullis> |
roll-restarting the presto analytics workers |
[analytics] |
11:44 |
<btullis> |
btullis@archiva1002:~$ sudo systemctl restart archiva.service |
[analytics] |
07:29 |
<elukey> |
`apt-get clean` on an-tool1005 to free space in the root partition |
[analytics] |
07:28 |
<elukey> |
`sudo pkill -U jmixter` on stat100[5,8] to allow puppet to run and remove the offboarded user |
[analytics] |
2021-11-16
19:40 |
<joal> |
Deploying refinery to HDFS |
[analytics] |
19:15 |
<joal> |
Deploying refinery with scap |
[analytics] |
18:23 |
<joal> |
Releasing refinery-source v0.1.21 |
[analytics] |
11:32 |
<btullis> |
btullis@cumin1001:~$ sudo cookbook sre.druid.roll-restart-workers public |
[analytics] |
10:20 |
<btullis> |
roll-restarting hadoop masters |
[analytics] |
2021-11-15
16:37 |
<joal> |
Rerun failed mediawiki-wikitext-history-wf-2021-10 |
[analytics] |
2021-11-11
06:56 |
<elukey> |
`systemctl start prometheus-mysqld-exporter@analytics_meta` on db1108 |
[analytics] |
2021-11-10
18:20 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed.service |
[analytics] |
10:19 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed |
[analytics] |
2021-11-09
16:52 |
<razzi> |
restart presto server on an-coord1001 to apply change for T292087 |
[analytics] |
16:30 |
<razzi> |
set superset presto version to 0.246 in ui |
[analytics] |
16:30 |
<razzi> |
set superset presto timeout to 170s: {"connect_args":{"session_props":{"query_max_run_time":"170s"}}} for T294771 |
[analytics] |
12:23 |
<btullis> |
btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed |
[analytics] |
07:23 |
<elukey> |
`apt-get clean` on stat1006 to free some space (root partition full) |
[analytics] |
2021-11-08
19:51 |
<ottomata> |
an-coord1002: drop user 'admin'@'localhost'; start slave; to fix broken replication - T284150 |
[analytics] |
19:44 |
<razzi> |
create admin user on an-coord1001 for T284150 |
[analytics] |