2022-01-04 §
10:39 <elukey> restart cassandra-a on aqs1010 (heap size used in full, high GC) [analytics]
10:20 <elukey> restart cassandra-a on aqs1015 (heap size used in full, high GC) [analytics]
2022-01-03 §
18:26 <joal> rerun cassandra-daily-wf-local_group_default_T_mediarequest_per_file-2022-1-1 [analytics]
16:08 <joal> Kill cassandra3-local_group_default_T_mediarequest_per_file-daily-2022-1-1 [analytics]
11:26 <elukey> restart cassandra-b on aqs1015 (instance not responding, probably thrashing) [analytics]
11:16 <elukey> restart cassandra-b on aqs1010 (stuck thrashing) [analytics]
10:34 <elukey> depool aqs1010 (`sudo -i depool` on the node) to allow investigation of the cassandra-b instance [analytics]
10:22 <elukey> powercycle an-worker1114 (CPU soft lockup errors in mgmt console) [analytics]
10:20 <elukey> powercycle an-worker1120 (CPU soft lockup errors in mgmt console) [analytics]
2021-12-22 §
19:13 <milimetric> Additional context on the last delete message: this is on an-launcher1002, which has filled up [analytics]
19:12 <milimetric> Marcel and I are deleting files from /tmp older than 60 days [analytics]
15:55 <mforns> finished refinery deployment for anomaly detection queries [analytics]
14:54 <mforns> starting refinery deployment for anomaly detection queries [analytics]
2021-12-20 §
18:59 <mforns> finished deployment of refinery, adding anomaly detection hql for airflow job [analytics]
18:39 <mforns> started to deploy refinery, adding anomaly detection hql for airflow job [analytics]
2021-12-17 §
12:32 <btullis> Upgraded druid packages, with pool/depool on druid1004 [analytics]
11:20 <btullis> btullis@an-test-druid1001:~$ sudo apt-get install druid-broker druid-common druid-coordinator druid-historical druid-middlemanager druid-overlord [analytics]
11:18 <btullis> updating reprepro with new druid packages for buster-wikimedia to pick up new log4j jar files [analytics]
2021-12-16 §
11:01 <btullis> btullis@an-test-druid1001:~$ sudo apt-get install druid-broker druid-common druid-coordinator druid-historical druid-middlemanager druid-overlord [analytics]
11:01 <btullis> upgrading druid on the test cluster with new packages to test log4j changes. [analytics]
2021-12-15 §
08:51 <joal> Rerun failed cassandra-daily-wf-local_group_default_T_mediarequest_per_file-2021-12-13 after cluster restart [analytics]
07:20 <elukey> elukey@stat1007:~$ sudo systemctl reset-failed product-analytics-movement-metrics [analytics]
2021-12-14 §
19:02 <milimetric> finished deploying the weekly train as per etherpad [analytics]
18:04 <joal> Rerun failed cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2021-12-13 after cluster reboot [analytics]
17:51 <btullis> rebooting aqs1015 [analytics]
17:25 <btullis> rebooting aqs1013 [analytics]
17:19 <btullis> rebooting aqs1012 [analytics]
16:00 <btullis> rebooting aqs1011 [analytics]
15:53 <btullis> rebooting aqs1010 [analytics]
15:00 <btullis> btullis@aqs1010:~$ sudo nodetool-a repair --full system_auth [analytics]
14:59 <btullis> cassandra@cqlsh> ALTER KEYSPACE "system_auth" WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': '12'}; on aqs1010-a [analytics]
14:25 <btullis> btullis@aqs1011:$ sudo systemctl start cassandra-b.service [analytics]
12:44 <joal> Rerun failed cassandra-hourly-wf-local_group_default_T_pageviews_per_project_v2-2021-12-14-10 [analytics]
12:42 <joal> Kill late spark cassandra loading job [analytics]
2021-12-11 §
10:06 <elukey> kill process 2560 on stat1005 to allow puppet to clean up the related user (offboarded) [analytics]
10:04 <elukey> kill process 2831 on stat1008 to allow puppet to clean up the related user (offboarded) [analytics]
2021-12-09 §
11:08 <btullis> roll restarting druid historical daemons on analytics cluster T297148 [analytics]
10:46 <btullis> roll restarting druid brokers on analytics cluster [analytics]
2021-12-07 §
20:09 <ottomata> deploy wikistats2 with doc updates [analytics]
2021-12-03 §
17:36 <razzi> restart aqs-next to pick up new mediawiki snapshot: `razzi@cumin1001:~$ sudo cumin A:aqs-next 'systemctl restart aqs'` [analytics]
17:36 <razzi> restart aqs to pick up new mediawiki snapshot: `razzi@cumin1001:~$ sudo cookbook sre.aqs.roll-restart aqs` [analytics]
07:33 <elukey> move kafka-test to fixed uid/gid [analytics]
2021-12-02 §
20:05 <ottomata> restarting pageview-druid-daily-coord (killing 0062888-210701181527401-oozie-oozi-C) - I can't seem to rerun a particular hour, so just starting again from that hour. [analytics]
17:57 <elukey> drop "EventLogging MySQL" datasource from Superset (not valid anymore) [analytics]
17:26 <joal> Kill paragon job to prevent more nodemanagers from OOMing [analytics]
2021-12-01 §
20:40 <razzi> deploy refinery for T296089 patch https://gerrit.wikimedia.org/r/c/analytics/refinery/+/742672 [analytics]
2021-11-27 §
09:56 <elukey> powercycle analytics1071, soft lockup stacktraces in the tty [analytics]
2021-11-24 §
17:30 <mforns> Deployed refinery using scap, then deployed onto hdfs [analytics]
12:31 <btullis> btullis@an-launcher1002:~$ sudo systemctl reset-failed monitor_refine_event_sanitized_analytics_delayed.service [analytics]
07:09 <elukey> drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8 on stat1006 to free space on the root partition [analytics]