2022-12-19 §
13:45 <btullis> restart presto-server on an-coord1001 to increase heap from 4 GB to 16 GB T325331 [analytics]
12:11 <aqu> systemctl start hadoop-namenode-backup-hdfs.service on an-master1002 at 11am UTC [analytics]
09:36 <aqu> Deployed analytics/refinery using scap, then deployed onto HDFS. [analytics]
09:17 <aqu> About to deploy analytics/refinery (bug fix in HDFS usage pipeline) [analytics]
2022-12-16 §
15:36 <xcollazo> deploying 'Fix subtle bug on image_suggestions when resolving varprop.' on platform_eng Airflow instance. [analytics]
2022-12-15 §
22:28 <btullis> run `sudo apt clean` on an-coord1001 [analytics]
19:08 <xcollazo> Deploying Spark3 upgrade of image_suggestions job to the platform_eng Airflow instance. [analytics]
10:03 <joal> Restart failed airflow tasks [analytics]
2022-12-13 §
21:35 <aqu> Deploying analytics/refinery (HDFS FSImage conversion to XML script) [analytics]
2022-12-09 §
08:38 <joal> Kill refine_eventlogging_legacy stuck job (application_1663082229270_510052) [analytics]
2022-12-08 §
13:55 <joal> rerun webrequest failed jobs for hour 2022-12-08-T11:00Z with updated workflow (no dataloss checks) [analytics]
12:23 <joal> rerun webrequest failed jobs for hour 2022-12-08-T11:00Z [analytics]
2022-12-07 §
17:57 <aqu> Adding raw hdfs fsimage dir in HDFS (an-launcher1002) [analytics]
17:47 <aqu> Adding hdfs/usage folder dataset in HDFS [analytics]
16:24 <aqu> Deploying analytics/refinery (HDFS usage scripts) [analytics]
15:13 <btullis> roll-restarting AQS to pick up new mediawiki_history_reduce snapshot [analytics]
14:06 <btullis> rebuilding an-tool1005 as bullseye to test superset 1.5.2 upgrade [analytics]
09:10 <btullis> reboot an-worker1108 as it was spinning with soft CPU lockups [analytics]
2022-12-06 §
12:47 <btullis> sudo systemctl restart wmf_auto_restart_prometheus-mysqld-exporter.service on matomo1002 [analytics]
11:53 <btullis> attempting to unmount and remount `/mnt/hdfs` on stat1004 [analytics]
2022-12-05 §
11:45 <steve_munene> restarting presto-server.service on an-presto1007 T323783 [analytics]
2022-11-30 §
16:44 <btullis> roll-restarting presto workers again for T321960 and T321231 [analytics]
16:20 <btullis> roll-restarting presto workers for T321960 and T321231 [analytics]
16:19 <btullis> restarting presto-server on an-coord1001 for T321960 and T321231 [analytics]
13:39 <btullis> pushing out conda-analytics to all remaining servers `btullis@cumin1001:~$ sudo debdeploy deploy -u 2022-11-30-conda-analytics.yaml -Q P:analytics::conda_analytics` [analytics]
13:02 <btullis> deploying conda-analytics 0.0.12 to stat boxes for T321088 [analytics]
12:29 <btullis> repooling eqiad for eventstreams for T324074 [analytics]
11:59 <btullis> depooling eqiad for eventstreams for T324074 [analytics]
11:34 <btullis> repooling codfw for eventstreams for T324074 [analytics]
11:32 <btullis> destroying the eventstreams deployment in codfw and reapplying for T324074 [analytics]
11:11 <btullis> depooling codfw for eventstreams for T324074 [analytics]
2022-11-29 §
17:12 <ottomata> deploying refinery, then restarting druid webrequest daily and hourly loading oozie jobs [analytics]
17:08 <btullis> booted all of the an-worker nodes that had been switched off. [analytics]
15:04 <btullis> shutting down an-worker1093 [analytics]
15:03 <btullis> shutting down an-worker1089 [analytics]
15:02 <btullis> shutting down an-worker1085 [analytics]
15:00 <btullis> shutting down an-worker1083 [analytics]
14:58 <btullis> shutting down an-worker1079 [analytics]
14:55 <btullis> shutting down an-worker1090 [analytics]
2022-11-28 §
12:00 <btullis> restarted presto-server on an-coord1001 to test T321960 [analytics]
2022-11-25 §
15:29 <btullis> reset the bmc on an-coord1002 [analytics]
11:24 <elukey> restart turnilo on an-tool1007 to pick up new settings for webrequest_sampled_live [analytics]
10:07 <elukey> refresh the webrequest-sampled-live druid supervisor after https://gerrit.wikimedia.org/r/c/analytics/refinery/+/859463 [analytics]
2022-11-24 §
16:21 <SandraEbele> restarted webrequest-druid-daily-coord as part of weekly deployment train. [analytics]
16:15 <SandraEbele> killed webrequest-druid-daily-coord for restart as part of weekly deployment train. [analytics]
16:13 <SandraEbele> successfully restarted webrequest-druid-hourly-coord as part of weekly deployment train. [analytics]
16:11 <SandraEbele> killed webrequest-druid-hourly-coord for restart as part of weekly deployment train. [analytics]
15:30 <SandraEbele> Started deployment of refinery as part of weekly deployment train [analytics]
2022-11-23 §
15:38 <btullis> roll-restarting kafka-jumbo brokers to pick up new certificates. T323697 [analytics]
2022-11-18 §
18:56 <mforns> re-ran refine_event_sanitized_analytics_immediate from 2022-11-17T13 to 2022-11-18T18 to fix the issues caused by a bug (allow-list typo) deployed yesterday. [analytics]