2021-01-08 §
08:44 <elukey> force restart of monitor_refine_eventlogging_legacy_failure_flags.service [analytics]
08:18 <elukey> raise default max executor heap size for Spark refine to 4G [analytics]
2021-01-07 §
18:22 <elukey> chown -R analytics:analytics-privatedata-users /tmp/analytics (tmp dir for data quality stats tables) [analytics]
18:21 <elukey> "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics:analytics-privatedata-users /wmf/data/wmf/data_quality_stats" [analytics]
18:10 <elukey> temporarily disable hdfs-cleaner.timer to prevent /tmp/DataFrameToDruid from being dropped [analytics]
18:08 <elukey> chown -R analytics:druid /tmp/DataFrameToDruid (was: analytics:hdfs) on hdfs to temporarily unblock Hive2Druid jobs [analytics]
16:31 <elukey> remove /etc/mysql/conf.d/research-client.cnf from stat100x nodes [analytics]
15:40 <elukey> deprecate the 'researchers' posix group for good [analytics]
11:24 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event_sanitized" to fix some file permissions as well [analytics]
10:36 <elukey> execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chmod -R o-rwx /wmf/data/event" on an-master1001 to fix some file permissions (an-launcher executed timers during the past hours without the new umask) - T270629 [analytics]
09:37 <elukey> forced re-run of monitor_refine_event_failure_flags.service on an-launcher1002 to clear alerts [analytics]
08:26 <joal> Rerunning 4 failed refine jobs (mediawiki_cirrussearch_request, day=6/hour=20|21, day=7/hour=0|2) [analytics]
08:14 <elukey> re-enable puppet on an-launcher1002 to apply new refine memory settings [analytics]
07:59 <elukey> re-enabling all oozie jobs previously suspended [analytics]
07:54 <elukey> restart oozie on an-coord1001 [analytics]
2021-01-06 §
20:42 <ottomata> starting remaining refine systemd timers [analytics]
20:19 <ottomata> restarted eventlogging_to_druid timers [analytics]
20:19 <ottomata> restarted drop systemd timers [analytics]
20:18 <ottomata> restarted reportupdater timers [analytics]
20:14 <ottomata> re-starting camus systemd timers [analytics]
16:45 <razzi> restart yarn nodemanagers [analytics]
16:08 <razzi> manually failover hdfs haadmin from an-master1002 to an-master1001 [analytics]
15:53 <ottomata> stopping analytics systemd timers on an-launcher1002 [analytics]
2021-01-05 §
21:32 <ottomata> bumped mediawiki history snapshot version in AQS [analytics]
20:45 <ottomata> Refine changes: event tables now have is_wmf_domain, canary events are removed, and corrupt records will result in a better monitoring email [analytics]
20:43 <razzi> deploy aqs as part of train [analytics]
19:17 <razzi> deploying refinery for weekly train [analytics]
09:29 <joal> Manually reload unique-devices monthly in cassandra to fix T271170 [analytics]
2021-01-04 §
22:22 <razzi> reboot an-test-coord1001 to upgrade kernel [analytics]
14:24 <elukey> deprecate the analytics-users group [analytics]
2021-01-03 §
14:11 <milimetric> reset-failed refinery-sqoop-whole-mediawiki.service [analytics]
14:10 <milimetric> manual sqoop finished, logs on an-launcher1002 at /var/log/refinery/sqoop-mediawiki.log and /var/log/refinery/sqoop-mediawiki-production.log [analytics]
2021-01-01 §
14:54 <milimetric> deployed refinery hotfix for sqoop problem, after testing on three small wikis [analytics]
2020-12-29 §
09:18 <elukey> restart hue to pick up analytics-hive endpoint settings [analytics]
2020-12-23 §
15:53 <ottomata> point analytics-hive.eqiad.wmnet back at an-coord1001 - T268028 T270768 [analytics]
2020-12-22 §
19:35 <elukey> restart hive daemons on an-coord1001 to pick up new settings [analytics]
18:13 <elukey> failover analytics-hive.eqiad.wmnet to an-coord1002 (to allow maintenance on an-coord1001) [analytics]
18:07 <elukey> restart hive server on an-coord1002 (current standby, no traffic) to pick up the new config (use the local metastore instead of the one analytics-hive points to) [analytics]
17:00 <mforns> Deployed refinery as part of weekly train (v0.0.142) [analytics]
16:42 <mforns> Deployed refinery-source v0.0.142 [analytics]
16:30 <mforns> Deployed refinery-source v0.0.142 [analytics]
15:00 <razzi> stopping superset server on analytics-tool1004 [analytics]
10:36 <elukey> restart presto coordinator to pick up analytics-hive settings [analytics]
10:25 <elukey> failover analytics-hive.eqiad.wmnet to an-coord1001 [analytics]
09:56 <elukey> restart hive daemons on an-coord1001 to pick up analytics-hive settings [analytics]
07:27 <elukey> reboot stat100[4-8] (analytics hadoop clients) for kernel upgrades [analytics]
07:23 <elukey> move all analytics clients (spark refine, stat100x, hive-site.xml on hdfs, etc.) to analytics-hive.eqiad.wmnet [analytics]
2020-12-18 §
14:10 <elukey> restore stat1004 to its previous settings for kerberos credential cache [analytics]
2020-12-17 §
14:54 <klausman> Updated all stat100x machines to now sport kafkacat 1.6.0, backported from Bullseye [analytics]
11:04 <elukey> wipe/reimage the hadoop test cluster to start clean for CDH (and then test the upgrade to bigtop 1.5) [analytics]