2020-09-15
12:30 <elukey> stop timers on an-launcher1002 to drain the cluster and restart an-coord1001's daemons (hive/oozie/presto) [analytics]
06:48 <elukey> run systemctl reset-failed monitor_refine_eventlogging_legacy_failure_flags.service on an-launcher1002 [analytics]
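For reference, the reset-failed entry above clears a failed-unit state so monitoring stops alerting on it. The likely invocation on an-launcher1002 (the sudo prefix and the status check are assumptions):
    sudo systemctl reset-failed monitor_refine_eventlogging_legacy_failure_flags.service
    systemctl status monitor_refine_eventlogging_legacy_failure_flags.service    # unit should no longer show "failed"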
2020-09-14
14:36 <milimetric> deployed eventstreams with new KafkaSSE version on staging, eqiad, codfw [analytics]
2020-09-11
15:41 <milimetric> restarted data quality stats bundles [analytics]
01:32 <milimetric> deployed a small fix for the HQL of the editors_bycountry load job [analytics]
00:46 <milimetric> deployed refinery-source 0.0.136 and refinery, and synced to HDFS [analytics]
2020-09-09
10:11 <klausman> Rebooting stat1005 to clear the GPU status and test the new DKMS driver (T260442) [analytics]
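A sketch of the reboot sequence implied by the stat1005 entry; the dkms check and the post-reboot GPU check are assumptions about how the new driver was verified, not steps taken from the log:
    sudo dkms status    # confirm the new GPU driver module built against the running kernel
    sudo reboot
    # once the host is back, check the GPU is healthy again (e.g. rocm-smi, assuming the AMD/ROCm stack on stat1005)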
07:25 <elukey> restart varnishkafka-webrequest on cp5010 and cp5012, delivery report errors happening since yesterday's network outage [analytics]
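The varnishkafka restart above was presumably a plain unit restart on each affected cache host; the .service suffix and the journalctl check are assumptions:
    # on cp5010 and cp5012
    sudo systemctl restart varnishkafka-webrequest.service
    sudo journalctl -u varnishkafka-webrequest.service --since "10 min ago"    # confirm the delivery report errors have stopped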
2020-09-04
18:11 <milimetric> aqs deploy went well! Geoeditors endpoint is live internally, data load job was successful, will submit pull request for public endpoint. [analytics]
06:54 <joal> Manually restart mediawiki-history-drop-snapshot after hive-partitions/hdfs-folders mismatch fix [analytics]
06:08 <elukey> reset-failed mediawiki-history-drop-snapshot on an-launcher1002 to clear icinga errors [analytics]
01:52 <milimetric> aborted aqs deploy due to cassandra error [analytics]
2020-09-03
19:15 <milimetric> finished deploying refinery and refinery-source, restarting jobs now [analytics]
13:59 <milimetric> edit-hourly-druid-wf-2020-08 fails consistently [analytics]
13:56 <joal> Kill-restart mediawiki-history-reduced oozie job into production queue [analytics]
13:56 <joal> rerun edit-hourly-druid-wf-2020-08 after failed attempt [analytics]
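The two 13:56 entries above are routine Oozie operations: kill-and-resubmit a coordinator so it lands in the production queue, and rerun a failed coordinator action. A rough sketch; the job IDs, properties file, and queue property name are placeholders, not from the log:
    # kill the running coordinator, then resubmit it with the production queue
    oozie job -kill <coordinator_id>
    oozie job -run -config coordinator.properties -Dqueue_name=production
    # rerun the failed edit-hourly-druid-wf-2020-08 action on its coordinator
    oozie job -rerun <coordinator_id> -action <action_number>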
2020-09-02
18:24 <milimetric> restarting mediawiki history denormalize coordinator in production queue, due to failed 2020-08 run [analytics]
08:37 <elukey> run kafka preferred-replica-election on jumbo after jumbo1003's reimage to buster [analytics]
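The preferred-replica-election runs (here and again in the 2020-08-31 and 2020-08-28 entries below) rebalance partition leadership back onto the reimaged broker. On the Kafka Jumbo brokers this is normally done with the local kafka wrapper; the rough upstream equivalent is shown second, with the ZooKeeper connect string as a placeholder:
    # via the wrapper installed on the brokers
    kafka preferred-replica-election
    # roughly equivalent upstream invocation
    kafka-preferred-replica-election.sh --zookeeper <zookeeper-connect-string>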
2020-08-31
13:43 <elukey> run kafka preferred-replica-election on Jumbo after jumbo1001's reimage [analytics]
07:13 <elukey> run kafka preferred-replica-election on Jumbo after jumbo1005's reimage [analytics]
2020-08-28
14:25 <mforns> deployed pageview whitelist with new wiki: ja.wikivoyage [analytics]
14:18 <elukey> run kafka preferred-replica-election on jumbo after the reimage of jumbo1006 [analytics]
07:21 <joal> Manually add ja.wikivoyage to pageview allowlist to prevent alerts [analytics]
2020-08-27
19:05 <mforns> finished refinery deploy (ref v0.0.134) [analytics]
18:41 <mforns> starting refinery deploy (ref v0.0.134) [analytics]
18:30 <mforns> deployed refinery-source v0.0.134 [analytics]
13:29 <elukey> restart jvm daemons on analytics1042, aqs1004, kafka-jumbo1001 to pick up new openjdk upgrades (canaries) [analytics]
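The canary restarts above touch one host per service class so long-running JVMs pick up the new openjdk before a wider rollout; the unit names below are assumptions based on the host roles, not from the log:
    sudo systemctl restart hadoop-hdfs-datanode hadoop-yarn-nodemanager   # analytics1042 (Hadoop worker)
    sudo systemctl restart cassandra                                      # aqs1004 (may be a multi-instance unit in practice)
    sudo systemctl restart kafka                                          # kafka-jumbo1001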
2020-08-25
15:47 <elukey> restart mariadb@analytics_meta on db1108 to apply a replication filter (exclude superset_staging database from replication) [analytics]
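The db1108 restart applies a replication filter that must be present in the instance's config before the daemon starts; the exact directive is not in the log, so the ignore-db line below is an assumption:
    # assumed filter in the analytics_meta instance config (puppet-managed):
    #   replicate-ignore-db = superset_staging
    sudo systemctl restart mariadb@analytics_meta
    # then check SHOW SLAVE STATUS on the instance to confirm the ignore list includes superset_staging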
06:35 <elukey> restart mediawiki-history-drop-snapshot on an-launcher1002 to check that it works [analytics]
2020-08-24
06:50 <joal> Dropping wikitext-history snapshots 2020-04 and 2020-05, keeping two (2020-06 and 2020-07), to free space in HDFS [analytics]
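The snapshot drop above frees HDFS space by deleting whole snapshot partitions; the path below is illustrative only (not from the log), and -skipTrash is an assumption so the space is reclaimed immediately:
    # one per snapshot to drop (2020-04, 2020-05); the path is a hypothetical example of the layout
    hdfs dfs -rm -r -skipTrash /wmf/data/wmf/mediawiki/wikitext/history/snapshot=2020-04
    # then drop the matching Hive partitions so table metadata and HDFS folders stay in sync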
2020-08-23
19:34 <nuria> deleted 1.2 TB from hdfs://analytics-hadoop/user/analytics/.Trash/200811000000 [analytics]
19:31 <nuria> deleted 1.2 TB from hdfs://analytics-hadoop/user/nuria/.Trash/* [analytics]
19:26 <nuria> deleted 300G from hdfs://analytics-hadoop/user/analytics/.Trash/200814000000 [analytics]
19:25 <nuria> deleted 1.2 TB from hdfs://analytics-hadoop/user/analytics/.Trash/200808000000 [analytics]
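The 2020-08-23 deletions above are permanent removals of .Trash checkpoints. A sketch of one of them; the -skipTrash flag is an assumption, but without it the data would just move to a new trash checkpoint instead of freeing space:
    hdfs dfs -du -s -h hdfs://analytics-hadoop/user/analytics/.Trash/200811000000    # confirm what is about to go (1.2 TB here)
    hdfs dfs -rm -r -skipTrash hdfs://analytics-hadoop/user/analytics/.Trash/200811000000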
2020-08-20
16:49 <joal> Kill-restart webrequest-load bundle to move it to the production queue [analytics]
2020-08-14
09:13 <fdans> restarting refine to apply T257860 [analytics]
2020-08-13
16:13 <fdans> restarting webrequest bundle [analytics]
14:44 <fdans> deploying refinery [analytics]
14:13 <fdans> updating refinery source symlinks [analytics]
2020-08-11
17:36 <ottomata> refine with refinery-source 0.0.132 and merge_with_hive_schema_before_read=true - T255818 [analytics]
14:52 <ottomata> scap deploy refinery to an-launcher1002 to get camus wrapper script changes [analytics]
2020-08-06
14:47 <fdans> deploying refinery [analytics]
08:07 <elukey> roll restart druid-brokers (on both clusters) to pick up new monitoring changes [analytics]
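The broker roll restart above is done one host at a time so queries keep being served; a minimal sketch using cumin from a cluster management host, where the host aliases, batch/sleep values, and the druid-broker unit name are assumptions:
    sudo cumin -b 1 -s 30 'A:druid-analytics or A:druid-public' 'systemctl restart druid-broker'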
2020-08-05
13:04 <elukey> restart yarn resource managers on an-master100[12] to pick up new Yarn settings - https://gerrit.wikimedia.org/r/c/operations/puppet/+/618529 [analytics]
13:03 <elukey> set yarn_scheduler_minimum_allocation_mb = 1 (was zero) in Hadoop to work around a Flink 1.1 issue (it doesn't work if the value is <= 0) [analytics]
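The two 2020-08-05 13:0x entries above go together: the scheduler minimum allocation is changed in yarn-site.xml (via puppet) and only the ResourceManagers need a restart to pick it up. A sketch, assuming the standard property and unit names:
    # puppet-rendered yarn-site.xml on the masters (value was 0 before):
    #   <property>
    #     <name>yarn.scheduler.minimum-allocation-mb</name>
    #     <value>1</value>
    #   </property>
    sudo systemctl restart hadoop-yarn-resourcemanager   # on an-master1001, then an-master1002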
09:32 <elukey> set ticket max renewable lifetime to 7d on all kerberos clients (was zero, the default) [analytics]
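The kerberos change above is a client-side krb5.conf setting rolled out by puppet; the stanza below shows the assumed form, and the kinit/klist pair is just one way to confirm the new renewable lifetime on a fresh ticket:
    # assumed client-side entry in /etc/krb5.conf under [libdefaults]:
    #   renew_lifetime = 7d
    kinit <user>    # get a fresh ticket
    klist           # the "renew until" timestamp should now be ~7 days out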
2020-08-04
08:30 <elukey> resume druid-related oozie coordinator jobs via Hue (after druid upgrade) [analytics]
08:28 <elukey> started netflow kafka supervisor on Druid Analytics (after upgrade) [analytics]
08:19 <elukey> restore systemd timers for druid jobs on an-launcher1002 (after druid upgrade) [analytics]
07:33 <elukey> stop systemd timers related to druid on an-launcher1002 [analytics]
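The 2020-08-04 stop/restore pair above brackets the druid upgrade; the timers themselves are not named in the log, so the sketch below only shows the pattern with placeholder unit names:
    systemctl list-timers | grep -i druid        # find the druid-related timers on an-launcher1002
    sudo systemctl stop <unit>.timer             # 07:33, before the upgrade, for each timer found
    sudo systemctl start <unit>.timer            # 08:19, restore once the upgrade is done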