2021-02-23
15:23 <elukey> deploy new uid/gid scheme for yarn/mapred/analytics/hdfs/druid on an-tool100[8,9] [analytics]
15:22 <elukey> deploy new uid/gid scheme for yarn/mapred/analytics/hdfs/druid on an-airflow1001, an-test* buster nodes [analytics]
15:05 <klausman> an-master1001 ~ $ sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp analytics-privatedata-users /wmf/data/raw/webrequest/webrequest_text/hourly/2021/02/22/01/webrequest* [analytics]
14:51 <elukey> drop /srv/backup-1007 on stat1008 to free space [analytics]
2021-02-22
19:27 <ottomata> restart oozie on an-coord1001 to pick up new spark share lib without hadoop jars - T274384 [analytics]
14:38 <ottomata> upgrade spark2 on analytics cluster to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed) - T274384 [analytics]
14:12 <ottomata> upgrade spark2 on an-coord1001 to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed), will remove and auto-re-add spark-2.4.4-assembly.zip in hdfs after running puppet here [analytics]
14:07 <ottomata> upgrade spark2 on stat1004 to 2.4.4-bin-hadoop2.6-5~wmf0 (hadoop jars removed) [analytics]
09:01 <elukey> reboot stat1005/stat1008 for kernel upgrades [analytics]
2021-02-19
15:53 <elukey> restart oozie again to test another setting for role/admins [analytics]
15:43 <ottomata> installing spark 2.4.4 without hadoop jars on analytics test cluster - T274384 [analytics]
15:31 <elukey> restart oozie to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/665352 [analytics]
14:34 <joal> rerun mobile_apps-uniques-daily-wf-2021-2-18 [analytics]
09:16 <elukey> stop and decom the hadoop backup cluster [analytics]
2021-02-18
18:38 <razzi> rebalance kafka partition for webrequest_upload partition 1 [analytics]
17:27 <elukey> an-coord1002 back in service with raid1 configured [analytics]
15:48 <elukey> stop hive/mysql on an-coord1002 as precautionary step to rebuild the md array [analytics]
13:10 <elukey> failover analytics-hive to an-coord1001 after maintenance (DNS change) [analytics]
11:32 <elukey> restart hive daemons on an-coord1001 to pick up new parquet settings [analytics]
10:07 <elukey> hive failover to an-coord1002 to apply new hive settings to an-coord1001 [analytics]
10:00 <elukey> restart hive daemons on an-coord1002 (standby coord) to pick up new default parquet file format change [analytics]
09:46 <elukey> upgrade presto to 0.246-wmf on an-coord1001, an-presto*, stat100x [analytics]
2021-02-17
17:44 <razzi> rebalance kafka partitions for webrequest_upload partition 0 [analytics]
16:14 <razzi> rebalance kafka partitions for eqiad.mediawiki.api-request [analytics]
07:04 <elukey> reboot stat1004/stat1006/stat1007 for kernel upgrades [analytics]
2021-02-16
22:31 <razzi> rebalance kafka partitions for codfw.mediawiki.api-request [analytics]
17:44 <razzi> rebalance kafka partitions for netflow [analytics]
17:42 <razzi> rebalance kafka partitions for atskafka_test_webrequest_text [analytics]
07:32 <elukey> restart hadoop daemons on an-worker1099 after reconfiguring a new disk [analytics]
06:58 <elukey> restart hdfs/yarn daemons on an-worker1097 to exclude a failed disk [analytics]
2021-02-15
20:38 <mforns> running hdfs fsck to troubleshoot corrupt blocks [analytics]
17:28 <elukey> restart hdfs namenodes on the main cluster to pick up new racking changes (worker nodes from the backup cluster) [analytics]
2021-02-14
09:38 <joal> Restart and backfill mediacount and mediarequest, and backfill mediarequest-AQS and mediacount archive [analytics]
09:38 <joal> deploy refinery onto hdfs [analytics]
09:14 <joal> Deploy hotfix for mediarequest and mediacount [analytics]
2021-02-12
19:19 <milimetric> deployed refinery with query syntax fix for the last broken cassandra job and an updated EL whitelist [analytics]
18:34 <razzi> rebalance kafka partitions for atskafka_test_webrequest_text [analytics]
18:31 <razzi> rebalance kafka partitions for __consumer_offsets [analytics]
17:48 <joal> Rerun wikidata-articleplaceholder_metrics-wf-2021-2-10 [analytics]
17:47 <joal> Rerun wikidata-specialentitydata_metrics-wf-2021-2-10 [analytics]
17:43 <joal> Rerun wikidata-json_entity-weekly-wf-2021-02-01 [analytics]
17:08 <elukey> reboot presto workers for kernel upgrade [analytics]
16:32 <mforns> finished deployment of analytics-refinery [analytics]
15:26 <mforns> started deployment of analytics-refinery [analytics]
15:16 <elukey> roll restart druid broker on druid-public to pick up new settings [analytics]
07:54 <elukey> roll restart of druid brokers on druid-public - locked after scheduled datasource deletion [analytics]
07:46 <elukey> force a manual run of refinery-druid-drop-public-snapshots on an-launcher1002 (3d before its natural start) - controlled execution to see how druid + 3x dataset replication reacts [analytics]
2021-02-11
14:26 <joal> Restart oozie API job after spark sharelib fix (start: 2021-02-10T18:00) [analytics]
14:20 <joal> Rerun failed clickstream instance 2021-01 after sharelib fix [analytics]
14:16 <joal> Restart oozie after having fixed the spark-2.4.4 sharelib [analytics]