2022-03-08 §
12:47 <btullis> roll-restarting druid-analytics T300626 [analytics]
12:08 <btullis> roll-restarting druid-public. T300626 [analytics]
11:21 <btullis> roll-restarting druid-test T300626 [analytics]
11:00 <btullis> roll-restarting aqs T300626 [analytics]
10:57 <btullis> restarted archiva T300626 [analytics]
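The roll restarts in the entries above are done one host at a time so each cluster keeps serving queries. A minimal sketch of that pattern with cumin, assuming a hypothetical host alias and a representative subset of Druid service names rather than the exact procedure used for T300626:

```bash
# Sketch: restart Druid services on one host at a time, sleeping
# between hosts so the cluster stays available. 'A:druid-analytics'
# and the service list are placeholders, not the real alias/cookbook.
for svc in druid-broker druid-historical druid-middlemanager; do
    sudo cumin -b 1 -s 120 'A:druid-analytics' "systemctl restart ${svc}.service"
done
```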
2022-03-07 §
19:14 <ottomata> sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/wmf/*/hourly/year=2022/month=3/day=7 to make sure perms are fixed after revert of T291664 [analytics]
19:13 <ottomata> sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/wmf/virtualpageview/hourly/year=2022/month=3/day=7 - revert of T291664 [analytics]
18:45 <ottomata> sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/wmf/mediacounts/year=2022/month=3/day=7 [analytics]
18:37 <ottomata> sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/wmf/webrequest/webrequest_source=text/year=2022/month=3/day=7 - after reverting - T291664 [analytics]
18:34 <ottomata> restarting hive-server2 on an-coord1001 to revert hive.warehouse.subdir.inherit.perms change - T291664 [analytics]
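hive.warehouse.subdir.inherit.perms lives in hive-site.xml, so reverting it means changing the rendered config (via puppet) and then bouncing HiveServer2, which is what the entry above records. A rough sketch of the manual checks around that restart, with the config path assumed to be the usual /etc/hive/conf:

```bash
# Sketch: check the rendered value of the reverted property, restart
# HiveServer2 so it is picked up, then glance at the recent logs.
grep -A1 'hive.warehouse.subdir.inherit.perms' /etc/hive/conf/hive-site.xml
sudo systemctl restart hive-server2.service
sudo journalctl -u hive-server2.service --since '10 minutes ago' --no-pager
```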
14:44 <btullis> failing back hive services to an-coord1001 [analytics]
13:09 <aqu_> About to deploy analytics/refinery - Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics]
12:45 <aqu_> About to deploy airflow-dags/analytics - Migrates wikidata/item_page_link [analytics]
12:10 <btullis> restarted hive-server2 process on an-coord1001 [analytics]
11:52 <btullis> obtaining heap dump: `hive@an-coord1001:/srv/hive-tmp$ jmap -dump:format=b,file=hive_server2_heap_T303168.bin 16971` [analytics]
11:51 <btullis> obtaining summary of heap objects and sizes: `hive@an-coord1001:/srv/hive-tmp$ jmap -histo:live 16971 > hive-object-storage-and-sizes.T303168.txt` [analytics]
11:38 <btullis> failing over hive to an-coord1001 T303168 [analytics]
2022-03-05 §
10:03 <elukey> restart hadoop-yarn-nodemanager on an-worker1132 (unhealthy node; reason: Linux Container Executor reached unrecoverable exception) [analytics]
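For unhealthy-NodeManager entries like this one, the remedy is usually a plain service restart followed by a check that the node reports healthy again. A minimal sketch, assuming a Kerberos ticket is already in place for the `yarn` CLI call:

```bash
# Sketch: restart the NodeManager on the unhealthy worker, then
# confirm it re-registers with the ResourceManager as RUNNING.
sudo systemctl restart hadoop-yarn-nodemanager.service
sudo systemctl status hadoop-yarn-nodemanager.service --no-pager
yarn node -list -all | grep an-worker1132
```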
2022-03-04 §
17:46 <mforns> deployed Airflow to analytics instance to fix skein logs problem [analytics]
15:50 <mforns> deployed airflow in an-test-client1001 to test skein log fix [analytics]
05:19 <milimetric> rerunning monthly edit hourly druid oozie coordinator [analytics]
2022-03-03 §
17:48 <ottomata> roll restart aqs to pick up new MW history snapshot [analytics]
2022-03-01 §
18:38 <SandraEbele> sandra testing [analytics]
18:34 <razzi> demo irc logging to data eng team members [analytics]
10:19 <btullis> btullis@an-coord1002:/srv$ sudo rm -rf an-coord1001-backup/ (#T302777) [analytics]
09:48 <elukey> elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host) [analytics]
2022-02-28 §
16:00 <milimetric> refinery done deploying and syncing, new sqoop list is up [analytics]
15:01 <milimetric> deploying new wikis to sqoop list ahead of sqoop job starting in a few hours [analytics]
2022-02-25 §
17:00 <milimetric> rerunning webrequest-load-wf-text-2022-2-25-15 after confirming the data-loss alerts were all false positives [analytics]
2022-02-23 §
23:00 <razzi> sudo maintain-views --table flaggedrevs --databases fiwiki on clouddb1014.eqiad.wmnet and clouddb1018.eqiad.wmnet for T302233 [analytics]
2022-02-22 §
10:37 <btullis> re-enabled puppet on an-launcher1002, having absented the network_internal druid load job [analytics]
09:30 <aqu> Deploying analytics/refinery on hadoop-test only. [analytics]
07:38 <elukey> systemctl reset-failed mediawiki-history-drop-snapshot on an-launcher1002 (unit had been in a failed state for a week) [analytics]
07:30 <elukey> kill remaining processes of rhuang-ctr on stat1004 and an-test-client1001 (user offboarded, but still holding Jupyter notebooks etc.). Puppet was broken trying to remove the user. [analytics]
2022-02-21 §
17:55 <elukey> kill remaining processes of rhuang-ctr on various stat nodes (user offboarded, but still holding Jupyter notebooks etc.). Puppet was broken trying to remove the user. [analytics]
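The single-host version of this cleanup is logged verbatim under 2022-03-01 above (stat1004); across several stat nodes the same thing can be scripted, for example with cumin. A sketch only: the host range and the run-puppet-agent wrapper are assumptions, and pkill -u is equivalent to the kill $(pgrep -u ...) form used in that entry.

```bash
# Sketch: terminate anything still owned by the offboarded user on the
# stat hosts, then let puppet finish removing the account.
# '|| true' because pkill exits non-zero when nothing matched.
sudo cumin 'stat100[4-8].eqiad.wmnet' 'pkill -u rhuang-ctr || true'
sudo cumin 'stat100[4-8].eqiad.wmnet' 'run-puppet-agent'
```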
16:58 <mforns> Deployed refinery using scap, then deployed onto hdfs (aqs hourly airflow queries) [analytics]
2022-02-19 §
12:21 <elukey> stop puppet on an-launcher1002, stop timers for eventlogging_to_druid_network_flows_internal_{hourly,daily} since no data is coming to the Kafka topic (expected due to some work for the Marseille DC) and it keeps alarming [analytics]
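Silencing a flapping timer like this takes two steps: keep puppet from re-enabling the units, then stop the timers themselves. A sketch of what that looks like on an-launcher1002, assuming the WMF disable-puppet wrapper (a plain `puppet agent --disable "<reason>"` would do the same job); the timer unit names are taken from the entry above:

```bash
# Sketch: disable puppet with a reason so the timers are not restored,
# then stop both the hourly and daily ingestion timers for the
# internal network flows dataset.
sudo disable-puppet "network_flows_internal ingestion paused during Marseille DC work"
sudo systemctl stop eventlogging_to_druid_network_flows_internal_hourly.timer
sudo systemctl stop eventlogging_to_druid_network_flows_internal_daily.timer
```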
2022-02-17 §
16:18 <mforns> deployed wikistats2 [analytics]
2022-02-16 §
14:13 <mforns> deployed airflow-dags to analytics instance [analytics]
2022-02-15 §
17:20 <ottomata> split anaconda-wmf into 2 packages: anaconda-wmf-base and anaconda-wmf. anaconda-wmf-base is installed on workers, anaconda-wmf on clients. The size of the package on workers is now much smaller. Installing throughout the cluster. Relevant: T292699 [analytics]
2022-02-14 §
17:38 <razzi> razzi@an-test-client1001:~$ sudo systemctl reset-failed airflow-scheduler@analytics-test.service [analytics]
16:08 <razzi> sudo cookbook sre.ganeti.makevm --vcpus 4 --memory 8 --disk 50 eqiad_B datahubsearch1002 for T301383 [analytics]
2022-02-12 §
08:50 <elukey> truncate /var/log/auth.log to 1g on krb1001 to free space on root partition (original log saved under /srv) [analytics]
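A sketch of the sequence that entry describes: preserve the original log under /srv, then shrink the live file in place so rsyslog keeps writing to the same inode. The backup filename is illustrative:

```bash
# Sketch: copy the oversized auth.log aside, truncate the live file to
# 1 GB to free the root partition, then confirm the space came back.
sudo cp /var/log/auth.log /srv/auth.log.$(date +%F)
sudo truncate -s 1G /var/log/auth.log
df -h /
```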
2022-02-11 §
15:06 <ottomata> set hive.warehouse.subdir.inherit.perms = false - T291664 [analytics]
2022-02-10 §
18:54 <ottomata> setting up research airflow-dags scap deployment, recreating airflow database and starting from scratch (fab okayed this) - T295380 [analytics]
16:48 <ottomata> deploying airflow analytics with lots of recent changes to airflow-dags repository [analytics]
2022-02-09 §
17:41 <joal> Deploy refinery onto HDFS [analytics]
17:05 <joal> Deploying refinery with scap [analytics]
16:39 <joal> Release refinery-source v0.1.25 to archiva [analytics]
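These three entries are the usual refinery release train: publish the refinery-source jars to Archiva, deploy analytics/refinery with scap, then sync the artifacts onto HDFS. A sketch of the middle step only, run from the deployment host; the checkout path and deploy message are assumptions, and the HDFS sync afterwards uses refinery's own tooling, which is not shown here:

```bash
# Sketch: ship the refinery checkout with scap from the deploy server.
cd /srv/deployment/analytics/refinery
git log --oneline -1   # confirm the revision about to go out
scap deploy "Regular analytics deploy [analytics/refinery@$(git rev-parse --short HEAD)]"
```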
2022-02-08 §
07:27 <elukey> restart hadoop-yarn-nodemanager on an-worker1115 (container executor reached an unrecoverable exception and the node no longer talks to the YARN RM) [analytics]