2022-03-16 §
18:00 <razzi> sudo cookbook sre.hosts.downtime -D 3 -r 'Setting up karapace for the first time' karapace1001.eqiad.wmnet [analytics]
17:57 <btullis> restarted mediawiki-history-drop-snapshot service on an-launcher1002 [analytics]
16:03 <aqu> analytics/refinery - scap deploy "Migrate session_length/daily from Oozie to Airflow" [analytics]
10:26 <btullis> rerunning failed mediawiki_structured_task_article_link_suggestion_interaction refine job [analytics]
2022-03-15 §
22:16 <razzi> upload karapace_2.1.3-py3.7-1_amd64.deb to apt.wikimedia.org [analytics]
19:58 <razzi> upload karapace_2.1.3-py3.7-0_amd64.deb to apt.wikimedia.org [analytics]
17:24 <ottomata> also change stats uid and gid to 918 on an-web1001 - T291384 [analytics]
14:35 <ottomata> change stats uid and gid on all stat boxes to 918 - T291384 [analytics]
13:59 <ottomata> roll restarting kafka jumbo brokers to set max.incremental.fetch.session.cache.slots=2000 - T303324 [analytics]
2022-03-14 §
21:05 <razzi> `sudo kill -9 15674` to stop unresponsive hive query [analytics]
2022-03-09 §
21:05 <ottomata> fix group ownership of cchen.db/new_editors/cohort=2021-12 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/cchen.db/new_editors/cohort=2021-12 [analytics]
18:33 <ottomata> fix group ownership of wmf_product.db/new_editors/cohort=2021-12 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/wmf_product.db/new_editors/cohort=2021-12 [analytics]
18:32 <ottomata> fix group ownership of wmf_product.db/global_markets_pageviews/year=2022/month=2 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/wmf_product.db/global_markets_pageviews/year=2022/month=2 [analytics]
18:19 <btullis> btullis@ganeti1024:~$ sudo gnt-instance start karapace1001.eqiad.wmnet (T301562) [analytics]
16:16 <ottomata> fix group ownership of wmf_product.db/pageviews_corrected/year=2022/month=2 after reverting T291664 - sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /user/hive/warehouse/wmf_product.db/pageviews_corrected/year=2022/month=2 [analytics]
2022-03-08 §
13:31 <ottomata> restarted webrequest-load oozie bundle as 0073173-220113112502223-oozie-oozi-B starting at 2022-03-08T12:00Z [analytics]
13:09 <ottomata> killing and rerunning webrequest-load-text-wf for webrequest_source=text/year=2022/month=3/day=7/hour=17, it was stuck in add_partition task as SUSPENDED, not sure why. [analytics]
12:47 <btullis> roll-restarting druid-analytics T300626 [analytics]
12:08 <btullis> roll-restarting druid-public. T300626 [analytics]
11:21 <btullis> roll-restarting druid-test T300626 [analytics]
11:00 <btullis> roll-restarting aqs T300626 [analytics]
10:57 <btullis> restarted archiva T300626 [analytics]
2022-03-07 §
19:14 <ottomata> sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/wmf/*/hourly/year=2022/month=3/day=7 to make sure perms are fixed after revert of T291664 [analytics]
19:13 <ottomata> sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/wmf/virtualpageview/hourly/year=2022/month=3/day=7 - revert of T291664 [analytics]
18:45 <ottomata> sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/wmf/mediacounts/year=2022/month=3/day=7 [analytics]
18:37 <ottomata> sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/wmf/webrequest/webrequest_source=text/year=2022/month=3/day=7 - after reverting - T291664 [analytics]
18:34 <ottomata> restarting hive-server2 on an-coord1001 to revert hive.warehouse.subdir.inherit.perms change - T291664 [analytics]
14:44 <btullis> failing back hive services to an-coord1001 [analytics]
13:09 <aqu_> About to deploy analytics/refinery - Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics]
12:45 <aqu_> About to deploy airflow-dags/analytics - Migrates wikidata/item_page_link [analytics]
12:10 <btullis> restarted hive-server2 process on an-coord1001 [analytics]
11:52 <btullis> obtaining heap dump: `hive@an-coord1001:/srv/hive-tmp$ jmap -dump:format=b,file=hive_server2_heap_T303168.bin 16971` [analytics]
11:51 <btullis> obtaining summary of heap objects and sizes: `hive@an-coord1001:/srv/hive-tmp$ jmap -histo:live 16971 > hive-object-storage-and-sizes.T303168.txt` [analytics]
11:38 <btullis> failing over hive to an-coord1001 T303168 [analytics]
2022-03-05 §
10:03 <elukey> restart hadoop-yarn-nodemanager on an-worker1132 (unhealthy node, reason Linux Container Executor reached unrecoverable exception) [analytics]
2022-03-04 §
17:46 <mforns> deployed Airflow to analytics instance to fix skein logs problem [analytics]
15:50 <mforns> deployed airflow in an-test-client1001 to test skein log fix [analytics]
05:19 <milimetric> rerunning monthly edit hourly druid oozie coordinator [analytics]
2022-03-03 §
17:48 <ottomata> roll restart aqs to pick up new MW history snapshot [analytics]
2022-03-01 §
18:38 <SandraEbele> sandra testing [analytics]
18:34 <razzi> demo irc logging to data eng team members [analytics]
10:19 <btullis> btullis@an-coord1002:/srv$ sudo rm -rf an-coord1001-backup/ (#T302777) [analytics]
09:48 <elukey> elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host) [analytics]
2022-02-28 §
16:00 <milimetric> refinery done deploying and syncing, new sqoop list is up [analytics]
15:01 <milimetric> deploying new wikis to sqoop list ahead of sqoop job starting in a few hours [analytics]
2022-02-25 §
17:00 <milimetric> rerunning webrequest-load-wf-text-2022-2-25-15 after confirming all reported loss was false positives [analytics]
2022-02-23 §
23:00 <razzi> sudo maintain-views --table flaggedrevs --databases fiwiki on clouddb1014.eqiad.wmnet and clouddb1018.eqiad.wmnet for T302233 [analytics]
2022-02-22 §
10:37 <btullis> re-enabled puppet on an-launcher1002, having absented the network_internal druid load job [analytics]
09:30 <aqu> Deploying analytics/refinery on hadoop-test only. [analytics]
07:38 <elukey> systemctl reset-failed mediawiki-history-drop-snapshot on an-launcher1002 (open for a week) [analytics]