2022-08-18 §
09:51 <btullis> restarted monitor-refine-event on an-launcher1002 [analytics]
2022-08-17 §
13:19 <mforns> deployed airflow for https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/117 [analytics]
2022-08-16 §
18:49 <ottomata> complete refinery deploy that was unfinished from last week. an-launcher1002 and hdfs already have this version (6e47e0e712528c8816b7fd7456b8745e4dbc5c72) deployed. [analytics]
16:02 <btullis> deploying airflow-dags [analytics]
2022-08-15 §
19:26 <ottomata> test [analytics]
2022-08-10 §
18:04 <ottomata> Deployed refinery using scap, then deployed onto hdfs [analytics]
17:03 <ottomata> stopping puppet and drop data timers on an-launcher1002 and an-test-coord1001 to deploy drop script changes - T270433 [analytics]
13:42 <btullis> failed hive back to an-coord1001 via DNS change. [analytics]
11:47 <btullis> btullis@an-coord1001:~$ sudo systemctl restart hive-server2.service hive-metastore.service [analytics]
2022-08-08 §
11:43 <btullis> rebooting an-worker1102 due to kernel soft lockups [analytics]
2022-08-05 §
16:05 <milimetric> force scap deploying refinery [analytics]
16:01 <ottomata> removing airflow logs older than 7 days on an-launcher1002 [analytics]
2022-08-04 §
18:31 <ottomata> dropping medawiki_web_ui_interactions hive tables and data - T314151 [analytics]
18:19 <milimetric> scap deploying refinery host by host after Ben cleaned up the repos with "git checkout master" [analytics]
18:11 <btullis> btullis@deploy1002:/srv/deployment/analytics/refinery$ scap deploy -l stat1008.eqiad.wmnet "Regular analytics weekly train [analytics/refinery@$(git rev-parse --short HEAD)]" [analytics]
18:05 <btullis> we are re-deploying refinery to an-launcher1002 with the command above [analytics]
18:04 <btullis> btullis@deploy1002:/srv/deployment/analytics/refinery$ scap deploy -l an-launcher1002.eqiad.wmnet "Regular analytics weekly train [analytics/refinery@$(git rev-parse --short HEAD)]" [analytics]
18:02 <btullis> analytics-deploy@an-launcher1002:/srv/deployment/analytics/refinery$ git checkout master [analytics]
15:59 <SandraEbele> Deploying analytics refinery using scap. [analytics]
2022-08-02 §
12:54 <btullis> sudo systemctl reset-failed on stat1008 to remove failed debmonitor alerts [analytics]
2022-07-28 §
20:05 <SandraEbele> killing Oozie projectview-hourly and projectview-geo jobs to deploy corresponding jobs on airflow. [analytics]
2022-07-24 §
21:10 <btullis> swapping disks on archiva1002 [analytics]
20:36 <btullis> rebooting archiva1002 to pick up new disk [analytics]
15:36 <btullis> btullis@ganeti1027:~$ sudo gnt-instance modify --disk add:size=200g archiva1002.wikimedia.org [analytics]
2022-07-22 §
21:19 <ottomata> restarted airflow-scheduler@platform_eng on an-airflow1003 for marco and cormac [analytics]
2022-07-19 §
10:05 <elukey> reboot an-worker1127 - hdfs datanode caused CPU stalls [analytics]
2022-07-13 §
14:19 <aqu> Deployed refinery using scap, then deployed onto hdfs (prod + test) [analytics]
06:16 <aqu> analytics/refinery deployment [analytics]
2022-07-07 §
13:38 <btullis> restart refine_eventlogging_legacy_test.service on an-test-coord1001 [analytics]
09:56 <btullis> restarted oozie on an-test-coord1001 [analytics]
09:23 <btullis> rebooted dbstore1007 [analytics]
09:21 <btullis> rebooted dbstore1005 [analytics]
09:02 <btullis> restarting dbstore1003 as per announced maintenance window [analytics]
2022-07-06 §
18:09 <ottomata> enabling iceberg hive catalog connector on analytics_cluster presto [analytics]
17:57 <ottomata> upgrading presto to 0.273.3 in analytics cluster - T311525 [analytics]
09:50 <btullis> roll-restarting hadoop workers on the test cluster. [analytics]
09:46 <btullis> restarting refinery-drop-webrequest-raw-partitions.service on an-test-coord1001 [analytics]
09:44 <btullis> restarting refinery-drop-webrequest-refined-partitions.service on an-test-coord1001 [analytics]
09:42 <btullis> restarted drop_event.service on an-test-coord1001 [analytics]
09:35 <btullis> restarting hive-server2 and hive-metastore on an-test-coord1001 [analytics]
2022-07-05 §
11:01 <btullis> sudo cookbook sre.hadoop.roll-restart-masters test [analytics]
2022-07-04 §
16:14 <btullis> systemctl restart airflow-scheduler@research.service (on an-airflow1002) [analytics]
08:04 <elukey> kill leftover processes of user `mewoph` on stat100x to allow puppet runs [analytics]
2022-06-29 §
17:27 <mforns> killed mediawiki-history-load bundle in Hue, and started corresponding mediawiki_history_load DAG in Airflow [analytics]
13:12 <mforns> re-deployed refinery with scap and refinery-deploy-to-hdfs [analytics]
11:51 <btullis> btullis@an-master1001:~$ sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
2022-06-28 §
20:57 <mforns> refinery deploy failed and I rolled back successfully, will try and repeat tomorrow when other people are present :] [analytics]
20:19 <mforns> starting refinery deployment for refinery-source v0.2.2 [analytics]
20:19 <mforns> starting refinery deployment [analytics]
17:25 <ottomata> installing presto 0.273.3 on an-test-coord1001 and an-test-presto1001 [analytics]