2021-06-10 §
16:25 <razzi> rolling restart hadoop masters to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/698194 [analytics]
14:07 <ottomata> altered event.wmdebannerevent event.eventRate field to change type from BIGINT to DOUBLE - T282562 [analytics]
2021-06-08 §
16:56 <elukey> move away from dbstore1004 in favor of dbstore1007 in analytics CNAME/SRV records (will affect analytics-mysql and sqoop) [analytics]
13:42 <ottomata> roll restart an-conf zookeepers - T283067 [analytics]
13:22 <ottomata> roll restarting analytics presto-servers - T283067 [analytics]
06:08 <elukey> restart yarn nodemanager on analytics1075 to clear the unhealthy state after some days of downtime (one-off issue but let's keep an eye on it) [analytics]
2021-06-07 §
18:14 <ottomata> rolling restart of kafka jumbo brokers - T283067 [analytics]
17:53 <ottomata> rolling restart of kafka jumbo mirror makers - T283067 [analytics]
17:07 <ottomata> remove packages from an cluster nodes: sudo apt-get -y remove r-cran-rmysql python3-matplotlib python3-sklearn python3-enchant python3-nltk gfortran liblapack-dev libopenblas-dev - T275786 [analytics]
16:50 <ottomata> restarting mysqld analytics-meta replica on db1108 to apply config change - T272973 [analytics]
2021-06-04 §
17:42 <razzi> sudo cookbook sre.aqs.roll-restart aqs to deploy new mediawiki history snapshot [analytics]
2021-06-03 §
22:32 <razzi> sudo manage_principals.py create jdl --email_address=jlinehan@wikimedia.org [analytics]
22:32 <razzi> sudo manage_principals.py create phuedx --email_address=phuedx@wikimedia.org [analytics]
15:46 <ottomata> add airflow_2.1.0-py3.7-1_amd64.deb to apt.wm.org [analytics]
15:20 <ottomata> created airflow_analytics database and user on an-coord1001 analytics-meta instance - T272973 [analytics]
2021-06-02 §
18:09 <ottomata> remove .deb packages from stat boxes: python3-mysqldb python3-boto python3-ua-parser python3-netaddr python3-pymysql python3-protobuf python3-unidecode python3-oauth2client python3-oauthlib python3-requests-oauthlib python3-ua-parser - T275786 [analytics]
2021-05-31 §
06:56 <joal> Rerun cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2021-5-29 [analytics]
2021-05-27 §
14:37 <elukey> removed Luca's and Tobias' emails from analytics-alerts@ [analytics]
07:01 <elukey> roll restart hdfs namenodes to pick up new GC/heap settings - https://gerrit.wikimedia.org/r/c/operations/puppet/+/695933 [analytics]
2021-05-26 §
19:14 <ottomata> deploying refinery and refinery source 0.1.13 [analytics]
17:29 <ottomata> killing and restarting oozie cassandra loader jobs coord_unique_devices_daily and coord_pageview_top_percountry_daily after revert of oozie job to load to cassandra 3 [analytics]
14:18 <ottomata> deploying refinery... [analytics]
14:17 <ottomata> Deployed refinery-source using jenkins [analytics]
2021-05-25 §
18:16 <razzi> sudo systemctl start all failed units from `systemctl list-units --state=failed` on an-launcher1002 [analytics]
18:14 <razzi> sudo systemctl start eventlogging_to_druid_navigationtiming_hourly.service [analytics]
18:01 <razzi> manually edit /etc/hadoop/conf/capacity-scheduler.xml to set the queues to running, then sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues [analytics]
17:52 <razzi> sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues on an-master1001 and an-master1002 [analytics]
17:28 <razzi> sudo systemctl restart refine_eventlogging_legacy [analytics]
17:28 <razzi> sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues to enable submitting jobs once again [analytics]
17:07 <razzi> re-enabled puppet on an-masters and an-launcher [analytics]
17:04 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave [analytics]
17:03 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
16:43 <razzi> sudo systemctl restart hadoop-hdfs-namenode on an-master1001 [analytics]
16:38 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace [analytics]
16:35 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter [analytics]
16:28 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
16:23 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave [analytics]
16:06 <razzi> sudo systemctl restart hadoop-hdfs-namenode [analytics]
15:52 <razzi> checkpoint hdfs with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace [analytics]
15:51 <razzi> enable safe mode on an-master1001 with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter [analytics]
15:36 <razzi> disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet again [analytics]
15:35 <razzi> re-enable puppet on an-masters, run puppet, and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues [analytics]
15:32 <razzi> disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet [analytics]
14:39 <razzi> stop puppet on an-launcher and stop hadoop-related timers [analytics]
01:09 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
01:07 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1001-eqiad-wmnet an-master1002-eqiad-wmnet [analytics]
00:34 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1001-eqiad-wmnet an-master1002-eqiad-wmnet [analytics]
2021-05-24 §
18:05 <ottomata> resume failing cassandra 3 oozie loading jobs, they are also loading to cassandra 2: cassandra-daily-coord-local_group_default_T_top_percountry (0011318-210426062240701-oozie-oozi-C), cassandra-daily-coord-local_group_default_T_unique_devices (0011324-210426062240701-oozie-oozi-C) [analytics]
18:04 <ottomata> suspend failing cassandra 3 oozie loading jobs: cassandra-daily-coord-local_group_default_T_top_percountry (0011318-210426062240701-oozie-oozi-C), cassandra-daily-coord-local_group_default_T_unique_devices (0011324-210426062240701-oozie-oozi-C) [analytics]
15:19 <ottomata> rm -rf /tmp/analytics/* on an-launcher1002 - T283126 [analytics]