2021-05-25 §
18:16 <razzi> sudo systemctl start all failed units from `systemctl list-units --state=failed` on an-launcher1002 [analytics]
18:14 <razzi> sudo systemctl start eventlogging_to_druid_navigationtiming_hourly.service [analytics]
18:01 <razzi> manually edit /etc/hadoop/conf/capacity-scheduler.xml to set the queues to running, then sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues [analytics]
17:52 <razzi> sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues on an-master1001 and an-master1002 [analytics]
17:28 <razzi> sudo systemctl restart refine_eventlogging_legacy [analytics]
17:28 <razzi> sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues to enable submitting jobs once again [analytics]
17:07 <razzi> re-enabled puppet on an-masters and an-launcher [analytics]
17:04 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave [analytics]
17:03 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
16:43 <razzi> sudo systemctl restart hadoop-hdfs-namenode on an-master1001 [analytics]
16:38 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace [analytics]
16:35 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter [analytics]
16:28 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
16:23 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave [analytics]
16:06 <razzi> sudo systemctl restart hadoop-hdfs-namenode [analytics]
15:52 <razzi> checkpoint hdfs with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace [analytics]
15:51 <razzi> enable safe mode on an-master1001 with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter [analytics]
15:36 <razzi> disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet again [analytics]
15:35 <razzi> re-enable puppet on an-masters, run puppet, and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues [analytics]
15:32 <razzi> disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet [analytics]
14:39 <razzi> stop puppet on an-launcher and stop hadoop-related timers [analytics]
01:09 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
01:07 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1001-eqiad-wmnet an-master1002-eqiad-wmnet [analytics]
00:34 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1001-eqiad-wmnet an-master1002-eqiad-wmnet [analytics]
2021-05-24 §
18:05 <ottomata> resume failing cassandra 3 oozie loading jobs, they are also loading to cassandra 2: cassandra-daily-coord-local_group_default_T_top_percountry (0011318-210426062240701-oozie-oozi-C), cassandra-daily-coord-local_group_default_T_unique_devices (0011324-210426062240701-oozie-oozi-C) [analytics]
18:04 <ottomata> suspend failing cassandra 3 oozie loading jobs: cassandra-daily-coord-local_group_default_T_top_percountry (0011318-210426062240701-oozie-oozi-C), cassandra-daily-coord-local_group_default_T_unique_devices (0011324-210426062240701-oozie-oozi-C) [analytics]
15:19 <ottomata> rm -rf /tmp/analytics/* on an-launcher1002 - T283126 [analytics]
2021-05-20 §
06:05 <elukey> kill christinedk's jupyter process on stat1007 (offboarded user) to allow puppet to run [analytics]
2021-05-19 §
16:31 <razzi> restart turnilo for T279380 [analytics]
2021-05-18 §
20:22 <razzi> restart oozie virtualpageview hourly, virtualpageview druid daily, virtualpageview druid monthly [analytics]
18:57 <razzi> deployed refinery via scap, then deployed to hdfs [analytics]
18:46 <ottomata> removing extraneous python-kafka and python-confluent-kafka deb packages from analytics cluster - T275786 [analytics]
12:40 <joal> Add monitoring data in cassandra-3 [analytics]
06:50 <joal> run manual unique-devices cassandra job for one day with debug logging [analytics]
02:20 <ottomata> manually running drop_event with --verbose flag [analytics]
2021-05-17 §
11:09 <joal> Restart cassandra-daily-wf-local_group_default_T_unique_devices-2021-5-4 for testing after host generating failures has been moved out of cluster [analytics]
10:41 <joal> Restart cassandra-daily-wf-local_group_default_T_unique_devices-2021-5-4 for testing after drop/create of keyspace [analytics]
10:28 <joal> Restart cassandra-daily-wf-local_group_default_T_unique_devices-2021-5-4 for testing [analytics]
09:45 <joal> Rerun of cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2021-5-15 [analytics]
2021-05-13 §
11:41 <hnowlan> running truncate "local_group_default_T_pageviews_per_article_flat".data; on aqs1012 [analytics]
2021-05-12 §
15:17 <ottomata> dropped event.mediawiki_job_* tables and data directories with mforns - T273789 [analytics]
13:56 <ottomata> removing refine_mediawiki_job Refine jobs - T281605 [analytics]
2021-05-11 §
21:00 <mforns> finished repeated refinery deployment (matching source v0.1.11) - missed unmerged change [analytics]
19:59 <mforns> repeating refinery deployment (matching source v0.1.11) - missed unmerged change [analytics]
19:53 <mforns> finished refinery deployment (matching source v0.1.11) [analytics]
18:41 <mforns> starting refinery deployment (matching source v0.1.11) [analytics]
17:26 <mforns> deployed refinery-source v0.1.11 [analytics]
2021-05-06 §
21:27 <razzi> sudo manage_principals.py reset-password nahidunlimited --email_address=nsultan@wikimedia.org [analytics]
13:29 <elukey> roll restart of hadoop yarn nodemanagers to pick up TasksMax=26214 [analytics]
12:39 <elukey> restart Yarn RMs to apply the dominant resource calculator setting - T281792 [analytics]