2020-10-11 §
08:24 <elukey> decommission analytics1046 from the hadoop cluster [analytics]
08:12 <elukey> clean up logs on an-launcher1002 (disk space full) [analytics]
2020-10-10 §
12:01 <elukey> decommission analytics1045 from the Hadoop cluster [analytics]
2020-10-09 §
13:17 <elukey> execute "cumin 'stat100[5,8]* or an-worker109[6-9]* or an-worker110[0,1]*' 'apt-get install -y linux-headers-amd64'" [analytics]
11:15 <elukey> bootstrap the Analytics Hadoop test cluster [analytics]
09:47 <elukey> roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings [analytics]
07:58 <elukey> decom analytics1044 from Hadoop [analytics]
07:04 <elukey> failover from an-master1002 to 1001 for HDFS namenode (the namenode failed over hours ago, no logs to check) [analytics]
2020-10-08 §
18:08 <razzi> restart oozie server on an-coord1001 for reverting T262660 [analytics]
17:42 <razzi> restart oozie server on an-coord1001 for T262660 [analytics]
17:19 <elukey> removed /var/lib/puppet/clientbucket/6/f/a/c/d/9/8/d/6facd98d16886787ab9656eef07d631e/content on an-launcher1002 (29G, last modified Aug 4th) [analytics]
15:45 <elukey> executed git pull on /srv/jupyterhub/deploy and re-ran create_virtualenv.sh on stat1007 (pyspark kernels may not run correctly due to a missing feature) [analytics]
15:43 <elukey> executed git pull on /srv/jupyterhub/deploy and re-ran create_virtualenv.sh on stat1006 (pyspark kernels not running due to a missing feature) [analytics]
13:13 <elukey> roll restart of druid overlords and coordinators on druid public to pick up new TLS settings [analytics]
12:51 <elukey> roll restart of druid overlords and coordinators on druid analytics to pick up new TLS settings [analytics]
10:35 <elukey> force the re-creation of default jupyterhub venvs on stat1006 after reimage [analytics]
08:47 <klausman> Starting re-image of stat1006 to Buster [analytics]
07:14 <elukey> decom analytics1043 from the Hadoop cluster [analytics]
06:46 <elukey> move the hdfs balancer from an-coord1001 to an-launcher1002 [analytics]
2020-10-07 §
08:45 <elukey> decom analytics1042 from hadoop [analytics]
2020-10-06 §
13:14 <elukey> cleaned up /srv/jupyter/venv and re-created it to allow jupyterhub to start cleanly on stat1007 [analytics]
12:56 <joal> Restart oozie to pick up new spark settings [analytics]
12:47 <elukey> force re-creation of the base virtualenv for jupyter on stat1007 after the reimage [analytics]
12:20 <elukey> update HDFS Namenode GC/Heap settings on an-master100[1,2] [analytics]
12:19 <elukey> increase spark shuffle io retry settings (10 tries, one every 10s) [analytics]
09:08 <elukey> add an-worker1114 to the hadoop cluster [analytics]
09:04 <klausman> Starting reimaging of stat1007 [analytics]
07:32 <elukey> bootstrap an-worker111[13] as hadoop workers [analytics]
2020-10-05 §
19:14 <mforns> restarted oozie coord unique_devices-per_domain-monthly after deployment [analytics]
19:05 <mforns> finished deploying refinery to unblock deletion of raw mediawiki_job and raw netflow data [analytics]
18:45 <mforns> deploying refinery to unblock deletion of raw mediawiki_job and raw netflow data [analytics]
18:20 <elukey> manual creation of /opt/rocm -> /opt/rocm-3.3.0 on stat1008 to avoid failures in finding the lib dir [analytics]
17:11 <elukey> bootstrap an-worker[1115-1117] as hadoop workers [analytics]
14:52 <milimetric> disabling drop-el-unsanitized-events timer until https://gerrit.wikimedia.org/r/c/analytics/refinery/+/631804/ is deployed [analytics]
14:41 <elukey> shutdown stat1005 and stat1008 for ram expansion (1005 again) [analytics]
14:25 <elukey> shutdown an-master1001 for ram expansion [analytics]
13:54 <elukey> shutdown stat1005 for ram upgrade [analytics]
13:31 <elukey> shutdown an-master1002 for ram expansion (64 -> 128G) [analytics]
12:35 <elukey> execute "PURGE BINARY LOGS BEFORE '2020-09-28 00:00:00';" on an-coord1001's mysql to free space - T264081 [analytics]
10:31 <elukey> bootstrap an-worker111[0,2] as hadoop workers [analytics]
06:33 <elukey> reboot stat1005 to resolve weird GPU state (scheduled last week) [analytics]
2020-10-03 §
10:35 <joal> Manually run mediawiki-history-denormalize after fail-rerun problem (second time) [analytics]
2020-10-02 §
16:43 <joal> Rerun mediawiki-history-denormalize-wf-2020-09 after failed instance [analytics]
14:23 <elukey> live patch refinery-drop-older-than on stat1007 to unblock timer (patch https://gerrit.wikimedia.org/r/6317800) [analytics]
13:00 <elukey> add an-worker110[6-9] to the Hadoop cluster [analytics]
06:49 <elukey> add an-worker110[0-2] to the hadoop cluster [analytics]
06:33 <joal> Manually sqoop page_props and user_properties to unlock mediawiki-history-load oozie job [analytics]
2020-10-01 §
19:07 <fdans> deploying wikistats [analytics]
19:06 <fdans> restarted banner_activity-druid-daily-coord from Sep 26 [analytics]