2021-04-29 §
15:55 <razzi> restart hadoop-yarn-nodemanager and hadoop-hdfs-datanode on an-worker1100 for hadoop to recognize new disk /dev/sdl [analytics]
15:38 <ottomata> enabling event_sanitized_main jobs - T273789 [analytics]
14:57 <elukey> run mysql_upgrade on an-coord1001 to complete the buster upgrade - T278424 [analytics]
14:44 <hnowlan> restored all eventlogging jobs to eventlog1003 [analytics]
14:21 <hnowlan> bump eventlog1003 CPUs to 6 [analytics]
13:53 <joal> Rerun failed pageview-hourly-wf-2021-4-29-11 and pageview-hourly-wf-2021-4-29-12 [analytics]
13:09 <joal> Rerun failed pageview-hourly-wf-2021-4-29-11 [analytics]
12:35 <hnowlan> restarting 2 processors on eventlog1002 [analytics]
12:02 <hnowlan> stopping processors on eventlog1002 to migrate to eventlog1003 [analytics]
11:50 <elukey> manual stop of one of the eventlog processors on eventlog1002 to see if 1003 takes it over [analytics]
02:59 <milimetric> deployed hotfix for referrer job [analytics]
2021-04-28 §
17:46 <hnowlan> eventlog1003 joined to groups successfully [analytics]
17:36 <razzi> sudo mkdir /srv/log/eventlogging and sudo chown eventlogging:eventlogging /srv/log/eventlogging to work around missing directory puppet error (to be puppetized later) [analytics]
17:31 <razzi> remove deployment cache on eventlog1003: sudo rm -fr /srv/deployment/eventlogging/analytics-cache/ [analytics]
17:26 <razzi> manually change /srv/deployment/eventlogging/analytics/.git/DEPLOY_HEAD to deployment1002 on deployment1002 to fix puppet scap error [analytics]
16:53 <hnowlan> stopping deployment-eventlog05 in deployment-prep [analytics]
14:42 <milimetric> deployed refinery with 0.1.9 jars and synced to hdfs [analytics]
14:30 <elukey> chown -R analytics-deploy:analytics-deploy /srv/deployment/analytics on an-coord1001 [analytics]
12:50 <ottomata> applied data_purge jobs in analytics test cluster; old data will now be dropped there - T273789 [analytics]
2021-04-27 §
08:33 <elukey> run mysql_upgrade for analytics-meta on an-coord1002 (should be part of the upgrade process) - T278424 [analytics]
07:11 <elukey> restart yarn resource managers to pick up yarn label settings [analytics]
2021-04-26 §
08:01 <elukey> restart hadoop-mapreduce-historyserver on an-master1001 after changes to the yarn ui user [analytics]
07:36 <elukey> re-enable timers after setting the capacity scheduler [analytics]
07:31 <elukey> restart hadoop RM on an-master* to pick up capacity scheduler changes [analytics]
06:44 <elukey> stop timers on an-launcher1002 again as prep step for capacity scheduler changes [analytics]
06:32 <elukey> roll restart of hadoop-yarn-nodemanagers to pick up new log4j settings - T276906 [analytics]
06:25 <elukey> re-enable timers [analytics]
06:20 <elukey> reboot an-coord1001 to pick up kernel security settings [analytics]
05:57 <elukey> stop timers on an-launcher1002 to allow a reboot of an-coord1001 [analytics]
2021-04-24 §
08:03 <joal> Rerun failed webrequest-druid-hourly-wf-2021-4-23-13 [analytics]
2021-04-23 §
14:23 <elukey> roll restart an-master100[1,2] daemons to pick up new log4j settings - T276906 [analytics]
10:30 <elukey> restart hadoop daemons (NM, DN, JN) on an-worker1080 to further test the new log4j config - T276906 [analytics]
09:12 <elukey> change default log4j hadoop config to include rolling gzip appender [analytics]
2021-04-21 §
21:30 <ottomata> temporarily disabling sanitize_eventlogging_analytics_delayed jobs until T280813 is completed (probably tomorrow) [analytics]
20:04 <ottomata> renaming event_sanitized hive table directories to lower case and repairing table partition paths - T280813 [analytics]
09:28 <elukey> roll restart druid-overlord on druid* after an-coord1001 maintenance [analytics]
09:08 <elukey> upgrade hue on an-tool1009 to 4.9.0-2 [analytics]
08:31 <elukey> re-enable timers on an-launcher1002 and airflow on an-airflow1001 after maintenance on an-coord1001 [analytics]
07:08 <elukey> reimage an-coord1001 after partition reshape (/var/lib/mysql folded in /srv) [analytics]
06:51 <elukey> stop airflow on an-airflow1001 [analytics]
06:49 <elukey> stop all services on an-coord1001 as prep step for reimage [analytics]
06:45 <elukey> PURGE BINARY LOGS BEFORE '2021-04-14 00:00:00'; on an-coord1001 to free some space before the reimage [analytics]
06:00 <elukey> stop timers on an-launcher1002 as prep step for an-coord1001 reimage [analytics]
2021-04-20 §
15:51 <elukey> move analytics-hive.eqiad.wmnet back to an-coord1001 (test on an-coord1002 successful) [analytics]
15:38 <ottomata> deployed refiner to hdfs [analytics]
13:59 <ottomata> deploying refinery and refinery source 0.1.6 for weekly train [analytics]
13:37 <ottomata> deployed aqs [analytics]
13:16 <elukey> failover analytics-hive to an-coord1002 to test the host (running on buster) [analytics]
12:40 <elukey> PURGE BINARY LOGS BEFORE '2021-04-12 00:00:00'; on an-coord1001 - T280367 [analytics]
2021-04-19 §
16:45 <ottomata> make RefineMonitor use analytics keytab - this should be a no-op [analytics]