2020-04-02 §
08:23 <elukey> kill/restart netflow realtime druid indexation with a new dimension (peer_ip_src) - T246186 [analytics]
2020-04-01 §
21:19 <joal> restart pageview-hourly-wf-2020-4-1-15 [analytics]
18:24 <joal> Kill learning-features-actor-hourly as new version to come [analytics]
18:23 <joal> Restart unique_devices-per_project_family-monthly-wf-2020-3 and aqs-hourly-wf-2020-4-1-15 after hive failure [analytics]
18:21 <joal> restart webrequest-load-wf-upload-2020-4-1-16 and webrequest-load-wf-text-2020-4-1-16 after hive failure [analytics]
18:14 <joal> Kill groceryheist job taking half the cluster [analytics]
18:06 <ottomata> restarted hive-server2 [analytics]
10:07 <jbond42> updating icu packages [analytics]
2020-03-31 §
12:57 <jbond42> updating icu on presto-analytics-canary and hadoop-worker-canary [analytics]
2020-03-30 §
07:27 <elukey> run /usr/local/bin/refine_sanitize_eventlogging_analytics_immediate --ignore_failure_flag=true --since=72 --verbose --table_whitelist_regex="ResourceTiming" refine_sanitize_eventlogging_analytics_immediate to fix _REFINE_FAILED events [analytics]
07:16 <elukey> run eventlogging refine manually for schemas "EditorActivation|EditorJourney|HomepageVisit|VisualEditorFeatureUse|WikibaseTermboxInteraction|UploadWizardErrorFlowEvent|MobileWikiAppiOSReadingLists|ContentTranslationCTA|QuickSurveysResponses|MobileWikiAppiOSSessions" to fix _REFINE_FAILED events [analytics]
2020-03-29 §
08:44 <elukey> blacklist TwoColConflictExit from Eventlogging Refine to avoid alarm spam [analytics]
2020-03-28 §
16:54 <elukey> restart yarn nodemanager on analytics1071 - network errors in the logs [analytics]
2020-03-27 §
08:09 <elukey> deployed new kernels for https://gerrit.wikimedia.org/r/580083 on stat1004 [analytics]
2020-03-26 §
09:09 <elukey> re-running manually webrequest-load upload 26/03/2020T08 - kerberos failures [analytics]
2020-03-25 §
08:14 <elukey> restart presto-server on an-coord1001 to remove jmx catalog config [analytics]
2020-03-24 §
15:46 <elukey> restart all cron.service processes on stat/notebook (killing long lingering processes) to move the unit under user.slice [analytics]
2020-03-21 §
14:17 <joal> Restart wikidata_item_page_link job with manual fix - review to be confirmed [analytics]
14:06 <joal> Kill buggy wikidata_item_page_link job [analytics]
2020-03-18 §
19:39 <fdans> refinery deployed [analytics]
18:52 <fdans> deploying refinery [analytics]
18:51 <fdans> refinery source 0.0.119 jars generated and symlinked [analytics]
18:17 <fdans> beginning deploy of refinery-source 0.0.119 [analytics]
2020-03-17 §
17:25 <elukey> deploy superset to enable Presto and Kerberos (PyHive 0.6.2) [analytics]
2020-03-16 §
19:43 <joal> Kill-restart wikidata-articleplaceholder_metrics-coord to fix yarn queue [analytics]
18:30 <mforns> Deployed refinery using scap, then deployed onto hdfs [analytics]
17:05 <elukey> roll restart of hadoop namenodes to get the new GC setting (MaxGCPauseMillis 400 -> 1000) [analytics]
2020-03-13 §
12:18 <joal> Restart cassandra-daily-wf-local_group_default_T_pageviews_per_article_flat-2020-3-12 [analytics]
2020-03-12 §
22:53 <mforns> Deployed refinery using scap, then deployed onto hdfs [analytics]
22:22 <mforns> deployed refinery-source using jenkins [analytics]
11:09 <elukey> roll restart kerberos kdcs to pick up new ticket lifetime settings (10h -> 48h) [analytics]
08:27 <elukey> re-running refine eventlogging with --since 12 (very conservative but just in case) [analytics]
2020-03-11 §
14:49 <elukey> add xmldumps mountpoints on stat1004 and stat1005 [analytics]
2020-03-10 §
15:20 <elukey> remove the analytics user keytab from stat100[4,5] [analytics]
15:06 <elukey> move stat1006 to role::statistics::explorer [analytics]
09:24 <elukey> removed /etc/mysql/conf.d/stats-research-client.cnf from all stat boxes (file used for RU, which is now on an-launcher1001) [analytics]
2020-03-09 §
07:27 <elukey> deploy jupyterhub on notebook100[3,4] (manual venv re-creation) to allow the use of the user.slice - T247055 [analytics]
07:26 <elukey> upgrade nodejs from 6->10 on stat1* and notebook1* [analytics]
2020-03-08 §
17:58 <elukey> restart hadoop-yarn-nodemanager on an-worker1087 [analytics]
2020-03-06 §
14:58 <joal> AQS new druid snapshot released (2020-02) [analytics]
10:06 <elukey> roll restart Presto daemons for openjdk upgrades [analytics]
09:45 <elukey> roll restart of cassandra on AQS to pick up new openjdk upgrades [analytics]
2020-03-05 §
19:45 <elukey> deleted dangling 'reports' symlink on stat100[6,7] in /srv/published [analytics]
19:39 <elukey> mv /srv/reportupdater to /srv/reportupdater-backup05032020 on stat100[6,7] [analytics]
16:34 <mforns> restart turnilo to refresh deleted datasources [analytics]
14:16 <elukey> restart hdfs/yarn master daemons to pick up new core-site changes for Superset [analytics]
06:48 <elukey> restart yarn on analytics1074 (GC overhead, traces of network errors with datanodes) [analytics]
2020-03-04 §
08:41 <joal> Kill-restart mediawiki-history-reduced-coord [analytics]
08:38 <joal> Kill-restart mediawiki-history-dumps-coord [analytics]
2020-03-03 §
21:19 <joal> Kill-restart actor jobs [analytics]