Eve: A Virtual Data Scientist (D3M/Eve)
Abstract
Subject matter experts (SMEs) attempting to solve real-world analytic problems face several challenges because they lack the applied mathematics, statistics, and machine learning skills that data scientists possess. The goal of our TA2 effort under the DARPA D3M Program was to bridge this gap by using novel methods and automation to enable SMEs to act as their own data scientists. Our effort was designed to fuse data-driven and knowledge-driven approaches to produce a virtual data scientist we call Eve. To translate domain-expert intent into formal representations of learning problems, we built a problem representation system that deterministically converts TA3 inputs into computer-interpretable mathematical expressions. To efficiently search for and compose the sequences of machine learning steps that make up learning plans, we built a Monte Carlo Discrepancy Search approach that explores the vast space of possible plans by efficiently modifying and testing prior related and/or successful plans. Further, we enriched these plans by incorporating data preparation models, treating data preprocessing functions as operators to be planned in-line with learning operators.
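The plan search described in the abstract can be illustrated with a toy sketch. This is not the report's implementation: the operator names, the validity rule, the `toy_score` function, and the edit scheme in `perturb` are all hypothetical stand-ins chosen for illustration. The sketch only shows the general idea of starting from a prior plan, applying a small number of random modifications ("discrepancies"), and keeping the best-scoring valid result, with preprocessing and learning steps drawn from a single operator catalog.

```python
import random

# Hypothetical operator catalog: preprocessing and learning steps share one
# space, so both kinds of step can be planned in-line.
OPERATORS = ["impute_mean", "impute_median", "scale", "one_hot", "pca",
             "logistic_regression", "random_forest"]
LEARNERS = {"logistic_regression", "random_forest"}


def is_valid(plan):
    """Assumed rule: a plan is non-empty and only its final step is a learner."""
    if not plan or plan[-1] not in LEARNERS:
        return False
    return not any(op in LEARNERS for op in plan[:-1])


def perturb(plan, n_discrepancies, rng):
    """Return a copy of plan with n random edits (replace, insert, or delete)."""
    new = list(plan)
    for _ in range(n_discrepancies):
        kind = rng.choice(["replace", "insert", "delete"])
        if kind == "replace":
            new[rng.randrange(len(new))] = rng.choice(OPERATORS)
        elif kind == "insert":
            new.insert(rng.randrange(len(new) + 1), rng.choice(OPERATORS))
        elif len(new) > 1:  # delete, but never empty the plan
            del new[rng.randrange(len(new))]
    return new


def toy_score(plan):
    """Placeholder for a real evaluation such as cross-validated accuracy."""
    bonus = {"impute_median": 0.05, "scale": 0.03, "random_forest": 0.10}
    steps = sorted(set(plan))
    return 0.6 + sum(bonus.get(op, 0.0) for op in steps) - 0.01 * len(plan)


def search(seed_plan, score, max_discrepancies=2, samples_per_level=50, seed=0):
    """Sample plans that differ from seed_plan by 1..max_discrepancies edits."""
    rng = random.Random(seed)
    best_plan, best_score = list(seed_plan), score(seed_plan)
    for k in range(1, max_discrepancies + 1):
        for _ in range(samples_per_level):
            candidate = perturb(seed_plan, k, rng)
            if not is_valid(candidate):
                continue  # discard plans that break the assumed structure
            s = score(candidate)
            if s > best_score:
                best_plan, best_score = candidate, s
    return best_plan, best_score


if __name__ == "__main__":
    prior_plan = ["impute_mean", "logistic_regression"]
    plan, value = search(prior_plan, toy_score)
    print(plan, round(value, 3))
```

Sampling close to the prior plan first reflects the assumption that a previously successful plan is a good starting point, so small changes are tried before larger ones. In a real system the scoring function would train and evaluate the pipeline on data.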
Document Details
- Document Type: Technical Report
- Publication Date: Apr 01, 2019
- Accession Number: AD1069846
Entities
People
- Alan Fern
- Avi Pfeffer
- Brad R Rosenberg
- Erich Merrill
- Jonathan Almeida
- Jonathan Y. Hsu
- Josh Serrin
- Mukesh Dalal
- Nina Zumel