Eve: A Virtual Data Scientist (D3M/Eve)

Abstract

Subject matter experts (SMEs) attempting to solve real-world analytic problems are often held back because they lack the applied mathematics, statistics, and machine learning skills that data scientists possess. The goal of our TA2 effort under the DARPA D3M Program was to bridge this gap by using novel methods and automation to enable SMEs to act as their own data scientists. Our effort was designed to fuse data-driven and knowledge-driven approaches to produce a virtual data scientist we call Eve. To translate domain-expert intent into formal representations of learning problems, we built a problem representation system that deterministically converts TA3 inputs into computer-interpretable mathematical expressions. To efficiently search for and compose the sequences of machine learning steps that make up learning plans, we built a Monte Carlo Discrepancy Search approach that explores the vast space of possible plans by efficiently modifying and testing prior plans that were related, successful, or both. Further, we enriched these plans by incorporating data preparation models, treating data preprocessing functions as operators to be planned in-line with learning operators.
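
The abstract does not give details of the search procedure, so the following is only a minimal illustrative sketch of the general idea of searching near a prior plan, not the method implemented in Eve. It assumes a plan is a mapping from named slots to operator choices, that preprocessing and learning operators sit in the same plan, and that a candidate may differ from the prior plan in at most a few slots ("discrepancies"). The operator names, the SLOTS table, toy_score, and discrepancy_search are all hypothetical and introduced here for illustration.

```python
import random

# Hypothetical operator choices for each slot of a learning plan.
# Preprocessing slots ("impute", "scale") are planned alongside the
# learning slot ("model"). None of these names come from the report.
SLOTS = {
    "impute": ["mean", "median", "drop_rows"],
    "scale": ["none", "standard", "minmax"],
    "model": ["logistic", "random_forest", "gradient_boosting"],
}


def toy_score(plan):
    """Stand-in for training and validating a plan (made-up numbers)."""
    weights = {
        "mean": 0.1, "median": 0.2, "drop_rows": 0.0,
        "none": 0.0, "standard": 0.3, "minmax": 0.2,
        "logistic": 0.2, "random_forest": 0.4, "gradient_boosting": 0.5,
    }
    return sum(weights[choice] for choice in plan.values())


def discrepancy_search(prior_plan, score, max_discrepancies=2, samples=50, seed=0):
    """Randomly sample plans that differ from prior_plan in a few slots.

    Each sample changes between 1 and max_discrepancies slots of the prior
    plan to a different operator, scores the result, and the best plan seen
    (including the prior plan itself) is returned with its score.
    """
    rng = random.Random(seed)
    limit = min(max_discrepancies, len(SLOTS))
    best_plan, best_score = dict(prior_plan), score(prior_plan)
    for _ in range(samples):
        k = rng.randint(1, limit)
        candidate = dict(prior_plan)
        # Pick k distinct slots and swap each to a different operator.
        for slot in rng.sample(sorted(SLOTS), k):
            alternatives = [op for op in SLOTS[slot] if op != prior_plan[slot]]
            candidate[slot] = rng.choice(alternatives)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_plan, best_score = candidate, candidate_score
    return best_plan, best_score


if __name__ == "__main__":
    prior = {"impute": "mean", "scale": "none", "model": "logistic"}
    plan, value = discrepancy_search(prior, toy_score)
    print(plan, value)
```

Capping the number of changed slots keeps every candidate close to a plan that is already known to work, which is the intuition behind modifying prior plans rather than searching the full plan space from scratch. In a real system the scoring function would train and validate the plan on data instead of looking up fixed weights.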

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 2019
Accession Number
AD1069846

Entities

People

  • Alan Fern
  • Avi Pfeffer
  • Brad R Rosenberg
  • Erich Merrill
  • Jonathan Almeida
  • Jonathan Y. Hsu
  • Josh Serrin
  • Mukesh Dalal
  • Nina Zumel

Tags

Communities of Interest

  • Autonomy
  • Cyber

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Algorithms
  • Applied Mathematics
  • Artificial Intelligence
  • Bayesian Networks
  • Cognitive Systems Engineering
  • Data Mining
  • Data Science
  • Dimensionality Reduction
  • Human-Machine Interfaces
  • Information Science
  • Information Systems
  • Machine Learning
  • Mathematics
  • Statistics
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Distributed Systems and Data Platform Development
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • Space