Streaming Model Design, Comparison and Curation

Abstract

The growing volumes of available digital data, coupled with machine learning and simulation models, have substantially increased our ability to derive insights and predictions. However, there are important obstacles to the widespread and effective use of models: data science expertise is required to create the models, and even for data scientists, it is challenging to understand, validate, and build trust in their results. The DARPA Data-Driven Discovery of Models (D3M) program had as its goal to "develop automated model discovery systems" that both empower domain experts without data science training to construct empirical models that go from raw data to accurate predictions and make expert data scientists more productive via automation. In our project, we have developed methods and tools that support automatic machine learning. We have addressed three key problem areas: 1) the automatic and efficient synthesis of end-to-end machine learning (ML) pipelines for a wide range of tasks; 2) enabling domain experts to specify tasks as well as explore and assess the derived pipelines; 3) support for data discovery to augment training data and improve models.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2024
Accession Number
AD1219367

Entities

People

  • Aécio Santos
  • Claudio Silva
  • Juliana Freire
  • Roque Lopez

Organizations

  • New York University

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Instructional Design and Training Evaluation

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks