Streaming Model Design, Comparison and Curation
Abstract
The growing volumes of available digital data, coupled with machine learning and simulation models, have substantially increased our ability to derive insights and predictions. However, there are important obstacles to the widespread and effective use of models: data science expertise is required to create the models, and even for data scientists, it is challenging to understand, validate, and build trust in their results. The goal of the DARPA Data-Driven Discovery of Models (D3M) program was to "develop automated model discovery systems" that both empower domain experts without data science training to construct empirical models that go from raw data to accurate predictions, and make expert data scientists more productive through automation. In our project, we have developed methods and tools that support automatic machine learning. We have addressed three key problem areas: 1) the automatic and efficient synthesis of end-to-end machine learning (ML) pipelines for a wide range of tasks; 2) enabling domain experts to specify tasks as well as explore and assess the derived pipelines; 3) support for data discovery to augment training data and improve models.
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2024
- Accession Number: AD1219367
Entities
People
- Aécio Santos
- Claudio Silva
- Juliana Freire
- Roque Lopez
Organizations
- New York University