A Survey on Collecting, Managing, and Analyzing Provenance from Scripts

Abstract

Scripts are widely used to design and run scientific experiments. Scripting languages are easy to learn and use, and they allow complex tasks to be specified and executed in fewer steps than with traditional programming languages. However, they also have important limitations for reproducibility and data management. As experiments are iteratively refined, it is challenging to reason about each experiment run (or trial), to keep track of the association between trials and experiment instances as well as the differences across trials, and to connect results to specific input data and parameters. Approaches have been proposed that address these limitations by collecting, managing, and analyzing the provenance of scripts. In this article, we survey the state of the art in provenance for scripts. We have identified the approaches by following an exhaustive protocol of forward and backward literature snowballing. Based on a detailed study, we propose a taxonomy and classify the approaches using this taxonomy.

Document Details

Document Type
Pub Defense Publication
Publication Date
Jun 18, 2019
Source ID
10.1145/3311955

Entities

People

  • João Felipe Pimentel
  • Juliana Freire
  • Leonardo Murta
  • Vanessa Braganholo

Organizations

  • AT&T
  • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
  • Defense Advanced Research Projects Agency
  • Fluminense Federal University
  • Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro
  • National Council for Scientific and Technological Development
  • National Science Foundation
  • New York University

Tags

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Neural Network Machine Learning
  • Systems Analysis and Design