Time-Bound Analytic Tasks on Large Data Sets Through Dynamic Configuration of Workflows

Abstract

Domain experts are often untrained in big data technologies, which limits their ability to exploit the data available to them. Workflow systems hide the complexities of high-end computing and software engineering by offering pre-packaged analytic steps that can be combined into the multi-step methods experts commonly use. A current limitation of workflow systems is that they do not take user deadlines into account: they run the workflows selected by the user with no bound on execution time. This is impractical when large datasets are at stake, since users often prefer to see an answer sooner even if it has lower precision or quality. In this paper, we present an extension to workflow systems that enables them to take user deadlines into account by automatically generating alternative workflow candidates and ranking them according to performance estimates. The system derives these estimates from workflow performance models built from past workflow executions, and uses semantic technologies to reason about workflow options. Candidate workflows are presented to the user in a compact manner, ranked by their runtime estimates. We have implemented this approach in the WOOT system, which combines and extends capabilities from the WINGS semantic workflow system and the Apache OODT (Object Oriented Data Technology) framework and its workflow execution system.
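
The ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the WOOT implementation: the linear runtime model, the deadline filter, and every name below (Candidate, fit_linear, rank_candidates) are hypothetical and chosen only for illustration. The sketch assumes each candidate workflow carries a history of (input size, runtime) pairs from earlier executions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A hypothetical workflow candidate with its past execution records."""
    name: str
    history: List[Tuple[float, float]]  # (input_size, runtime_seconds)


def fit_linear(history: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of runtime = a * size + b (an assumed model form)."""
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in history)
    if var_x == 0:
        # All runs used the same input size: fall back to the mean runtime.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in history)
    a = cov / var_x
    return a, mean_y - a * mean_x


def rank_candidates(
    candidates: List[Candidate], input_size: float, deadline_seconds: float
) -> List[Tuple[str, float]]:
    """Estimate each runtime, keep those within the deadline, sort fastest first."""
    estimates = []
    for c in candidates:
        a, b = fit_linear(c.history)
        estimates.append((c.name, a * input_size + b))
    feasible = [e for e in estimates if e[1] <= deadline_seconds]
    return sorted(feasible, key=lambda e: e[1])
```

Calling rank_candidates with a list of candidates, a dataset size, and a deadline returns (name, estimated runtime) pairs for the candidates predicted to finish in time. A real system would also need to weigh the precision or quality trade-off mentioned in the abstract, which this sketch leaves out.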

Document Details

Document Type
Technical Report
Publication Date
Nov 01, 2013
Accession Number
ADA596013

Entities

People

  • Andrew Hart
  • Arni Sumarlidason
  • Chris Mattmann
  • Paul Ramirez
  • Rishi Verma
  • Samuel L. Park
  • Varun Ratnakar
  • Yolanda Gil

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Biomedical
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Big Data
  • Computers
  • Data Analysis
  • Data Management
  • Data Mining
  • Data Science
  • Engineering
  • Information Science
  • Jet Propulsion
  • Language
  • Natural Language Processing
  • Natural Languages
  • Predictive Modeling
  • Social Media
  • Software Development

Fields of Study

  • Computer science
  • Engineering

Readers

  • Distributed Systems and Data Platform Development
  • Instructional Design and Training Evaluation