Cost Models for Query Processing Strategies in the Active Data Repository

Abstract

Exploring and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We have been developing the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed-memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes with application characteristics and machine configurations. In this work, we describe analytical models that predict the average computation, I/O, and communication operation counts of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset, with the output dataset restricted to a regular d-dimensional array. We validate these models on various synthetic datasets and on several driving applications.

Document Details

Document Type
Technical Report
Publication Date
Oct 13, 1999
Accession Number
AD1005541

Entities

People

  • Chialin Chang

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Space

DTIC Thesaurus Topics

  • Accumulators
  • Algorithms
  • Bandwidth
  • Boundaries
  • Computations
  • Computer Science
  • Cost Models
  • Costs
  • Data Processing
  • Equations
  • Hilbert Curve
  • Hilbert Space
  • Military Research
  • Polar Orbits
  • Probability
  • Scientific Research
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Distributed Systems and Data Platform Development

Technology Areas

  • Space