External Memory Algorithms: Dealing With MASSIVE Data

Abstract

The bottleneck in many applications that process massive amounts of data is the I/O communications between internal memory and external memory. The bottleneck is accentuated as processors get faster and parallel processors are used. Parallel disk arrays are often used to increase the I/O bandwidth. The goal of this proposal is to deepen our understanding of the limits of I/O systems and to construct external memory algorithms that are provably efficient. The three measures of performance are number of I/Os, disk storage space, and CPU time. Even when the data fit entirely in memory, communication can still be the bottleneck, and the related issues of caching become important. Theoretical work involves development and analysis of provably efficient external memory algorithms and cache-efficient algorithms for a variety of important application areas. We address several batched and on-line problems, involving text databases, prefetching and streaming data from parallel disks, and database selectivity estimation. Our experimental validation uses our TPIE programming environment. Plans for the coming year are to address bottleneck issues in parallel disks, text databases, and XML databases.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Mar 29, 2002
Accession Number
ADA415374

Entities

People

  • Jeffrey S. Vitter

Organizations

  • Duke University

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Algorithms
  • Bandwidth
  • Central Processing Units
  • Computer Graphics
  • Computer Programming
  • Computer Science
  • Computers
  • Databases
  • Earth Sciences
  • Environment
  • Networks
  • Parallel Processors
  • Scientists
  • Standards
  • Storage
  • Validation
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Parallel and Distributed Computing.
  • Systems Analysis and Design

Technology Areas

  • Space