Automating Physical Database Design: An Extensible Approach

Abstract

In a high-level query language such as SQL, queries yield the same result no matter how the logical schema is physically implemented. Nevertheless, a query's cost can vary by orders of magnitude among different physical implementations of the same logical schema, even with the most modern query optimizers. Therefore, designing a low-cost physical implementation is an important pragmatic problem-one that requires a sophisticated understanding of physical design options and query strategies, and that involves estimating query costs, a tedious and error-prone process when done manually. We have devised a simple framework for automating physical design in relational or post-relational DBMSs and in database programming languages. Within this framework, design options are uniformly represented as "features", and designs are represented by "conflict"-free sets of features. (Mutually exclusive features conflict. An example would be two primary indexes on the same table.) The uniform representation of design options as features accommodates a greater variety of design options than previous approaches; adding a new design option (e.g. a new index type) merely entails characterizing it as a feature with appropriate parameters. We propose an approximation algorithm, based on this framework, that finds low-cost physical designs. In an initial phase, the algorithm examines the logical schema, data statistics, and queries, and generates "useful features"-features that might reduce query costs. In a subsequent phase, the algorithm uses the DBMS's cost estimates to find "best features"-features that belong to the lowest-cost designs for each individual query. Finally, the algorithm searches among conflict-free subsets of the best features of all the queries to find organizations with low global cost estimates. We have implemented a prototype physical design assistant for the INGRES relational DBMS, and we evaluate its designs for several benchmarks, including ASSSAP.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 1993
Accession Number
ADA598292

Entities

People

  • Steve Rozen

Organizations

  • New York University

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Algorithms
  • Application Software
  • Computer Programming
  • Computer Science
  • Computers
  • Cost Estimates
  • Database Management Systems
  • Databases
  • Domain Specific Programming Languages
  • Information Systems
  • Language
  • Lisp Programming Language
  • Programming Languages
  • Relational Database Management Systems
  • Relational Databases
  • Software Development
  • Statistics

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Life Cycle Cost Analysis
  • Theoretical Analysis.