Distance Exponent: A New Concept for Selectivity Estimation in Metric Trees

Abstract

We discuss the problem of selectivity estimation for range queries in real metric datasets, which include spatial, or dimensional, datasets as a special case. The main contribution of this paper is that, surprisingly, many diverse datasets follow a "power law". This is the first analysis of distance distributions for metric datasets. We named the exponent of our power law as the "Distance Exponent". We show that it plays an important role for the analysis of real, metric datasets. Specifically, we show: (a) how to use it to derive formulas for selectivity estimation of range queries, and (b) how to measure it quickly from a metric index tree (like an M-tree). We do experiments on many real datasets (road intersections of U.S. counties, vectors characteristics extracted from face matching systems, distance matrixes) and synthetic datasets (Sierpinsky triangle and 2-dimensional line). The experiments show that our selectivity estimation formulas are accurate, always being within one standard deviation from the measurements. Moreover, that our algorithm to estimate the "distance exponent" gives less than 20% error, while it saves orders of magnitude in computation time.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 1999
Accession Number
ADA363780

Entities

People

  • Agma J. Traina
  • Caetano Traina Jr.
  • Christos Faloutsos

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Biomedical
  • Space

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Computations
  • Computer Science
  • Cost Models
  • Data Mining
  • Data Sets
  • Databases
  • Distribution Functions
  • English Language
  • Geographic Information Systems
  • Homosexuality
  • Knowledge Management
  • Language
  • Probability
  • Two Dimensional
  • United States Government

Fields of Study

  • Computer science

Readers

  • Atmospheric Science / Meteorology, specifically Wind Wave Turbulence.
  • Computer Vision.