Absolute Bounds on Set Intersection and Union Sizes from Distribution Information,

Abstract

Estimation of set intersection and union sizes is important for access method selection for a database. Absolute bounds on sizes are often much easier to compute than size estimates, requiring no distributional or independence assumptions, and can answer many of the same needs. We present a large compendium of quick closed-form bounds on set intersection and union sizes, each applying to a different situation; they can be expressed as rules, and managed by rule-based or 'knowledge-base' architecture. These methods use general-purpose statistics precomputed on the data, and exploit homomorphisms (onto mappings) of the data items onto distributions that can be more easily analyzed. Our methods can be used anytime, but tend to work best when there are strong or complex correlations in the data. This circumstance is poorly addressed by the standard methods of independence-assumption and distributional-assumption estimates, and hence our methods fill a need. Keywords: Databases, Query processing, Statistical computing, Statistical inequalities, Sets, Boolean algebra, Estimation.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1985
Accession Number
ADA164025

Entities

People

  • Neil C. Rowe

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies
  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Algorithms
  • Boolean Algebra
  • Computations
  • Computer Science
  • Computers
  • Data Analysis
  • Data Science
  • Databases
  • Inequalities
  • Information Processing
  • Information Science
  • Mathematics
  • Military Research
  • Probability
  • Standards
  • Statistical Analysis
  • Statistics

Readers

  • Artificial Intelligence
  • Mathematical Modeling and Probability Theory.
  • Regression Analysis.