Understanding cardinality estimation using entropy maximization

Abstract

Cardinality estimation is the problem of estimating the number of tuples returned by a query; it is a fundamentally important task in data management, used in query optimization, progress estimation, and resource provisioning. We study cardinality estimation in a principled framework: given a set of statistical assertions about the number of tuples returned by a fixed set of queries, predict the number of tuples returned by a new query. We model this problem using the probability space, over possible worlds, that satisfies all provided statistical assertions and maximizes entropy. We call this the Entropy Maximization model for statistics (MaxEnt). In this article we develop the mathematical techniques needed to use the MaxEnt model for predicting the cardinality of conjunctive queries.

Document Details

Document Type
Pub Defense Publication
Publication Date
Feb 01, 2012
Source ID
10.1145/2109196.2109202

Entities

People

  • Christopher RĂ©
  • D. Suciu

Organizations

  • Air Force Research Laboratory
  • Division of Information and Intelligent Systems
  • Office of Naval Research
  • University of Washington
  • University of Wisconsin–Madison

Tags

Fields of Study

  • Computer science
  • Mathematics

Readers

  • Computational Linguistics
  • Operations Research
  • Statistical inference.

Technology Areas

  • Space