Five Performance Enhancements for Hybrid Hash Join

Abstract

In this paper, we focus on set matching algorithms such as intersection, difference, union, and relational join, using join as a representative for all these matching problems. We discuss five performance enhancements for hybrid hash join algorithms, namely data compression, large cluster sizes and multi-level recursion, role reversal of build and probe inputs, histogram methods to exploit non-uniform data and hash value distributions (skew), and join algorithms for multiple inputs. While each of the enhancements is fairly simple, the most surprising result is that hash value skew can be exploited and improve performance rather than being a danger to hybrid hash join performance as conventionally thought. Our design for hash-based N-way matching algorithms is a dual to pipelining data without intermediate sorting between multiple merge-joins on the same attribute (interesting orderings), and exceeds its performance advantages. Each of the performance enhancements can be used by itself or they can be combined with each other as well as with parallel query execution techniques. The cumulative effect of the optimizations is that hybrid hash join will almost always be the set matching algorithm of choice, even in situations for which earlier research had recommended sorting and merge-join. DATABASE QUERY PROCESSING, SET MATCHING, HYBRID HASH JOIN, TUNING, DATA COMPRESSION, I/O SPEED, FAN-OUT, RECURSION DEPTH, ROLE REVERSAL, NON-UNIFORMITY, HISTOGRAMS, INTERESTING ORDERINGS, N-WAY PARTITIONING,

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 1992
Accession Number
ADA461521

Entities

People

  • Goetz Graefe

Organizations

  • University of Colorado Boulder

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Availability
  • Classification
  • Colorado
  • Compression
  • Computers
  • Contracts
  • Data Compression
  • Databases
  • Heuristic Methods
  • Histograms
  • Information Operations
  • Instructions
  • Mathematics
  • Monitoring
  • Optimization

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning.
  • Parallel and Distributed Computing.
  • Statistical inference.