Five Performance Enhancements for Hybrid Hash Join
Abstract
In this paper, we focus on set matching algorithms such as intersection, difference, union, and relational join, using join as a representative for all these matching problems. We discuss five performance enhancements for hybrid hash join algorithms, namely data compression, large cluster sizes and multi-level recursion, role reversal of build and probe inputs, histogram methods to exploit non-uniform data and hash value distributions (skew), and join algorithms for multiple inputs. While each of the enhancements is fairly simple, the most surprising result is that hash value skew can be exploited and improve performance rather than being a danger to hybrid hash join performance as conventionally thought. Our design for hash-based N-way matching algorithms is a dual to pipelining data without intermediate sorting between multiple merge-joins on the same attribute (interesting orderings), and exceeds its performance advantages. Each of the performance enhancements can be used by itself or they can be combined with each other as well as with parallel query execution techniques. The cumulative effect of the optimizations is that hybrid hash join will almost always be the set matching algorithm of choice, even in situations for which earlier research had recommended sorting and merge-join. DATABASE QUERY PROCESSING, SET MATCHING, HYBRID HASH JOIN, TUNING, DATA COMPRESSION, I/O SPEED, FAN-OUT, RECURSION DEPTH, ROLE REVERSAL, NON-UNIFORMITY, HISTOGRAMS, INTERESTING ORDERINGS, N-WAY PARTITIONING,
Document Details
- Document Type
- Technical Report
- Publication Date
- Jul 01, 1992
- Accession Number
- ADA461521
Entities
People
- Goetz Graefe
Organizations
- University of Colorado Boulder