Five Performance Enhancements for Hybrid Hash Join

Abstract

In this paper, we focus on set matching algorithms such as intersection, difference, union, and relational join, using join as a representative for all these matching problems. We discuss five performance enhancements for hybrid hash join algorithms, namely data compression, large cluster sizes and multi-level recursion, role reversal of build and probe inputs, histogram methods to exploit non-uniform data and hash value distributions (skew), and join algorithms for multiple inputs. While each of the enhancements is fairly simple, the most surprising result is that hash value skew can be exploited and improve performance rather than being a danger to hybrid hash join performance as conventionally thought. Our design for hash-based N-way matching algorithms is a dual to pipelining data without intermediate sorting between multiple merge-joins on the same attribute (interesting orderings), and exceeds its performance advantages. Each of the performance enhancements can be used by itself or they can be combined with each other as well as with parallel query execution techniques. The cumulative effect of the optimizations is that hybrid hash join will almost always be the set matching algorithm of choice, even in situations for which earlier research had recommended sorting and merge-join. DATABASE QUERY PROCESSING, SET MATCHING, HYBRID HASH JOIN, TUNING, DATA COMPRESSION, I/O SPEED, FAN-OUT, RECURSION DEPTH, ROLE REVERSAL, NON-UNIFORMITY, HISTOGRAMS, INTERESTING ORDERINGS, N-WAY PARTITIONING,

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Jul 01, 1992
Accession Number: ADA461521

Entities

People

Goetz Graefe

Organizations

University of Colorado Boulder

Five Performance Enhancements for Hybrid Hash Join

Abstract

Document Details

Entities

People

Organizations

Tags

DTIC Thesaurus Topics

Fields of Study

Readers