Duplicate Record Elimination in Large Data Files.

Abstract

This paper addresses the issue of duplicate elimination in large data files in which many occurrences of the same record may appear. A comprehensive cost analysis of the duplicate elimination operation is presented. This analysis is based on a combinatorial model developed for estimating the size of intermediate runs produced by a modified merge-sort procedure. The performance of this merge-sort procedure is demonstrated to be significantly superior to the standard duplicate elimination technique of sorting followed by a sequential pass to locate duplicate records. The results can also be used to provide critical input to a query optimizer in a relational database system. (Author)
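The contrast drawn in the abstract can be illustrated with a small in-memory sketch. This is not the authors' procedure as specified in the report; it only assumes that the "modified merge-sort" drops duplicates while runs are being merged, so that intermediate runs shrink, whereas the standard technique sorts everything and removes duplicates in a final sequential pass. The function names `merge_unique`, `sort_unique`, and `sort_then_scan` are hypothetical, and records are assumed to be mutually comparable values.

```python
def merge_unique(left, right):
    """Merge two sorted, duplicate-free runs into one sorted, duplicate-free run."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])
            i += 1
        elif right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            # Same record in both runs: keep one copy, advance both cursors.
            out.append(left[i])
            i += 1
            j += 1
    # At most one run still has records; it is already sorted and duplicate-free.
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def sort_unique(records):
    """Merge sort that eliminates duplicates at every merge step (assumed variant)."""
    if len(records) <= 1:
        return list(records)
    mid = len(records) // 2
    return merge_unique(sort_unique(records[:mid]), sort_unique(records[mid:]))


def sort_then_scan(records):
    """Baseline: full sort, then one sequential pass that skips repeated records."""
    out = []
    for rec in sorted(records):
        if not out or out[-1] != rec:
            out.append(rec)
    return out
```

Both functions return the same result; the difference the report analyzes is cost, since removing duplicates during merging reduces the size of the intermediate runs that later merge passes must read and write.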

Document Details

Document Type: Technical Report
Publication Date: Aug 01, 1981
Accession Number: ADA110052

Entities

People

David J. DeWitt
Dina Friedland

Organizations

University of Wisconsin-Madison, Department of Computer Science
