Using Clustering Strategies for Creating Authority Files

Abstract

As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographic entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files.
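As a rough illustration of the kind of approximate string matching the abstract refers to, the sketch below groups variant strings by character-level edit distance. It is not the report's method: the greedy single-pass grouping, the function names, the `max_distance` threshold, and the sample strings are all assumptions made for this example, and the word-level matching that the abstract introduces is not attempted here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_variants(strings, max_distance=2):
    """Greedy grouping: each string joins the first cluster whose first
    member is within max_distance (case-insensitive); otherwise it starts
    a new cluster. The threshold is an illustrative assumption."""
    clusters = []
    for s in strings:
        for cluster in clusters:
            if edit_distance(s.lower(), cluster[0].lower()) <= max_distance:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


# Hypothetical sample strings, not taken from the Astrophysics Data System.
samples = [
    "Univ. of Virginia",
    "Univ of Virginia",
    "Univ. of Virgina",
    "Harvard Observatory",
]
for group in cluster_variants(samples):
    print(group)
```

Each resulting group is a candidate set of variant forms that a human reviewer could confirm or split, which is where a reduction in manual effort would come from.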

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 2000
Accession Number: ADA453900

Entities

People

  • Allison L. Powell
  • Eric Schulman
  • James C. French

Organizations

  • University of Virginia

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Astronomy
  • California
  • Clustering
  • Computer Science
  • Computers
  • Construction
  • Data Analysis
  • Databases
  • Information Retrieval
  • Information Science
  • Knowledge Management
  • Observatories
  • Productivity
  • Quality Control
  • Standards
  • United States

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Library and Information Science
  • Neural Network Machine Learning

Technology Areas

  • Space