Domain-Specific Term-List Expansion Using Existing Linguistic Resources

Abstract

This report describes a series of experiments involving expansion of a domain-specific human-generated "seed list" using available linguistic resources. The resources used for the expansion are intended to be general purpose: two large-scale Chinese-English dictionaries and a Chinese lexical knowledge base (HowNet). The methodology involves three steps: (1) hand extraction of head words from each entry in the human-generated seed list; (2) automatic comparison of these head words against entries in the linguistic resources, where an entry matches if the head word matches the entry exactly or is included in its semantic definition; and (3) collection of any resulting matching entries into a larger term list. The terms extracted by this process were verified manually to confirm whether they were relevant to the topic of a specific domain. An important contribution of this work is the finding that the use of a bilingual term list for the expansion process does not provide a significant improvement over the use of a simpler, more easily produced, monolingual term list.
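The automatic part of the methodology (steps 2 and 3) can be illustrated with a minimal Python sketch. It is not the report's implementation: the function name `expand_term_list`, the dictionary-shaped resource, the whitespace tokenization of definitions, and the toy English data are all assumptions made for illustration. Head words are taken as given, since step 1 is done by hand, and the output would still need the manual relevance check the abstract mentions.

```python
def expand_term_list(head_words, resource):
    """Collect resource entries that match any head word.

    An entry matches if a head word equals the entry itself or
    appears as a token in the entry's definition.
    `resource` is a hypothetical mapping of entry -> definition text.
    """
    heads = set(head_words)
    expanded = []
    for entry, definition in resource.items():
        # Assumption: definitions can be split on whitespace.
        definition_tokens = set(definition.split())
        if entry in heads or heads & definition_tokens:
            expanded.append(entry)
    return expanded


# Toy, made-up data for illustration only (not from the report).
toy_resource = {
    "gas": "a substance in a state like air",
    "nerve gas": "a toxic gas used as a weapon",
    "vaccine": "a preparation that provides immunity",
    "table": "a piece of furniture",
}
toy_heads = ["gas", "vaccine"]

expanded_terms = expand_term_list(toy_heads, toy_resource)
print(expanded_terms)
```

Whitespace tokenization is used here only because the toy data is English; matching against Chinese definitions or HowNet semantic definitions would need a different notion of inclusion.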

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2002
Accession Number
ADA525800

Entities

People

  • Bonnie J. Dorr
  • Tiejun Zhao

Organizations

  • University of Maryland

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Chemical Products
  • Clustering
  • Dictionaries
  • English Language
  • Extraction
  • Far East
  • Foreign Languages
  • Information Operations
  • Information Retrieval
  • Language
  • Linguistics
  • Noise
  • Noise Reduction
  • Personality
  • Precision
  • Translations

Readers

  • Computational Linguistics
  • Systems Analysis and Design