Semi-Automated Methods for Refining a Domain-Specific Terminology Base

Abstract

A domain-specific term base may be useful not only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this investigation was to explore the use of alternative terminology extraction methods to refine and validate an existing military-specific bilingual dictionary. A series of semi-automatic methods was implemented to distill the existing term list by removing redundancies, resolving spelling variations, and separating individual expressions. Once the internal clean-up was completed, we compared two methods drawn from the terminology extraction literature, term frequency calculations and terminology extraction lists, in order to validate terms as military-specific and to propose a candidate list of non-specific terms for exclusion. The investigation had three goals: to find the best procedure for extracting domain-specific terms in a low-resource domain; to demonstrate that terminology extraction methods can be used to validate and refine a domain-specific dictionary; and to provide the final, refined dictionary as a term base to support customization of machine translation systems for the military domain.
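
The abstract names two kinds of steps: an internal clean-up of the term list and a frequency-based check for terms that are not domain-specific. The Python sketch below illustrates one way such steps might look; it is not the report's actual procedure. The function names, the semicolon separator, the `min_ratio` threshold, and the use of a domain/general frequency ratio are all assumptions made for illustration, and the clean-up only normalizes case and whitespace rather than resolving true spelling variants.

```python
import re


def clean_terms(raw_entries):
    """Split multi-expression entries, normalize them, and drop duplicates.

    Assumption: a semicolon separates individual expressions in one entry.
    Order of first appearance is preserved.
    """
    seen = set()
    cleaned = []
    for entry in raw_entries:
        for expr in entry.split(";"):
            # Collapse runs of whitespace and lowercase the expression.
            term = re.sub(r"\s+", " ", expr).strip().lower()
            if term and term not in seen:
                seen.add(term)
                cleaned.append(term)
    return cleaned


def relative_frequency(term, text):
    """Whole-word, case-insensitive occurrences of term per token of text."""
    lowered = text.lower()
    tokens = re.findall(r"\w+", lowered)
    if not tokens:
        return 0.0
    hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
    return hits / len(tokens)


def flag_non_specific(terms, domain_text, general_text, min_ratio=2.0):
    """Return terms whose domain/general frequency ratio is below min_ratio.

    Hypothetical criterion: a term that is not clearly more frequent in the
    domain text than in the general text becomes a candidate for exclusion.
    A term absent from the general text is treated as domain-specific.
    """
    flagged = []
    for term in terms:
        domain_freq = relative_frequency(term, domain_text)
        general_freq = relative_frequency(term, general_text)
        if general_freq > 0 and domain_freq / general_freq < min_ratio:
            flagged.append(term)
    return flagged
```

In use, `clean_terms` would be applied to the raw dictionary entries first, and its output passed to `flag_non_specific` together with a military text sample and a general-language sample. The flagged terms form a candidate list only; a human reviewer would still decide what to exclude, in keeping with the semi-automated approach the abstract describes.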

Document Details

  • Document Type: Technical Report
  • Publication Date: Feb 01, 2011
  • Accession Number: ADA538966

Entities

People

  • Gabriella Rose
  • Melissa Holland
  • Robert Winkler
  • Steve Larocca

Organizations

  • United States Army Research Laboratory

Tags

Communities of Interest

  • Biomedical
  • C4I
  • Weapons Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Army
  • Command And Control
  • Computers
  • Dictionaries
  • Extraction
  • Frequency
  • Indirect Fire
  • Information Operations
  • Information Science
  • Law
  • Military Research
  • Refining
  • Spreadsheet Software
  • Training
  • Warfare
  • Word Processors

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation