Semi-Automated Methods for Refining a Domain-Specific Terminology Base
Abstract
A domain-specific term base can serve not only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this investigation was to explore the use of alternative terminology extraction methods to refine and validate an existing military-specific bilingual dictionary. A series of semi-automatic methods was implemented to distill the existing term list by removing redundancies, resolving spelling variations, and separating individual expressions. Once this internal clean-up was complete, we compared two methods drawn from the terminology extraction literature, term frequency calculations and terminology extraction lists, in order to validate terms as military-specific and to propose a candidate list of non-specific terms for exclusion. The investigation had three goals: to find the best procedure for extracting domain-specific terms in a low-resource domain; to demonstrate that terminology extraction methods can be used to validate and refine a domain-specific dictionary; and to provide the final, refined dictionary as a term base to support customization of machine translation systems for the military domain.
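As a rough illustration of how a term frequency calculation can separate domain-specific from non-specific terms, the sketch below compares each term's relative frequency in a domain corpus with its relative frequency in a general corpus. It is not the report's procedure, which the abstract does not specify: the function names, the add-one smoothing, and the threshold of 1.0 are all assumptions made for this example.

```python
import re


def count_term(term, text):
    """Count whole-word occurrences of a (possibly multiword) term, ignoring case."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def token_count(text):
    """Count word tokens in a text."""
    return len(re.findall(r"\w+", text))


def specificity(term, domain_text, general_text):
    """Ratio of smoothed relative frequencies: domain corpus over general corpus.

    Add-one smoothing (an assumption of this sketch) avoids division by zero
    when a term never occurs in the general corpus.
    """
    domain_rate = (count_term(term, domain_text) + 1) / (token_count(domain_text) + 1)
    general_rate = (count_term(term, general_text) + 1) / (token_count(general_text) + 1)
    return domain_rate / general_rate


def split_terms(terms, domain_text, general_text, threshold=1.0):
    """Split terms into (keep, exclude) lists using a hypothetical ratio threshold."""
    keep, exclude = [], []
    for term in terms:
        if specificity(term, domain_text, general_text) > threshold:
            keep.append(term)
        else:
            exclude.append(term)
    return keep, exclude
```

Terms that land in the exclude list are only candidates for removal. In a semi-automated workflow of the kind described above, a human reviewer would still make the final decision.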
Document Details
- Document Type: Technical Report
- Publication Date: Feb 01, 2011
- Accession Number: ADA538966
Entities
People
- Gabriella Rose
- Melissa Holland
- Robert Winkler
- Steve Larocca
Organizations
- United States Army Research Laboratory