MUC-4 Test Results and Analysis

Abstract

LSI's overall natural language processing (NLP) objective is the development of a broad coverage, reusable system which is readily transportable to additional domains, applications, and sublanguages in English, as well as providing a foundation for our multilingual work. Our system, called DBG, for Data Base Generator, is comprised of a set of NLP components which have been developed, extended, and rebuilt over a period of some years. The core of the system is an innovative principle-based parser, using ideas from [1], which we began developing in the course of MUC-3 to replace our previous chart parser. Our approach thus relies on the concept of powerful, robust parsing as the most crucial component in an NLP system. In applying our NLP system to text extraction, our ultimate objective is to develop a high quality text extraction system, where "high quality" is defined as scoring above 80% -- a number well beyond any current MUC scores. In line with these NLP objectives, our major focus for MUC-4 was a follow-up to our main "lesson learned" in MUC-3, which was to acquire a machine-readable dictionary (MRD) and integrate its content into the DBG system. When attempts to acquire the computer-friendly Longmans or one of the Oxford Dictionaries were unsuccessful, we turned to ACL's CD-ROM containing the Collins English Dictionary (CED). The most correct version of the CED on the ACL CD-ROM was apparently developed directly from a medium prepared for the typographer, and unfortunately lacks any documentation of features, fonts, language, etc. The effort of acquiring and integrating the CED was clearly a worthwhile endeavor, since we were able to increase the number of entries in our lexicon three-fold in a relatively short time (see Table 1). The increase in lexicon size will benefit all the applications LSI is currently working on.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1992
Accession Number
ADA458883

Entities

People

  • Alfredo Arnaiz
  • Bonnie G. Stalls
  • Christine A. Montgomery
  • Naicong Li
  • Robert E. Stumberger
  • Robert S. Belvin
  • Susan B. Hirsh

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Acquisition
  • Contracts
  • Databases
  • Dictionaries
  • Extraction
  • Forests
  • Formal Languages
  • Information Operations
  • Intelligent Systems
  • Language
  • Natural Language Processing
  • Natural Languages
  • Precision
  • Standards
  • Template Patterns

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Software Engineering
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation