MUC-4 Test Results and Analysis
Abstract
LSI's overall natural language processing (NLP) objective is the development of a broad-coverage, reusable system that is readily transportable to additional domains, applications, and sublanguages in English, and that provides a foundation for our multilingual work. Our system, called DBG, for Data Base Generator, is composed of a set of NLP components which have been developed, extended, and rebuilt over a period of some years. The core of the system is an innovative principle-based parser, using ideas from [1], which we began developing in the course of MUC-3 to replace our previous chart parser. Our approach thus relies on the concept of powerful, robust parsing as the most crucial component in an NLP system. In applying our NLP system to text extraction, our ultimate objective is to develop a high quality text extraction system, where "high quality" is defined as scoring above 80%, a number well beyond any current MUC scores. In line with these NLP objectives, our major focus for MUC-4 was a follow-up to our main "lesson learned" in MUC-3, which was to acquire a machine-readable dictionary (MRD) and integrate its content into the DBG system. When attempts to acquire the computer-friendly Longmans or one of the Oxford Dictionaries were unsuccessful, we turned to ACL's CD-ROM containing the Collins English Dictionary (CED). The most correct version of the CED on the ACL CD-ROM was apparently developed directly from a medium prepared for the typographer, and unfortunately lacks any documentation of features, fonts, language, etc. The effort of acquiring and integrating the CED was clearly a worthwhile endeavor, since we were able to increase the number of entries in our lexicon three-fold in a relatively short time (see Table 1). The increase in lexicon size will benefit all the applications LSI is currently working on.
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1992
- Accession Number: ADA458883
Entities
People
- Alfredo Arnaiz
- Bonnie G. Stalls
- Christine A. Montgomery
- Naicong Li
- Robert E. Stumberger
- Robert S. Belvin
- Susan B. Hirsh