UMass/Hughes: Description of the CIRCUS System Used for MUC-5

Abstract

The primary goal of our effort is the development of robust and portable language processing capabilities for information extraction applications. The system under evaluation here is based on language processing components that have demonstrated strong performance capabilities in previous evaluations [Lehnert et al. 1992a]. Having demonstrated the general viability of these techniques, we are now concentrating on the practicality of our technology by creating trainable system components to replace hand-coded data and manually-engineered software. Our general strategy is to automate the construction of domain-specific dictionaries and other language-related resources so that information extraction can be customized for specific applications with a minimal amount of human assistance. We employ a hybrid system architecture that combines selective concept extraction [Lehnert 1991] technologies developed at UMass with trainable classifier technologies developed at Hughes [Dolan et al. 1991]. Our MUC-5 system incorporates seven trainable language components to handle (1) lexical recognition and part-of-speech tagging, (2) knowledge of semantic/syntactic interactions, (3) semantic feature tagging, (4) noun phrase analysis, (5) limited coreference resolution, (6) domain object recognition, and (7) relational link recognition. Our trainable components have been developed so that domain experts who have no background in natural language or machine learning can train individual system components in the space of a few hours.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1993
Accession Number
ADA458576

Entities

People

  • C. Cardie
  • C. Dolan
  • E. Riloff
  • Fan Feng
  • J. McCarthy
  • John C. Peterson
  • S. Goldman
  • S. Soderland
  • W. Lehnert

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Acquisition
  • Algorithms
  • Computational Linguistics
  • Computer Programming
  • Computer Science
  • Dictionaries
  • Errors
  • Governments
  • Language
  • Linguistics
  • Lisp Programming Language
  • Machine Learning
  • Natural Language Processing
  • Natural Languages
  • Recognition
  • Text Processing
  • United States

Fields of Study

  • Computer science

Readers

  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • Space