BBN PLUM: MUC-4 Test Results and Analysis

Abstract

Our mid-term to long-term goals in data extraction from text for the next one to three years are to achieve much greater portability to new languages and new domains, greater robustness, and greater scalability. The novel aspect of our approach is the use of learning algorithms and probabilistic models to learn the domain-specific and language-specific knowledge necessary for a new domain and new language. Learning algorithms should contribute to scalability by making it feasible to deal with domains where it would be infeasible to invest sufficient human effort to bring a system up. Probabilistic models can contribute to robustness by allowing for words, constructions, and forms not anticipated ahead of time and by looking for the most likely interpretation in context.

We began this research agenda approximately two years ago. During the last twelve months, we have focused much of our effort on porting our data extraction system (PLUM) to a new language (Japanese) and to two new domains. During the next twelve months, we anticipate porting PLUM to two or three additional domains.

For any group, participating in MUC is a significant investment. To be consistent with our mid-term and long-term goals, we imposed the following constraints on ourselves in participating in MUC-4:

  • We would focus our effort on semi-automatically acquired knowledge.
  • We would minimize effort on handcrafted knowledge.
  • Most generally, we would minimize MUC-specific effort.

Though the three self-imposed constraints meant our overall scores on the objective evaluation were not as high as they would have been had we focused on hand-tuning and handcrafting the knowledge bases, MUC-4 became a vehicle for evaluating our progress on the long-term goals.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1992
Accession Number
ADA460688

Entities

People

  • Damaris Ayuso
  • Heidi Fox
  • Herbert Gish
  • Ralph Weischedel
  • Robert Ingria
  • Sean Boisen

Organizations

  • BBN Technologies

Tags

DTIC Thesaurus Topics

  • Acquisition
  • Algorithms
  • Classification
  • Debugging
  • Filtration
  • Language
  • Machine Learning
  • Models
  • Natural Languages
  • Precision
  • Probabilistic Models
  • Statistical Algorithms
  • Template Patterns
  • Test Sets
  • Training
  • United States
  • United States Government

Fields of Study

  • Computer science

Readers

  • Clinical Trial Research
  • Computational Linguistics
  • Strategic Security Studies