BBN: Description of the PLUM System as Used for MUC-5

Abstract

Traditional approaches to the problem of extracting data from texts have emphasized hand-crafted linguistic knowledge. In contrast, BBN's PLUM system (Probabilistic Language Understanding Model) was developed as part of an ARPA-funded research effort on integrating probabilistic language models with more traditional linguistic techniques. Our research and development goals are:

  • more rapid development of new applications,
  • the ability to train (and re-train) systems based on user markings of correct and incorrect output,
  • more accurate selection among interpretations when more than one is found, and
  • more robust partial interpretation when no complete interpretation can be found.

We began this research agenda approximately three years ago. During the past two years, we have evaluated much of our effort in porting our data extraction system (PLUM) to a new language (Japanese) and to two new domains.

Three key design features distinguish PLUM: statistical language modeling, learning algorithms and partial understanding.

The first key feature is the use of statistical modeling to guide processing. For the version of PLUM used in MUC-5, part of speech information was determined by using well-known Markov modeling techniques embodied in BBN's part-of-speech tagger POST [5]. We also used a correction model, AMED [3], for improving Japanese segmentation and part-of-speech tags assigned by JUMAN. For the microelectronics domain, we used a probabilistic model to help identify the role of a company in a capability (whether it is a developer, user, etc.). Statistical modeling in PLUM contributes to portability, robustness, and trainability.

The second key feature is our use of learning algorithms both to obtain the knowledge bases used by PLUM's processing modules and to train the probabilistic algorithms.

A third key feature is partial understanding. All components of PLUM are designed to operate on partially interpretable input.
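As an illustration of the kind of Markov modeling the abstract refers to for part-of-speech tagging, the following is a minimal sketch of a generic bigram hidden Markov model decoded with the Viterbi algorithm. It is not POST's actual model: the tag set, the probability tables, the `UNSEEN` floor, and the function name `viterbi` are all hypothetical and chosen only to keep the example small.

```python
import math

# Toy, hand-picked probabilities for illustration only.
# They are assumptions, not values from POST or the MUC-5 data.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"company": 0.5, "chips": 0.4, "makes": 0.1},
    "VERB": {"makes": 0.7, "chips": 0.3},
}
UNSEEN = 1e-6  # crude floor for words missing from a tag's toy emission table


def viterbi(words):
    """Return the most probable tag sequence under the toy bigram HMM."""
    if not words:
        return []

    def emit(tag, word):
        return math.log(EMIT[tag].get(word, UNSEEN))

    # best[i][tag] = (log probability of the best path ending in `tag`
    #                 at position i, previous tag on that path)
    best = [{t: (math.log(START[t]) + emit(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            score, prev = max(
                (best[-1][p][0] + math.log(TRANS[p][t]), p) for p in TAGS
            )
            column[t] = (score + emit(t, word), prev)
        best.append(column)

    # Trace back from the best final tag to recover the full path.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    path.reverse()
    return path


print(viterbi("the company makes chips".split()))
```

In this toy table "makes" and "chips" can each be a noun or a verb, so the transition probabilities are what settle the tagging. A trainable tagger would estimate these tables from annotated text rather than fixing them by hand.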

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1993
Accession Number
ADA460639

Entities

People

  • Constantine Papageorgiou
  • Damaris Ayuso
  • Dawn Maclaughlin
  • Heidi Fox
  • Hiroto Hosihi
  • June Abe
  • Masaichiro Kitagawa
  • Ralph Weischedel
  • Robert Ingria
  • Sean Boisen
  • Tomoyoshi Matsukawa
  • Tsutomu Saki
  • Yoichi Miyamoto

Organizations

  • BBN Technologies

Tags

Communities of Interest

  • Advanced Electronics
  • Cyber

DTIC Thesaurus Topics

  • Acquisition
  • Artificial Intelligence
  • Computational Linguistics
  • Computer Science
  • Computer Vision
  • Corporations
  • Databases
  • Language
  • Linguistics
  • Microelectronics
  • Models
  • Natural Language Processing
  • Precision
  • Probabilistic Models
  • Probability
  • Semantics
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation

Technology Areas

  • Microelectronics