SRI International FASTUS System MUC-4 Test Results and Analysis

Abstract

The system that SRI used for the MUC-4 evaluation represents a significant departure from system architectures that have been employed in the past. In MUC-2 and MUC-3, SRI used the TACITUS text processing system [1], which was based on the DIALOGIC parser and grammar and an abductive reasoner for Horn-clause logic. For MUC-4, SRI designed a new system called FASTUS (a permutation of the initial letters in Finite State Automata-based Text Understanding System), which we feel represents a significant advance in the state of the art of text processing. The system shares certain modules with the earlier TACITUS system, namely modules for text preprocessing and standardization, spelling correction, Hispanic name recognition, and the core lexicon. However, the DIALOGIC system and abductive reasoner, which were the heart and soul of the previous system, were replaced by a system whose architecture is based on cascaded finite-state automata. Using this system we were able to achieve a significant level of performance on the MUC-4 task with less than one month devoted to domain-specific development. In addition, the system is extremely fast: it can process texts at a rate of approximately 3,200 words per minute, measured in CPU time on a Sun SPARC-2 processor. (Measured in elapsed real time, the system is about 50% slower, but the observed time depends on the particular hardware configuration involved.)
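
The abstract does not describe the individual stages of the cascade, so the following is only a minimal sketch of the general idea of cascaded finite-state processing, not the FASTUS implementation. Each stage makes one left-to-right pass and hands a more structured sequence to the next stage. The lexicon, the stage names (`tokenize`, `tag`, `chunk_noun_groups`, `match_events`), and the patterns are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical toy lexicon; a real system would use a much larger one.
LEXICON = {
    "the": "DET", "a": "DET",
    "armed": "ADJ",
    "guerrillas": "NOUN", "embassy": "NOUN",
    "attacked": "VERB", "bombed": "VERB",
}


def tokenize(text):
    """Stage 1: split raw text into lowercase word tokens."""
    return re.findall(r"[A-Za-z]+", text.lower())


def tag(tokens):
    """Stage 2: attach a coarse category to each token via lexicon lookup."""
    return [(tok, LEXICON.get(tok, "OTHER")) for tok in tokens]


def chunk_noun_groups(tagged):
    """Stage 3: finite-state pass that groups DET? ADJ* NOUN into one NG unit."""
    out, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":          # optional determiner
            j += 1
        while j < n and tagged[j][1] == "ADJ":  # zero or more adjectives
            j += 1
        if j < n and tagged[j][1] == "NOUN":    # obligatory head noun
            words = " ".join(w for w, _ in tagged[i:j + 1])
            out.append((words, "NG"))
            i = j + 1
        else:                               # no noun group starts here
            out.append(tagged[i])
            i += 1
    return out


def match_events(chunks):
    """Stage 4: finite-state pass that maps NG VERB NG onto an event record."""
    events = []
    for k in range(len(chunks) - 2):
        a, b, c = chunks[k:k + 3]
        if a[1] == "NG" and b[1] == "VERB" and c[1] == "NG":
            events.append({"actor": a[0], "action": b[0], "target": c[0]})
    return events


def cascade(text):
    """Run the stages in sequence; each consumes the previous stage's output."""
    return match_events(chunk_noun_groups(tag(tokenize(text))))


if __name__ == "__main__":
    print(cascade("The armed guerrillas attacked the embassy."))
```

Because every stage is a single linear scan with no backtracking over the whole input, this style of pipeline tends to be fast, which is consistent with the throughput emphasis in the abstract; the sketch makes no claim about how FASTUS itself achieves its reported speed.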

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 1992
Accession Number: ADA460973

Entities

People

  • David Israel
  • Douglas E. Appelt
  • Jerry R. Hobbs
  • John Bear
  • Mabry Tyson

Organizations

  • SRI International

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Abstracts
  • Automata
  • Contracts
  • Filters
  • Filtration
  • Information Operations
  • Machines
  • Permutations
  • Precision
  • Preprocessing
  • Recognition
  • Sequences
  • Standardization
  • Template Patterns
  • Test And Evaluation
  • Text Processing

Readers

  • Computational Linguistics
  • Parallel and Distributed Computing