University of Sheffield: Description of the LaSIE System as Used for MUC-6

Abstract

The LaSIE (Large Scale Information Extraction) system has been developed at the University of Sheffield as part of an ongoing research effort into information extraction and, more generally, natural language engineering. LaSIE is a single, integrated system that builds up a unified model of a text, which is then used to produce outputs for all four of the MUC-6 tasks. This model may also be used for purposes other than MUC-6 results generation; for example, we currently generate natural language summaries of the MUC-6 scenario results. Put most broadly, and superficially, our approach involves compositionally constructing semantic representations of individual sentences in a text according to semantic rules attached to phrase structure constituents, which have been obtained by syntactic parsing using a corpus-derived context-free grammar. The semantic representations of successive sentences are then integrated into a 'discourse model' which, once the entire text has been processed, may be viewed as a specialisation of the general world model with which the system sets out to process each text. LaSIE has a historical connection with the University of Sussex MUC-5 system [GCE93], from which it derives its approach to world modelling and co-reference resolution and its approach to recombining the fragmented semantic representations that result from partial grammatical coverage. However, the parser and grammar differ significantly from those used in the Sussex system. In its approach to named entity identification, LaSIE borrows to some extent from the approach adopted in the MUC-5 Diderot system [CGJ+93]. Virtually all of the code in LaSIE is new and has been developed since January 1995 with about 20 person-months of effort.
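As a rough illustration of the pipeline the abstract outlines (semantic rules attached to phrase structure constituents, followed by integration into a discourse model), the following toy sketch may help. It is not LaSIE's code: the tree encoding, the `interpret` function, the `DiscourseModel` class, the example sentences, and the name-matching stand-in for co-reference resolution are all assumptions made for this example.

```python
from itertools import count

fresh = count(1)  # supplies sentence-level referent ids e1, e2, ...

def interpret(tree):
    """Build (referent, facts) bottom-up, one hypothetical rule per constituent label."""
    label, *children = tree
    if label == "NP":   # NP -> name: introduce a referent carrying a name fact
        ref = f"e{next(fresh)}"
        return ref, [("name", ref, children[0])]
    if label == "V":    # V -> word: the verb supplies the predicate symbol
        return children[0], []
    if label == "VP":   # VP -> V NP: pair the predicate with its object
        (verb, _), (obj, obj_facts) = map(interpret, children)
        return (verb, obj), obj_facts
    if label == "S":    # S -> NP VP: apply the predicate to subject and object
        (subj, subj_facts), ((verb, obj), vp_facts) = map(interpret, children)
        return None, subj_facts + vp_facts + [(verb, subj, obj)]
    raise ValueError(f"no semantic rule for {label}")

class DiscourseModel:
    """Accumulates facts across sentences; same name means same entity (toy assumption)."""

    def __init__(self):
        self.by_name = {}   # name -> discourse-level entity id
        self.facts = set()

    def integrate(self, facts):
        alias = {}  # sentence-level referent -> discourse-level entity
        for pred, ref, value in facts:
            if pred == "name":
                new_id = f"d{len(self.by_name) + 1}"
                alias[ref] = self.by_name.setdefault(value, new_id)
        for pred, a, b in facts:
            if pred == "name":
                self.facts.add((pred, alias[a], b))
            else:
                self.facts.add((pred, alias.get(a, a), alias.get(b, b)))

# Hand-built parse trees stand in for the output of a real parser.
model = DiscourseModel()
for tree in [
    ("S", ("NP", "Smith"), ("VP", ("V", "joined"), ("NP", "Acme"))),
    ("S", ("NP", "Smith"), ("VP", ("V", "left"), ("NP", "Initech"))),
]:
    _, facts = interpret(tree)
    model.integrate(facts)
```

After both sentences are integrated, the two mentions of "Smith" share a single discourse entity, so the facts from separate sentences attach to the same individual. A real system would also need the world model and the handling of partial parses mentioned in the abstract, both of which this sketch omits.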

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 1995
Accession Number: ADA636158

Entities

People

  • H. Cunningham
  • K. Humphreys
  • R. Gaizauskas
  • T. Wakao
  • Y. Wilks

Organizations

  • University of Sheffield

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Classification
  • Computational Linguistics
  • Computer Science
  • Context Free Grammars
  • Errors
  • Grammars
  • Hierarchies
  • Integrated Systems
  • Language
  • Linguistics
  • Models
  • Natural Languages
  • Ontologies
  • Precision
  • Probabilistic Models
  • Recognition
  • Universities

Fields of Study

  • Computer science

Readers

  • Academic Conference Management
  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation