Tasks, Domains, and Languages for Information Extraction

Abstract

The information extraction tasks for the ARPA TIPSTER program center on automatically filling object-oriented data structures, called templates, with information extracted from free text in news stories (for discussion of templates and objects, see "Template Design for Information Extraction" in this volume). Given a text as input, a TIPSTER system first determines whether the text contains relevant information. If it does, the system extracts the specific instances of the generic information types that correspond to each template slot and outputs them by filling those slots in an appropriate data representation. The filled slots are then scored by an automatic scoring program against templates produced by human analysts, which serve as answer keys. Human analysts also prepared development-set templates for each domain, which served as training models for system developers (for discussion of the data preparation effort, see "Corpora and Data Preparation for Information Extraction" in this volume).
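
The detect, fill, and score flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the report: the `Template` class, the keyword-based `is_relevant` filter, the slot names, and the exact-match precision/recall in `score_slots`. The actual TIPSTER scoring program is not described in this record and likely differs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Template:
    """A filled template: slot names mapped to extracted string fills (hypothetical layout)."""
    doc_id: str
    slots: dict[str, str] = field(default_factory=dict)


def is_relevant(text: str, keywords: set[str]) -> bool:
    """Assumed stand-in for relevance detection: true if any keyword occurs in the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def score_slots(response: Template, key: Template) -> tuple[float, float]:
    """Return (precision, recall) of a system response against an analyst answer key.

    A slot counts as correct only when its fill matches the key exactly;
    this exact-match rule is an assumption made for illustration.
    """
    correct = sum(
        1 for name, fill in response.slots.items() if key.slots.get(name) == fill
    )
    precision = correct / len(response.slots) if response.slots else 0.0
    recall = correct / len(key.slots) if key.slots else 0.0
    return precision, recall


# Hypothetical answer key prepared by an analyst, and a system response.
key = Template("doc-1", {"entity": "Acme Corp", "activity": "joint venture", "location": "Tokyo"})
response = Template("doc-1", {"entity": "Acme Corp", "activity": "merger"})

if is_relevant("Acme Corp announced a joint venture in Tokyo.", {"joint venture"}):
    precision, recall = score_slots(response, key)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case one of the two response slots matches the key, and the key has three slots, so the sketch reports a precision of one half and a recall of one third.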

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1993
Accession Number
ADA630822

Entities

People

  • Boyan Onyshkevych
  • Lynn Carlson
  • Mary E. Okurowski

Organizations

  • United States Department of Defense

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Agreements
  • Commerce
  • Department Of Defense
  • Electrical Equipment
  • Extraction
  • Fabrication
  • Language
  • Light Sources
  • Lithography
  • Manufacturing
  • Microelectronics
  • Packaging
  • Production
  • Standards
  • Technology Transfer
  • Template Patterns

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Geospatial Intelligence and Artificial Intelligence Analytics
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval