Corpora and Data Preparation for Information Extraction

Abstract

Selecting and preparing the data for the TIPSTER and Fifth Message Understanding Conference (MUC-5) corpora required substantial effort, time, and resources. The Government commitment to this work stems from four TIPSTER Program objectives: (1) to provide training data that would promote the development of information extraction technology, (2) to provide accurate test data for evaluating and baselining system performance objectively, (3) to provide baseline data on human performance as a reference for understanding and interpreting machine performance, and (4) to support the larger Natural Language Processing community by making available, under ARPA support, a unique set of texts and templates in multiple domains and languages. Various Government agencies demonstrated this commitment through managerial, technical, and administrative support, as well as through contractual efforts with the Institute for Defense Analyses for data preparation and with New Mexico State University for software tool development.

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1993
Accession Number
ADA630828

Entities

People

  • Boyan Onyshkevych
  • Lynn Carlson
  • Mary E. Okurowski

Organizations

  • United States Department of Defense

Tags

DTIC Thesaurus Topics

  • Commerce
  • Computer Programming
  • Department Of Defense
  • English Language
  • Extraction
  • Governments
  • Hard Copy
  • Japanese Language
  • Language
  • Materials
  • Motor Skills
  • Natural Language Processing
  • Natural Languages
  • New Mexico
  • Standards
  • Template Patterns
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Military and Counterinsurgency Studies
  • Research Science/Academic Research
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation