Tasks, Domains, and Languages for Information Extraction
Abstract
The information extraction tasks for the ARPA TIPSTER program center on automatically filling object-oriented data structures, called templates, with information extracted from free text in news stories (for discussion of templates and objects, see "Template Design for Information Extraction" in this volume). Given a text as input, a TIPSTER system first determines whether the text contains relevant information. If it does, the system extracts the specific instances of the generic information types that correspond to each template slot and fills those slots in an appropriate data representation. The filled slots are then scored by an automatic scoring program against templates produced by human analysts, which serve as answer keys. Human analysts also prepared development-set templates for each domain, which served as training models for system developers (for discussion of the data preparation effort, see "Corpora and Data Preparation for Information Extraction" in this volume).
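The scoring step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the flat `Template` class, the slot names, the sample values, and the exact-match rule are hypothetical and are not taken from the report, whose templates are object-oriented and whose scoring program is not specified in this record.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical flat stand-in for a template: slot name -> slot fill."""
    doc_id: str
    slots: dict[str, str] = field(default_factory=dict)


def score_slots(response: Template, key: Template) -> dict[str, float]:
    """Score a system response against an analyst answer key.

    Assumption: a slot counts as correct only when its fill matches the
    key exactly; empty fills are treated as unfilled slots.
    """
    filled = {name: value for name, value in response.slots.items() if value}
    expected = {name: value for name, value in key.slots.items() if value}
    correct = sum(1 for name, value in filled.items() if expected.get(name) == value)
    precision = correct / len(filled) if filled else 0.0
    recall = correct / len(expected) if expected else 0.0
    return {"correct": correct, "precision": precision, "recall": recall}


# Made-up answer key and system response for one document.
key = Template("doc-1", {"entity": "Acme Corp", "partner": "Globex", "location": "Tokyo"})
response = Template("doc-1", {"entity": "Acme Corp", "partner": "Initech", "location": ""})
result = score_slots(response, key)
```

In this sketch the response fills two slots, one of them correctly, while the key fills three, so precision and recall differ. A real scorer would likely also need to handle partial matches and nested objects, which this example leaves out.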
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 1993
- Accession Number: ADA630822
Entities
People
- Boyan Onyshkevych
- Lynn Carlson
- Mary E. Okurowski
Organizations
- United States Department of Defense