Information Extraction Overview

Abstract

The information explosion of the last decade has placed increasing demands on processing and analyzing large volumes of on-line data. In response, the Advanced Research Projects Agency (ARPA) has been supporting research to develop a new technology called information extraction. Information extraction is a type of document processing that captures and outputs the factual information contained in a document. Like an information retrieval (IR) system, an information extraction system responds to a user's information need. However, whereas an IR system identifies a subset of documents in a large text database (or, in a library setting, a subset of the library's resources), an information extraction system identifies a subset of the information within a document. This subset is not necessarily a summary or gist of the document's contents. Rather, it corresponds to predefined, generic types of information of interest and represents the specific instances of those types found in the text. For example, a user may want to identify and store in a database information on all companies named in a set of documents, including companies not previously known to the user. An information extraction system can extract and output all occurrences of company names in a text with an accuracy of 75%. Moreover, the system can be instructed to extract only companies of a certain type, such as Japanese companies or companies in the textile industry.
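
To make the IR/extraction contrast and the company-name example concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method evaluated in the report: `retrieve` plays the IR role (it returns whole documents), while `extract_companies` plays the extraction role (it returns filled records from inside a document). The regular expression `COMPANY_RE`, the "(Country)" cue used to fill the country slot, the helper names, and the company names in the sample text are all hypothetical. A crude pattern like this will not approach the 75% accuracy cited above on real text.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately crude pattern: one or more capitalized words
# followed by a corporate suffix, optionally followed by a "(Country)" cue.
COMPANY_RE = re.compile(
    r"\b((?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd|Co)\b\.?)"
    r"(?:\s*\(([A-Z][A-Za-z]*)\))?"
)


@dataclass
class CompanyMention:
    """One filled 'template': a company name plus an optional country slot."""
    name: str
    country: Optional[str]  # None when the text gives no country cue
    offset: int             # character position of the mention in the document


def retrieve(docs: list[str], term: str) -> list[int]:
    """IR-style step: return the indices of documents that mention `term`."""
    return [i for i, doc in enumerate(docs) if term.lower() in doc.lower()]


def extract_companies(doc: str) -> list[CompanyMention]:
    """Extraction-style step: return every company mention inside one document."""
    return [
        CompanyMention(name=m.group(1), country=m.group(2), offset=m.start())
        for m in COMPANY_RE.finditer(doc)
    ]


def filter_by_country(mentions: list[CompanyMention], country: str) -> list[CompanyMention]:
    """Restrict the output to companies of one type, e.g. Japanese companies."""
    return [m for m in mentions if m.country == country]


if __name__ == "__main__":
    docs = [
        "Sakura Fiber Ltd. (Japan) agreed to supply yarn to "
        "Acme Textile Inc. (USA), while Kobe Yarn Co. (Japan) declined to comment.",
        "Rainfall in New Mexico was below average this season.",
    ]
    for i in retrieve(docs, "textile"):  # IR: selects document 0 only
        mentions = extract_companies(docs[i])
        # prints: all companies: ['Sakura Fiber Ltd.', 'Acme Textile Inc.', 'Kobe Yarn Co.']
        print("all companies:", [m.name for m in mentions])
        # prints: Japanese companies: ['Sakura Fiber Ltd.', 'Kobe Yarn Co.']
        print("Japanese companies:", [m.name for m in filter_by_country(mentions, "Japan")])
```

The pattern finds names it has never seen before, which mirrors the abstract's point about companies "not previously known to the user". The type restriction is then just a predicate over the extracted records. A practical system would replace the regular expression with richer linguistic processing.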

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1993
Accession Number
ADA633427

Entities

People

  • Mary E. Okurowski

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Acquisition
  • Algorithms
  • Databases
  • Department Of Defense
  • Dictionaries
  • Extraction
  • Information Retrieval
  • Language
  • Machine Translation
  • Natural Languages
  • New Mexico
  • Template Patterns
  • Test And Evaluation
  • Text Processing
  • Textile Industry
  • Universities

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Database Systems and Applications
  • Library and Information Science

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval