Scenario Customization for Information Extraction

Abstract

Information Extraction (IE) is an emerging NLP technology that processes unstructured natural-language text, locates specific pieces of information (facts) in it, and uses these facts to fill a database. IE systems today are commonly based on pattern matching. The core IE engine applies a cascade of pattern sets of increasing linguistic complexity. Each pattern consists of a regular expression and an associated mapping from syntactic to logical form. The pattern sets are customized for each new topic, where a topic is defined by the set of facts to be extracted. Constructing a pattern base for a new topic is recognized as a time-consuming and expensive process, and it is a principal roadblock to wider use of IE technology at scale.
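The pattern cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the report's system: the `Pattern` class, the two stages, the `<ORG>`/`<PER>` markup, the "appointment" topic, and the sample sentence are all hypothetical, chosen only to show how a regular expression can be paired with a mapping to a logical form and how a later stage can build on the output of an earlier one.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Fact = Dict[str, str]  # hypothetical logical form: a flat slot/value record


@dataclass
class Pattern:
    """A regular expression plus a mapping from a match to (rewritten text, optional fact)."""
    regex: re.Pattern
    build: Callable[[re.Match], Tuple[str, Optional[Fact]]]


def apply_pattern(text: str, pattern: Pattern, facts: List[Fact]) -> str:
    """Rewrite every match of one pattern and collect any facts it produces."""
    def repl(m: re.Match) -> str:
        replacement, fact = pattern.build(m)
        if fact is not None:
            facts.append(fact)
        return replacement
    return pattern.regex.sub(repl, text)


# Stage 1 (assumed): low-level name patterns that only add markup.
NAME_STAGE = [
    Pattern(re.compile(r"\b[A-Z][a-z]+ (?:Corp|Inc)\."),
            lambda m: (f"<ORG>{m.group(0)}</ORG>", None)),
    Pattern(re.compile(r"\b(?:Mr|Ms)\. (?P<name>[A-Z][a-z]+)"),
            lambda m: (f"<PER>{m.group('name')}</PER>", None)),
]

# Stage 2 (assumed): a topic-specific clause pattern over the stage 1 markup.
# This is the part that would be rewritten for each new topic.
EVENT_STAGE = [
    Pattern(
        re.compile(r"<ORG>(?P<org>[^<]+)</ORG> (?:appointed|named) "
                   r"<PER>(?P<per>[^<]+)</PER> as (?P<post>[a-z]+(?: [a-z]+)*)"),
        lambda m: (m.group(0), {"event": "appointment",
                                "company": m.group("org"),
                                "person": m.group("per"),
                                "position": m.group("post")}),
    ),
]

CASCADE = [NAME_STAGE, EVENT_STAGE]


def extract(text: str) -> List[Fact]:
    """Run the stages in order; each stage sees the text as rewritten by the previous one."""
    facts: List[Fact] = []
    for stage in CASCADE:
        for pattern in stage:
            text = apply_pattern(text, pattern, facts)
    return facts
```

Under these assumptions, calling `extract("Acme Corp. appointed Ms. Rivera as president.")` yields a single appointment record with company, person, and position slots. Only the second stage refers to the topic, which is one way to picture why customizing the pattern base for each new topic is the costly step.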


Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2001
Accession Number: ADA440512

Entities

People

  • Roman Yangarber

Organizations

  • Defense Advanced Research Projects Agency

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies
  • Weapons Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Automated Text Summarization
  • Computational Science
  • Computer Languages
  • Computer Science
  • Governments
  • Information Science
  • Language
  • Lisp Programming Language
  • Machine Learning
  • Markov Models
  • Natural Disasters
  • Natural Language Processing
  • Operating Systems
  • Recognition

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation