Scenario Customization for Information Extraction
Abstract
Information Extraction (IE) is an emerging NLP technology whose function is to process unstructured natural-language text, locate specific pieces of information, or facts, in the text, and use these facts to fill a database. IE systems today are commonly based on pattern matching. The core IE engine uses a cascade of pattern sets of increasing linguistic complexity. Each pattern consists of a regular expression and an associated mapping from syntactic to logical form. The pattern sets are customized for each new topic, as defined by the set of facts to be extracted. Constructing a pattern base for a new topic is recognized as a time-consuming and expensive process, and it is a principal roadblock to wider, large-scale use of IE technology.
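The cascade described above can be illustrated with a minimal Python sketch. This is not the report's system; it is an illustration under stated assumptions. It assumes a two-stage cascade. In the first stage, hypothetical name patterns (`NAME_PATTERNS`) rewrite personal titles and company names into typed tokens. In the second stage, a hypothetical clause-level pattern (`EVENT_PATTERNS`, loosely modeled on a management-succession topic) matches the rewritten text and maps its syntactic groups to a logical-form record. The records are appended to a plain list that stands in for the database. All pattern names, slot names, and the sample sentence are invented for the example.

```python
import re
from typing import Callable

# Stage 1 (hypothetical): name patterns. Each rewrites a match as a typed token
# so that later, linguistically more complex patterns can refer to it.
NAME_PATTERNS: list[tuple[re.Pattern, Callable[[re.Match], str]]] = [
    (re.compile(r"\b(?:Mr|Ms)\. ([A-Z][a-z]+)"), lambda m: f"<PER:{m.group(1)}>"),
    (re.compile(r"\b([A-Z][a-z]+ (?:Corp|Inc)\.)"), lambda m: f"<ORG:{m.group(1)}>"),
]

# Stage 2 (hypothetical): a clause-level regular expression over the stage-1
# output, paired with a mapping from its syntactic groups to a logical form.
EVENT_PATTERNS: list[tuple[re.Pattern, Callable[[re.Match], dict]]] = [
    (
        re.compile(
            r"<PER:(?P<person>[^>]+)> was (?:named|appointed) "
            r"(?P<post>president|chairman) of <ORG:(?P<org>[^>]+)>"
        ),
        lambda m: {"event": "start_job", "person": m["person"],
                   "post": m["post"], "org": m["org"]},
    ),
]


def run_cascade(text: str) -> list[dict]:
    # Lower stage first: each name pattern rewrites the text in place.
    for regex, rewrite in NAME_PATTERNS:
        text = regex.sub(rewrite, text)
    # Higher stage on the rewritten text: one logical-form record per match.
    records: list[dict] = []
    for regex, to_logical in EVENT_PATTERNS:
        records.extend(to_logical(m) for m in regex.finditer(text))
    return records


# A list of records standing in for the database the extracted facts fill.
database = run_cascade("Mr. Smith was named president of Acme Corp. last week.")
print(database)
# [{'event': 'start_job', 'person': 'Smith', 'post': 'president', 'org': 'Acme Corp.'}]
```

In this sketch, customizing for a new topic means writing a new `EVENT_PATTERNS` list by hand. That manual step is the expensive part the abstract identifies.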
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2001
- Accession Number: ADA440512
Entities
People
- Roman Yangarber
Organizations
- Defense Advanced Research Projects Agency