Unsupervised Discovery of Extraction Patterns for Information Extraction
Abstract
The task of Information Extraction (IE) is to find specific types of information in natural language text. In particular, event extraction identifies instances of a particular type of event or fact (a particular "scenario"), including the entities involved, and fills a database that has been predefined for the scenario. As the number of documents available on-line has multiplied, event extraction has grown in importance for various applications, such as tracking terrorist activities from newswire sources and building a database of job postings from the Web. Linguistic contexts, such as predicate-argument relationships, have been widely used as extraction patterns to identify the items to be extracted from the text. The cost of creating extraction patterns for each scenario has been a bottleneck limiting the portability of information extraction systems to new scenarios, although there has been some research on semi-supervised pattern discovery procedures to reduce this cost. The challenge is to develop a fully automatic method for identifying extraction patterns for a scenario specified by the user. This dissertation presents a novel approach for the unsupervised discovery of extraction patterns for event extraction from raw text. First, we present a framework that gives the user an information extraction system that customizes itself to his or her query: the Query-Driven Information Extraction (QDIE) framework. The input to the QDIE framework is the user's query: either a set of keywords or a narrative description of the event extraction task. Second, we assess improvements in extraction pattern models. Considering the shortcomings of prior work based on predicate-argument models and their extensions, we propose a novel extraction pattern model that is based on arbitrary subtrees of dependency trees. Third, we address the issue of portability across languages.
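As a rough illustration of what "arbitrary subtrees of dependency trees" can mean as a pattern space, the following minimal Python sketch enumerates every connected subtree of a toy dependency tree. The tree, the word choices, and the function names (rooted_subtrees, all_subtrees) are assumptions made for this example and are not taken from the dissertation, which may define, restrict, and rank candidate patterns differently.

```python
from itertools import product

def rooted_subtrees(tree, node):
    """Yield every connected subtree whose topmost node is `node`.

    `tree` maps a head word to the list of its dependents (hypothetical
    toy representation). Each subtree is returned as a frozenset of nodes.
    """
    options = []
    for child in tree.get(node, []):
        # Each dependent is either left out (empty set) or contributes
        # one of its own rooted subtrees.
        options.append([frozenset()] + list(rooted_subtrees(tree, child)))
    for combo in product(*options):
        yield frozenset({node}).union(*combo)

def all_subtrees(tree):
    """Collect the connected subtrees rooted at every node of the tree."""
    nodes = set(tree) | {c for children in tree.values() for c in children}
    found = set()
    for node in nodes:
        found.update(rooted_subtrees(tree, node))
    return found

# Toy dependency tree for a sentence like "A bomb killed three people".
toy_tree = {"kill": ["bomb", "people"], "people": ["three"]}
patterns = all_subtrees(toy_tree)
for p in sorted(patterns, key=lambda s: (len(s), sorted(s))):
    print(sorted(p))
```

In this toy setting, a predicate-argument pattern corresponds to a subtree made of the predicate and one direct dependent, such as {kill, bomb}, while the subtree model also admits larger or deeper fragments such as {kill, people, three}. The number of subtrees grows quickly with sentence length, so a practical system would presumably need some filtering or scoring step, which this sketch leaves out.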
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 2004
- Accession Number: ADA598383
Entities
People
- Kiyoshi Sudo
Organizations
- New York University