Improving Information Extraction and Translation Using Component Interactions

Abstract

The traditional natural language processing (NLP) pipeline incorporates multiple stages of linguistic analysis. Although errors are typically compounded through the pipeline, it is possible to reduce the errors in one stage by harnessing the results of the other stages. This thesis presents a new framework based on component interactions to approach this goal. The new framework applies all stages in a suitable order, with each stage generating multiple hypotheses and propagating them through the whole pipeline. The feedback from subsequent stages is then used to enhance the target stage by re-ranking these hypotheses and producing the best analysis. The effectiveness of this framework has been demonstrated by substantially improving the performance of Chinese and English entity extraction and Chinese-to-English entity translation. The inference knowledge includes monolingual interactions among information extraction stages such as name tagging, coreference resolution, relation extraction, and event extraction, as well as cross-lingual interaction between information extraction and machine translation. Such symbiosis of analysis components allows the author to incorporate information from a much wider context, spanning the entire document and even going across documents, and to utilize deeper semantic analysis. It will therefore be essential for the creation of a high-performance NLP pipeline.
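The re-ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Hypothesis` class, the `rerank` function, the additive scoring rule, the `weight` parameter, and the example scores are all assumptions made for illustration. The sketch only shows the general shape: a target stage (here, name tagging) emits several scored hypotheses, a later stage (here, coreference resolution) contributes feedback, and the hypotheses are re-ordered by the combined score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    """One candidate analysis from the target stage (hypothetical structure)."""
    analysis: str
    base_score: float  # confidence assigned by the target stage alone


# A feedback function maps a hypothesis to a bonus or penalty from a later stage.
Feedback = Callable[[Hypothesis], float]


def rerank(hypotheses: List[Hypothesis],
           feedback: List[Feedback],
           weight: float = 1.0) -> List[Hypothesis]:
    """Sort hypotheses by base score plus weighted feedback, best first.

    The additive combination is an assumption of this sketch, not the
    scoring rule used in the thesis.
    """
    def combined(h: Hypothesis) -> float:
        return h.base_score + weight * sum(f(h) for f in feedback)

    return sorted(hypotheses, key=combined, reverse=True)


def coreference_feedback(h: Hypothesis) -> float:
    # Hypothetical signal: a later mention such as "the city" corefers with
    # "Washington", which supports a geo-political reading over a person.
    return 0.3 if h.analysis.endswith("GPE") else 0.0


# Invented n-best list from a name tagger; the scores are illustrative only.
n_best = [
    Hypothesis("Washington=PERSON", 0.55),
    Hypothesis("Washington=GPE", 0.45),
]

best = rerank(n_best, [coreference_feedback])[0]
print(best.analysis)  # prints "Washington=GPE" (0.45 + 0.3 beats 0.55)
```

In this toy case the name tagger alone prefers the person reading, and the feedback from the later stage overturns that choice, which is the kind of correction the abstract attributes to component interactions.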

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2008
Accession Number: ADA477326

Entities

People

Heng Ji

Organizations

New York University
