Improving Information Extraction and Translation Using Component Interactions
Abstract
The traditional natural language processing (NLP) pipeline incorporates multiple stages of linguistic analysis. Although errors are typically compounded through the pipeline, it is possible to reduce the errors in one stage by harnessing the results of the other stages. This thesis presents a new framework based on component interactions to approach this goal. The new framework applies all stages in a suitable order, with each stage generating multiple hypotheses and propagating them through the whole pipeline. The feedback from subsequent stages is then used to enhance the target stage by re-ranking these hypotheses and producing the best analysis. The effectiveness of this framework has been demonstrated by substantially improving the performance of Chinese and English entity extraction and Chinese-to-English entity translation. The inference knowledge includes monolingual interactions among information extraction stages such as name tagging, coreference resolution, relation extraction, and event extraction, as well as cross-lingual interaction between information extraction and machine translation. Such symbiosis of analysis components allows the author to incorporate information from a much wider context, spanning the entire document and even going across documents, and to utilize deeper semantic analysis. It will therefore be essential for the creation of a high-performance NLP pipeline.
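The re-ranking idea described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the target stage emits scored hypotheses and that each later stage contributes a numeric feedback score, combined here by a simple weighted sum. The names `Hypothesis`, `rerank`, and `toy_coref_feedback`, the example scores, and the linear combination itself are hypothetical; they are not taken from the thesis, which may use a different re-ranking method.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Hypothesis:
    """One candidate analysis from the target stage (e.g. a name tagging)."""
    analysis: str
    base_score: float  # confidence assigned by the target stage itself


# A feedback function scores a hypothesis using evidence from a later stage.
Feedback = Callable[[Hypothesis], float]


def rerank(
    hypotheses: Sequence[Hypothesis],
    feedback_fns: Sequence[Feedback],
    feedback_weight: float = 1.0,
) -> List[Hypothesis]:
    """Sort hypotheses by base score plus weighted feedback, best first.

    The weighted sum is an assumption made for this sketch only.
    """
    def combined(h: Hypothesis) -> float:
        return h.base_score + feedback_weight * sum(f(h) for f in feedback_fns)

    return sorted(hypotheses, key=combined, reverse=True)


def toy_coref_feedback(h: Hypothesis) -> float:
    # Hypothetical stand-in for a coreference stage: pretend the rest of the
    # document contains mentions that support the PERSON reading.
    return 0.5 if h.analysis.endswith("/PERSON") else 0.0


candidates = [
    Hypothesis("Bush/LOCATION", 0.55),  # preferred by the tagger alone
    Hypothesis("Bush/PERSON", 0.45),
]

best_without_feedback = rerank(candidates, [])[0]
best_with_feedback = rerank(candidates, [toy_coref_feedback])[0]
print(best_without_feedback.analysis)  # Bush/LOCATION
print(best_with_feedback.analysis)     # Bush/PERSON
```

In this toy case the tagger alone prefers the LOCATION reading, and the feedback from the simulated later stage moves the PERSON reading to the top, which is the kind of correction the abstract attributes to component interactions.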
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2008
- Accession Number: ADA477326
Entities
People
- Heng Ji
Organizations
- New York University