Improving Information Extraction and Translation Using Component Interactions

Abstract

The traditional natural language processing (NLP) pipeline incorporates multiple stages of linguistic analysis. Although errors are typically compounded through the pipeline, it is possible to reduce the errors in one stage by harnessing the results of the other stages. This thesis presents a new framework based on component interactions to approach this goal. The new framework applies all stages in a suitable order, with each stage generating multiple hypotheses and propagating them through the whole pipeline. The feedback from subsequent stages is then used to enhance the target stage by re-ranking these hypotheses and producing the best analysis. The effectiveness of this framework has been demonstrated by substantially improving the performance of Chinese and English entity extraction and Chinese-to-English entity translation. The inference knowledge includes monolingual interactions among information extraction stages such as name tagging, coreference resolution, relation extraction, and event extraction, as well as cross-lingual interaction between information extraction and machine translation. Such symbiosis of analysis components allows the author to incorporate information from a much wider context, spanning the entire document and even going across documents, and to utilize deeper semantic analysis. It will therefore be essential for the creation of a high-performance NLP pipeline.
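The re-ranking idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the names `Hypothesis`, `consistency_feedback`, and `rerank`, the additive scoring rule, and the `weight` parameter are all assumptions made for illustration. A target stage (here, a name tagger) emits several scored hypotheses, a toy stand-in for a later stage scores each one, and the combined score picks the final analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A tagging is a list of (mention text, entity type) pairs for one document.
Tagging = List[Tuple[str, str]]


@dataclass
class Hypothesis:
    tagging: Tagging   # candidate output of the target stage
    base_score: float  # confidence assigned by the target stage itself


def consistency_feedback(h: Hypothesis) -> float:
    """Toy stand-in for a later stage (hypothetical, not from the thesis).

    Penalises each mention string that receives more than one entity type
    within the document, on the assumption that repeated mentions of the
    same name usually refer to the same kind of entity.
    """
    types_by_mention: Dict[str, Set[str]] = {}
    for mention, entity_type in h.tagging:
        types_by_mention.setdefault(mention, set()).add(entity_type)
    conflicts = sum(1 for types in types_by_mention.values() if len(types) > 1)
    return -float(conflicts)


def rerank(
    hypotheses: List[Hypothesis],
    feedback: Callable[[Hypothesis], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Sort hypotheses by base score plus weighted downstream feedback."""
    return sorted(
        hypotheses,
        key=lambda h: h.base_score + weight * feedback(h),
        reverse=True,
    )


# Two competing analyses of a document that mentions "Washington" twice.
n_best = [
    Hypothesis([("Washington", "PER"), ("Washington", "GPE")], base_score=0.6),
    Hypothesis([("Washington", "PER"), ("Washington", "PER")], base_score=0.5),
]

best = rerank(n_best, consistency_feedback)[0]
print(best.tagging)
```

In this toy case the tagger alone prefers the first hypothesis, but the feedback penalty for the inconsistent types moves the second hypothesis to the top. A real system would presumably learn how to combine such signals rather than use a fixed additive weight.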

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2008
Accession Number: ADA477326

Entities

People

  • Heng Ji

Organizations

  • New York University

Tags

Communities of Interest

  • Autonomy
  • Biomedical
  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Data Mining
  • Information Science
  • Language
  • Linguistics
  • Machine Learning
  • Markov Models
  • Named Entity Recognition
  • Natural Language Processing
  • Natural Languages
  • Ontologies
  • Supervised Machine Learning

Readers

  • Computational Linguistics
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks