Information Extraction from Multiple Syntactic Sources

Abstract

Information Extraction (IE) is the automatic extraction of facts from text, including the detection of named entities, entity relations, and events. Conventional approaches to IE look for syntactic patterns based on deep processing of text, such as partial or full parsing. The problem these approaches face is that the deeper the analysis, the less accurate its output, and errors introduced at one stage cannot be recovered from later. Lower-level processing, on the other hand, is more accurate and can also provide useful information, but conventional approaches cannot incorporate this kind of information efficiently. This thesis describes a novel supervised approach based on kernel methods to address these issues. In this approach, customized kernels are used to match the syntactic structures produced by different preprocessing phases. Using properties of kernels, individual kernels are combined into a composite kernel that integrates and extends all of this information. The composite kernels can be used with various classifiers, such as Nearest Neighbor or Support Vector Machines (SVM). The main classifier we propose to use is the SVM, because of its ability to generalize in high-dimensional feature spaces. We will show that each level of syntactic information can contribute to IE tasks, and that low-level information can help to recover from errors in deep processing.
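
The idea of combining per-level kernels into one composite kernel can be illustrated with a minimal sketch. It is not the thesis's implementation: the two kernels below (token overlap and adjacent-token bigram overlap), the equal weighting, the function names, and the toy sentences and labels are all assumptions chosen for illustration. The sketch relies only on the standard fact that a non-negative weighted sum of valid kernels is itself a valid kernel, and it uses the Nearest Neighbor classifier mentioned above rather than an SVM so that it needs no external library.

```python
from collections import Counter
from math import sqrt


def token_kernel(a, b):
    """Low-level kernel (assumed): dot product of token count vectors."""
    ca, cb = Counter(a), Counter(b)
    return float(sum(ca[t] * cb[t] for t in ca))


def bigram_kernel(a, b):
    """Stand-in for a deeper-level kernel (assumed): shared adjacent token pairs."""
    ca = Counter(zip(a, a[1:]))
    cb = Counter(zip(b, b[1:]))
    return float(sum(ca[g] * cb[g] for g in ca))


def composite_kernel(a, b, weight=0.5):
    """Weighted sum of the two kernels; valid as a kernel for 0 <= weight <= 1."""
    return weight * token_kernel(a, b) + (1.0 - weight) * bigram_kernel(a, b)


def kernel_distance(k, a, b):
    """Distance induced by kernel k: sqrt(k(a,a) - 2*k(a,b) + k(b,b))."""
    return sqrt(max(k(a, a) - 2.0 * k(a, b) + k(b, b), 0.0))


def nearest_neighbor(k, train, query):
    """Return the label of the training example closest to query under k."""
    return min(train, key=lambda ex: kernel_distance(k, ex[0], query))[1]


# Hypothetical toy data: tokenized sentences with made-up relation labels.
train = [
    ("Smith works for Acme".split(), "EMPLOYMENT"),
    ("Jones lives in Paris".split(), "RESIDENCE"),
]
query = "Lee works for Initech".split()
label = nearest_neighbor(composite_kernel, train, query)
print(label)
```

The same composite function could instead be used to fill a precomputed Gram matrix for an SVM; the weighting between levels would then be a parameter to tune.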

Document Details

Document Type
Technical Report
Publication Date
May 01, 2004
Accession Number
ADA439982

Entities

People

  • Shubin Zhao

Organizations

  • New York University

Tags

Communities of Interest

  • Biomedical
  • C4I
  • Counter IED
  • Energy and Power Technologies
  • Weapons Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computer Languages
  • Hidden Markov Models
  • Kernel Functions
  • Language
  • Machine Learning
  • Markov Models
  • Named Entity Recognition
  • Natural Language Processing
  • Natural Languages
  • Ontologies
  • Recognition
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Graph Algorithms and Convex Optimization

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • Space