Transductive Pattern Learning for Information Extraction

Abstract

The requirement for large labelled training corpora is widely recognized as a key bottleneck in the use of learning algorithms for information extraction. We present TPLEX, a semi-supervised learning algorithm for information extraction that acquires extraction patterns from a small amount of labelled text in conjunction with a large amount of unlabelled text. Compared to previous work, TPLEX has two novel features. First, the algorithm does not require redundancy in the fragments to be extracted, but only redundancy of the extraction patterns themselves. Second, whereas most bootstrapping methods identify the highest-quality fragments in the unlabelled data and then treat them as being as reliable as manually labelled data in subsequent iterations, TPLEX's scoring mechanism records the reliability of fragments extracted from unlabelled data, which prevents errors from snowballing. Our experiments with several benchmarks demonstrate that TPLEX is usually competitive with various fully supervised algorithms when very little labelled training data is available.
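
The idea of recording reliability instead of promoting extracted fragments to gold labels can be illustrated with a minimal sketch. This is not TPLEX's actual scoring mechanism, which the abstract does not specify: the function name `propagate_scores`, the simple averaging rule, and the toy pattern and fragment identifiers are all assumptions made for illustration. Labelled fragments keep fixed scores, while unlabelled fragments only ever receive a soft score derived from the patterns that extract them.

```python
from collections import defaultdict


def propagate_scores(matches, labelled, iterations=10):
    """Hypothetical soft-score propagation between patterns and fragments.

    matches:  dict mapping pattern id -> set of fragment ids it extracts
    labelled: dict mapping fragment id -> 1.0 (correct) or 0.0 (incorrect)
    Returns (pattern_scores, fragment_scores).
    """
    # Invert the index: which patterns extract each fragment?
    patterns_of = defaultdict(set)
    for pattern, fragments in matches.items():
        for fragment in fragments:
            patterns_of[fragment].add(pattern)

    # Labelled fragments are fixed; unlabelled ones start with no evidence.
    fragment_scores = {f: labelled.get(f, 0.0) for f in patterns_of}
    pattern_scores = {p: 0.0 for p in matches}

    for _ in range(iterations):
        # Assumed rule: a pattern is as reliable as its fragments on average.
        for pattern, fragments in matches.items():
            if fragments:
                total = sum(fragment_scores[f] for f in fragments)
                pattern_scores[pattern] = total / len(fragments)
        # Unlabelled fragments get a soft score; they are never set to 1.0
        # by fiat, so one bad extraction cannot count as gold data later.
        for fragment, patterns in patterns_of.items():
            if fragment in labelled:
                continue
            total = sum(pattern_scores[p] for p in patterns)
            fragment_scores[fragment] = total / len(patterns)

    return pattern_scores, fragment_scores


# Toy data: "a", "b", "c" are labelled; "u1", "u2" come from unlabelled text.
matches = {
    "p_good": {"a", "b", "u1"},
    "p_mixed": {"a", "c", "u1", "u2"},
    "p_bad": {"c", "u2"},
}
labelled = {"a": 1.0, "b": 1.0, "c": 0.0}
pattern_scores, fragment_scores = propagate_scores(matches, labelled)
```

In this toy setup the fragment extracted by the more reliable patterns (`u1`) ends up with a higher score than `u2`, and both scores stay strictly between 0 and 1, so later iterations weigh them as uncertain evidence instead of as manual labels.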

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 2006
Accession Number
ADA456766

Entities

People

  • Brian McLernon
  • Nicholas Kushmerick

Organizations

  • University College Dublin

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computer Science
  • Detectors
  • Extraction
  • Language
  • Learning
  • Machine Learning
  • Markov Models
  • Named Entity Recognition
  • Probabilistic Models
  • Probability
  • Semi-Supervised Learning
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks