On Minimizing Training Corpus for Parser Acquisition

Abstract

Many corpus-based natural language processing systems rely on large quantities of annotated text as their training examples. Building this kind of resource is an expensive and labor-intensive project. To minimize effort spent on annotating examples that are not helpful to the training process, recent research efforts have begun to apply active learning techniques to selectively choose data to be annotated. In this work, we consider selecting training examples with the tree-entropy metric. Our goal is to assess how well this selection technique can be applied to training different types of parsers. We find that tree-entropy can significantly reduce the amount of training annotation for both a history-based parser and an EM-based parser. Moreover, the examples selected for the history-based parser are also good for training the EM-based parser, suggesting that the technique is parser independent.
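The abstract does not define the tree-entropy metric, so the following is only a minimal sketch of entropy-based sample selection. It assumes that tree entropy is the Shannon entropy of a parser's probability distribution over the candidate parse trees of a sentence, and that dividing by the log of the number of candidate parses is a reasonable way to compare sentences with different numbers of parses. The function names, the normalization, and the input format (a mapping from each sentence to a list of parse scores) are hypothetical and are not taken from the report.

```python
import math


def tree_entropy(parse_scores):
    """Shannon entropy (in bits) of the distribution over candidate parses.

    parse_scores: non-negative scores, one per candidate parse tree.
    They are renormalized here so they sum to 1 (an assumption of this sketch).
    """
    total = sum(parse_scores)
    if total <= 0:
        raise ValueError("at least one parse must have a positive score")
    probs = [s / total for s in parse_scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def normalized_tree_entropy(parse_scores):
    """Entropy divided by its maximum possible value, log2(number of parses).

    The normalization is an assumption of this sketch; it keeps sentences with
    many candidate parses from dominating purely because of their parse count.
    """
    n = sum(1 for s in parse_scores if s > 0)
    if n <= 1:
        return 0.0  # a single parse carries no uncertainty
    return tree_entropy(parse_scores) / math.log2(n)


def select_for_annotation(candidates, k):
    """Return the k sentences whose parse distributions are most uncertain.

    candidates: dict mapping a sentence to its list of parse scores.
    """
    ranked = sorted(
        candidates,
        key=lambda sent: normalized_tree_entropy(candidates[sent]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    pool = {
        "confident sentence": [0.9, 0.1],
        "ambiguous sentence": [0.5, 0.5],
        "unambiguous sentence": [1.0],
    }
    print(select_for_annotation(pool, 2))
```

In an active learning loop, the selected sentences would be sent to a human annotator, added to the training set, and the parser retrained before the remaining pool is scored again.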

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2001
Accession Number
ADA458746

Entities

People

  • Rebecca Hwa

Organizations

  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Acquisition
  • Contracts
  • Information Operations
  • Language
  • Natural Language Processing
  • Natural Languages
  • Test Sets
  • Three Dimensional
  • Training
  • Universities

Readers

  • Computational Linguistics
  • Instructional Design and Training Evaluation
  • Life Cycle Cost Analysis

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks