Active Learning with a Human in The Loop

Abstract

Text annotation is an expensive prerequisite for applying data-driven natural language processing techniques to new datasets. Tools that can reliably reduce the time and money required to construct an annotated corpus would be of immediate value to MITRE's sponsors. To this end, we have explored the possibility of using active learning strategies to aid human annotators in performing a basic named entity annotation task. Our experiments consider example-based active learning algorithms that are widely believed to reduce the number of examples to be annotated and therefore reduce cost. Instead, they show that once the true costs of human annotation are taken into consideration, the savings from using active learning vanish. Our experiments with human annotators confirm that human annotation times vary greatly and are difficult to predict, a fact that has received relatively little attention in the academic literature on active learning for natural language processing. While our study was far from exhaustive, we found that the literature supporting active learning typically focuses on reducing the number of examples to be annotated while ignoring the costs of manual annotation. To date, there is no published work suggesting that active learning actually reduces annotation time or cost for the sequence labeling annotation task we consider. For these reasons, combined with the non-trivial costs and constraints imposed by active learning, we have decided to exclude active learning support from our annotation tool suite, and we are unable to recommend active learning in the form we detail in this technical report to our sponsors as a strategy for reducing costs for natural language annotation tasks.

Document Details

Document Type: Technical Report
Publication Date: Nov 01, 2012
Accession Number: ADA584489

Entities

People

  • Robyn Kozierok
  • Sam Bayer
  • Seamus Clancy

Organizations

  • MITRE Corporation

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Cost Reductions
  • Costs
  • Information Science
  • Language
  • Literature
  • Machine Learning
  • Named Entity Recognition
  • Natural Language Processing
  • Natural Languages
  • Probabilistic Models
  • Probability
  • Recognition
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Library and Information Science
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks