Reducing Annotation Effort Using Generalized Expectation Criteria

Abstract

Generalized expectation (GE) criteria [McCallum et al., 2007] are terms in objective functions that assign scores to values of model expectations. In this paper we introduce GE-FL, a method that uses GE to train a probabilistic model using associations between input features and classes rather than complete labeled instances. Specifically, the expectations here are model-predicted class distributions on unlabeled instances that contain selected input features. The score function is the KL divergence from reference distributions estimated using feature-class associations. We show that a multinomial logistic regression model trained with GE-FL outperforms several baseline methods that use feature-class associations. Next, we compare with a method that incorporates feature-class associations into Boosting [Schapire et al., 2002] and find that it requires 400 labeled instances to attain the same accuracy as GE-FL, which uses no labeled instances. In human annotation experiments, we show that labeling features is on average 3.7 times faster than labeling documents, a result that supports similar findings in previous work [Raghavan et al., 2006]. Additionally, using GE-FL provides a 1.0% absolute improvement in final accuracy over semi-supervised training with labeled documents. The accuracy difference is often much more pronounced with only a few minutes of annotation, where we see absolute accuracy improvements as high as 40%.
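The score described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy data, and the reference distributions below are assumptions made for illustration, and the sketch only evaluates the penalty term. It does not train the model or include any regularization the report may use. For each labeled feature, it averages the class distributions predicted by a multinomial logistic regression model over the unlabeled instances containing that feature, then measures the KL divergence from the reference distribution to that average.

```python
import math

def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Multinomial logistic regression: weights[c][j] is the weight of
    feature j for class c; x is a list of feature counts."""
    scores = [sum(w * v for w, v in zip(w_c, x)) for w_c in weights]
    return softmax(scores)

def expected_class_distribution(weights, unlabeled, feature):
    """Average predicted class distribution over the unlabeled
    instances that contain the given feature."""
    hits = [x for x in unlabeled if x[feature] > 0]
    if not hits:
        raise ValueError("no unlabeled instance contains this feature")
    preds = [predict(weights, x) for x in hits]
    return [sum(p[c] for p in preds) / len(preds)
            for c in range(len(weights))]

def kl_divergence(ref, pred):
    """KL(ref || pred), skipping classes with zero reference mass."""
    return sum(r * math.log(r / p) for r, p in zip(ref, pred) if r > 0)

def ge_fl_penalty(weights, unlabeled, references):
    """Sum of KL divergences over all labeled features.
    references maps a feature index to its reference class distribution."""
    return sum(
        kl_divergence(ref, expected_class_distribution(weights, unlabeled, k))
        for k, ref in references.items()
    )

# Toy data (hypothetical): two features, two classes, three unlabeled instances.
unlabeled = [[1, 0], [1, 1], [0, 1]]
# Assumed feature-class associations: feature 0 suggests class 0, feature 1 class 1.
references = {0: [0.9, 0.1], 1: [0.1, 0.9]}

zero_weights = [[0.0, 0.0], [0.0, 0.0]]        # uniform predictions
aligned_weights = [[2.0, -2.0], [-2.0, 2.0]]   # agrees with the associations

penalty_zero = ge_fl_penalty(zero_weights, unlabeled, references)
penalty_aligned = ge_fl_penalty(aligned_weights, unlabeled, references)
print(penalty_zero > penalty_aligned)
```

In this toy setting, weights that agree with the assumed feature-class associations incur a smaller penalty than all-zero weights, which is the signal a gradient-based trainer would follow when no labeled instances are available.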


Document Details

Document Type: Technical Report
Publication Date: Nov 30, 2007
Accession Number: ADA493136

Entities

People

Andrew McCallum
Gideon Mann
Gregory Druck

Organizations

University of Massachusetts Amherst
