Optimizing Estimated Loss Reduction for Active Sampling in Rank Learning

Abstract

Learning to rank is becoming an increasingly popular research area in machine learning. The ranking problem aims to induce an ordering, or preference relations, among a set of instances in the input space. However, collecting labeled data is becoming a burden in many ranking applications, since labeling requires eliciting the relative ordering over a set of alternatives. In this paper, we propose a novel active learning framework for SVM-based and boosting-based rank learning. Our approach selects examples for labeling by maximizing the estimated loss differential over the unlabeled data. Experimental results on two benchmark corpora show that the proposed model substantially reduces the labeling effort and quickly reaches superior performance, with up to 30% relative improvement over the margin-based sampling baseline.
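The abstract compares the proposed criterion against a margin-based sampling baseline. The sketch below illustrates only that baseline, under the assumption of a linear pairwise ranker with RankSVM-style scoring f(x) = w · x: it selects the unlabeled pairs on which the current weight vector w is least certain, meaning |w · (x_i - x_j)| is smallest. It also shows the pairwise hinge loss such a ranker is typically trained on. It does not reproduce the paper's estimated-loss-differential estimator. The function names (`pair_margin`, `pairwise_hinge_loss`, `margin_based_selection`) are hypothetical, and plain Python lists stand in for feature vectors.

```python
import itertools
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pair_margin(w: Vector, xi: Vector, xj: Vector) -> float:
    """Signed margin of a linear ranker on the pair (i, j): w . (x_i - x_j).

    A positive value means the model currently ranks i above j.
    """
    return dot(w, [a - b for a, b in zip(xi, xj)])


def pairwise_hinge_loss(w: Vector, preferred: Vector, other: Vector) -> float:
    """RankSVM-style hinge loss for a labeled preference 'preferred above other'."""
    return max(0.0, 1.0 - pair_margin(w, preferred, other))


def margin_based_selection(
    w: Vector, pool: List[Vector], k: int = 1
) -> List[Tuple[int, int]]:
    """Hypothetical baseline: return the k unlabeled pairs with the smallest |margin|.

    Small-margin pairs are those the current ranker is least certain about.
    This is NOT the paper's loss-differential criterion.
    """
    pairs = itertools.combinations(range(len(pool)), 2)
    return sorted(pairs, key=lambda p: abs(pair_margin(w, pool[p[0]], pool[p[1]])))[:k]
```

For example, with `w = [1.0, -0.5]` and `pool = [[0.2, 0.1], [0.9, 0.4], [0.25, 0.2], [0.0, 1.0]]`, `margin_based_selection(w, pool)` returns `[(0, 2)]`, the pair on which the model is nearly indifferent. A loss-based criterion such as the one proposed here would instead rank candidates by how much labeling them is expected to change the ranker's loss.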


Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2008
Accession Number
ADA531307

Entities

People

  • Jaime Carbonell
  • Pinar Donmez

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Computing
  • Artificial Intelligence Software
  • Classification
  • Data Mining
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Natural Language Processing
  • Precision
  • Sampling
  • Standards
  • Statistical Sampling
  • Supervised Machine Learning
  • Test Sets
  • Training

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks
  • Space