Creating Diverse Ensemble Classifiers to Reduce Supervision

Abstract

Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. In this thesis, we present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-generated training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. The diverse ensembles produced by DECORATE are very effective for reducing the amount of supervision required for building accurate models. The first task we demonstrate this on is classification given a fixed training set. Experimental results using decision-tree induction as a base learner demonstrate that our approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. Also, DECORATE attains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. Additional experiments demonstrate DECORATE's resilience to imperfections in data, in the form of missing features, classification noise, and feature noise. Unlike the active learning setting, in many learning problems the class labels for all instances are known, but feature values may be missing and can be acquired at a cost. For building accurate predictive models, acquiring complete information for all instances is often quite expensive, while acquiring information for a random subset of instances may not be optimal. We formalize the task of active feature-value acquisition, which tries to reduce the cost of achieving a desired model accuracy by identifying instances for which obtaining complete information is most informative.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2005
Accession Number
ADA586892

Entities

People

  • Prem N. Melville

Organizations

  • University of Texas at Austin

Tags

Communities of Interest

  • Autonomy
  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Science
  • Computer Languages
  • Computer Science
  • Data Mining
  • Data Science
  • Databases
  • Information Processing
  • Information Science
  • Machine Learning
  • Named Entity Recognition
  • Network Science
  • Neural Networks
  • Predictive Modeling
  • Probabilistic Models
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks