Wrappers for Performance Enhancement and Oblivious Decision Graphs

Abstract

In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs). For accuracy estimation, we investigate cross-validation and the .632 bootstrap. We show examples where they fail and conduct a large scale study comparing them. We conclude that repeated runs of five-fold cross-validation give a good tradeo between bias and variance for the problem of model selection used in later chapters. We define the wrapper approach and use it for feature subset selection and parameter tuning. We relate definitions of feature relevancy to the set of optimal features, which is defined with respect to both a concept and an induction algorithm. The wrapper approach requires a search space, operators, a search engine, and an evaluation function. We investigate all of them in detail and introduce compound operators for feature subset selection. Finally, we abstract the search problem into search with probabilistic estimates. We introduce decision tables with a default majority rule (DTMs) to test the conjecture that feature subset selection is a very powerful bias. The accuracy of induced DTMs is surprisingly powerful, and we concluded that this bias is extremely important for many real-world datasets. We show that the resulting decision tables are very small and can be succinctly displayed. We study properties of oblivious read-once decision graphs (OODGs) and show that they do not suffer from some inherent limitations of decision trees.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1995
Accession Number
ADA601756

Entities

People

  • Ron Kohavi

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Human Systems

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Bayesian Networks
  • Computational Science
  • Computer Programming
  • Computer Science
  • Computers
  • Data Mining
  • Data Science
  • Databases
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Medical Personnel
  • Network Science
  • Surveys

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning.
  • Statistical inference.
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks
  • Space