Rough Set Based Splitting Criterion for Binary Decision Tree Classifiers

Abstract

Pattern recognition applications often use inductive reasoning to find hidden relationships and concepts within a data set. Many computational models and heuristics exist to assist in induction [12, 14, 18, 22, 26, 37]. Even so, researchers continue to actively develop new inductive methods [8, 10, 45, 46]. The research in this thesis advances induction for pattern classification by presenting the derivation and application of a new measure of information based on rough set theory: the rough product. The rough product helps us understand how an attribute value partition affects the upper approximation of each decision class. The thesis also presents an application of the rough product as a splitting criterion for binary decision tree classifiers. Using a MATLAB software tool that we developed for this research, we compare the performance of the Gini Index, Twoing Rule, Maximum Deviance Reduction, and Rough Product splitting criteria on various data sets using k-fold cross-validation. We measure performance with the following metrics: accuracy, error rate, precision, recall, F-measure, node count, depth count, and complexity. Our results suggest that, in the presence of noisy data, the Rough Product splitting criterion could construct binary decision trees that are simpler and shorter than those produced by the Gini Index, Twoing Rule, or Maximum Deviance Reduction splitting criteria.
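The abstract ties the rough product to how an attribute value partition changes the upper approximation of each decision class. The rough product's own formula is not given here, so the sketch below shows only the standard rough set ingredient it builds on. A binary threshold split on one attribute partitions the objects into two blocks, and the upper approximation of a class is the union of all blocks that contain at least one member of that class. The names `partition_by_split` and `upper_approximations` and the `<=` threshold convention are illustrative assumptions, not the thesis's MATLAB code.

```python
from collections import defaultdict


def partition_by_split(values, threshold):
    """Hypothetical helper: partition object indices by a binary split.

    Objects with value <= threshold form one block and the rest form the
    other. Empty blocks are dropped so the result is a proper partition.
    """
    left = frozenset(i for i, v in enumerate(values) if v <= threshold)
    right = frozenset(i for i, v in enumerate(values) if v > threshold)
    return [block for block in (left, right) if block]


def upper_approximations(blocks, labels):
    """Return {class: upper approximation} for each decision class.

    The upper approximation of class X is the union of every block
    (equivalence class) that intersects X, i.e. every object that
    *possibly* belongs to X given the partition.
    """
    members = defaultdict(set)
    for i, y in enumerate(labels):
        members[y].add(i)
    return {
        y: frozenset().union(*(b for b in blocks if b & objs))
        for y, objs in members.items()
    }


blocks = partition_by_split([1.0, 2.0, 3.0, 4.0], threshold=2.5)
upper = upper_approximations(blocks, ["a", "a", "b", "a"])
print(upper)
```

In this toy example, class `a` touches both blocks, so its upper approximation covers all four objects. Class `b` touches only the second block, so its upper approximation is `{2, 3}`. A split that keeps each class's upper approximation small separates the classes more cleanly, which is the intuition the abstract attributes to the rough product.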
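For comparison, the three baseline splitting criteria named in the abstract can be scored on a candidate binary split as below. This is a sketch using the common CART-style textbook definitions, which is an assumption about the exact forms used in the thesis. Gini impurity is $1-\sum_j p_j^2$. Deviance is $-\sum_j p_j \ln p_j$; the log base only rescales the score. Both are turned into a reduction from parent to size-weighted children. The twoing value follows Breiman's form $\tfrac{p_L p_R}{4}\big(\sum_j |p(j\mid L)-p(j\mid R)|\big)^2$. All function names are hypothetical, and both children are assumed non-empty.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of each class label in a non-empty node."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


def gini(labels):
    """Gini impurity: 1 - sum_j p_j^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels).values())


def deviance(labels):
    """Deviance (entropy, natural log): -sum_j p_j ln p_j."""
    return -sum(p * math.log(p) for p in class_proportions(labels).values())


def impurity_reduction(impurity, parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return (impurity(parent)
            - (len(left) / n) * impurity(left)
            - (len(right) / n) * impurity(right))


def twoing(left, right):
    """Twoing value: (p_L * p_R / 4) * (sum_j |p(j|L) - p(j|R)|)^2."""
    n = len(left) + len(right)
    p_l, p_r = len(left) / n, len(right) / n
    pl, pr = class_proportions(left), class_proportions(right)
    s = sum(abs(pl.get(c, 0.0) - pr.get(c, 0.0)) for c in set(pl) | set(pr))
    return p_l * p_r * s * s / 4.0


parent, left, right = ["a", "a", "b", "b"], ["a", "a"], ["b", "b"]
print(impurity_reduction(gini, parent, left, right))      # Gini reduction
print(impurity_reduction(deviance, parent, left, right))  # deviance reduction
print(twoing(left, right))                                # twoing value
```

For each criterion, a tree builder would evaluate every candidate split and keep the one with the largest score. The Rough Product criterion would plug into the same search loop with a different scoring function.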


Document Details

Document Type
Technical Report
Publication Date
Sep 26, 2006
Accession Number
ADA479106

Entities

People

  • Dariusz Mikulski

Organizations

  • United States Army Tank Automotive Research, Development and Engineering Center

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Breast Cancer
  • Cancer
  • Computer Programs
  • Computer Science
  • Computer Vision
  • Computers
  • Data Mining
  • Feature Selection
  • Graphical User Interface
  • Information Science
  • Machine Learning
  • Pattern Recognition
  • Reasoning
  • Three Dimensional
  • Two Dimensional
  • United States

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Graph Algorithms and Convex Optimization
  • Mathematical Modeling and Probability Theory

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms