Rough Set Based Splitting Criterion for Binary Decision Tree Classifiers
Abstract
Pattern recognition applications often use inductive reasoning to find hidden relationships and concepts within a data set. Many computational models and heuristics exist to assist in induction [12, 14, 18, 22, 26, 37]. Even so, researchers continue to develop new inductive methods [8, 10, 45, 46]. The research in this thesis advances induction for pattern classification by deriving and applying a new measure of information based on rough set theory: the rough product. The rough product helps us understand how an attribute value partition affects the upper approximation of each decision class. The thesis also applies the rough product as a splitting criterion for binary decision tree classifiers. Using a MATLAB software tool that we developed for this research, we compare the performance of the Gini Index, Twoing Rule, Maximum Deviance Reduction, and Rough Product splitting criteria on various data sets using k-fold cross-validation. We assess performance with the following metrics: accuracy, error rate, precision, recall, F-measure, node count, depth count, and complexity. Our results suggest that, in the presence of noisy data, the Rough Product splitting criterion could construct binary decision trees that are simpler and shorter than those produced by the Gini Index, Twoing Rule, or Maximum Deviance Reduction splitting criteria.
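For readers unfamiliar with the baseline criteria named in the abstract, the following is a minimal Python sketch of two of them, the Gini Index and the Twoing Rule, in their standard textbook (CART) form for a single binary split. It is not the report's MATLAB tool, and it does not attempt the Rough Product, which the abstract does not define. The function names (`class_proportions`, `gini_impurity`, `gini_gain`, `twoing`) are hypothetical, chosen only for this illustration, and both sides of the split are assumed to be non-empty.

```python
from collections import Counter


def class_proportions(labels):
    """Return {class: fraction of samples with that class} for a non-empty list."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels).values())


def gini_gain(left, right):
    """Reduction in Gini impurity when the parent node (left + right) is split."""
    parent = left + right
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


def twoing(left, right):
    """Twoing value: (P_L * P_R / 4) * (sum over classes of |p(c|L) - p(c|R)|)^2."""
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    pl, pr = class_proportions(left), class_proportions(right)
    classes = set(pl) | set(pr)
    diff = sum(abs(pl.get(c, 0.0) - pr.get(c, 0.0)) for c in classes)
    return (p_left * p_right / 4.0) * diff ** 2


if __name__ == "__main__":
    # A split that separates the two classes perfectly.
    print(gini_gain(["a", "a"], ["b", "b"]), twoing(["a", "a"], ["b", "b"]))
    # A split that leaves the class mix unchanged on both sides.
    print(gini_gain(["a", "b"], ["a", "b"]), twoing(["a", "b"], ["a", "b"]))
```

Under both criteria a larger value indicates a better split, so a tree builder would evaluate each candidate split of a node and keep the one with the highest score.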
Document Details
- Document Type: Technical Report
- Publication Date: Sep 26, 2006
- Accession Number: ADA479106
Entities
People
- Dariusz Mikulski
Organizations
- United States Army Tank Automotive Research, Development and Engineering Center