Generalized Robust Feature Selection
Abstract
Feature selection may be summarized as identifying the features salient to a given response. Understanding which features affect the response makes it possible to collect only consequential data in the future; hence, a feature selection algorithm may save effort spent collecting data, storage resources, and computational resources for making predictions. We propose a generalized approach to selecting the salient features of data sets. Our approach may also be applied to unsupervised data sets to understand which data streams provide unique information. We contend that our approach identifies salient features that are robust to the subsequent predictive model applied. The proposed algorithm considers all provided variables, squared variables, and two-way interactions as an extended data set. The algorithm implements a forward selection approach, based on correlation with the response, while fitting deep neural networks to the selected variables. These deep neural networks maintain an adaptive architecture that mirrors a full factorial design. These networks assess numeric and categorical values for both features and responses. Implementing this approach in ensemble with Recursive Feature Elimination, we establish a new Pareto Frontier, consisting solely of this technique, for the Wisconsin Breast Cancer problem instance. This Pareto Frontier highlights our ensemble approach as the best performing method in both feature reduction and predictive accuracy.
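As a rough illustration of the extended data set described in the abstract, the sketch below builds the original variables, their squares, and all two-way interactions, then ranks the resulting columns by absolute Pearson correlation with the response. It is a minimal sketch under stated assumptions, not the report's implementation: the function names `extend_features` and `rank_by_correlation` are hypothetical, only numeric features are handled, and the deep neural network fitting and the forward selection loop itself are omitted.

```python
import itertools

import numpy as np


def extend_features(X, names):
    """Build an extended data set: originals, squares, two-way interactions.

    Hypothetical helper; assumes X is a 2-D numeric array whose columns
    correspond one-to-one with the entries of `names`.
    """
    n_features = X.shape[1]
    columns = [X[:, j] for j in range(n_features)]
    labels = list(names)

    # Squared terms, one per original variable.
    for j in range(n_features):
        columns.append(X[:, j] ** 2)
        labels.append(f"{names[j]}^2")

    # Two-way interactions, one per unordered pair of distinct variables.
    for i, j in itertools.combinations(range(n_features), 2):
        columns.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}*{names[j]}")

    return np.column_stack(columns), labels


def rank_by_correlation(X_ext, y, labels):
    """Rank extended columns by |Pearson correlation| with the response.

    Constant columns have undefined correlation and are scored 0.0 here
    (an assumption made for this sketch).
    """
    scores = []
    for j, label in enumerate(labels):
        column = X_ext[:, j]
        if np.std(column) == 0:
            score = 0.0
        else:
            score = abs(np.corrcoef(column, y)[0, 1])
        scores.append((label, score))
    # Highest correlation first; a forward-selection loop could consume
    # candidates in this order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With `p` original variables, the extended data set has `2p + p(p - 1)/2` columns, so three inputs yield nine candidates. In a forward selection scheme, candidates would be added in ranked order while a predictive model is refit at each step; that step is left out here.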
Document Details
- Document Type: Technical Report
- Publication Date: Mar 24, 2022
- Accession Number: AD1172360
Entities
People
- Bradford L Lott
Organizations
- Air Force Institute of Technology