Estimation and Model Selection in Heterogeneous High-dimensional Settings with Random Forests
Abstract
The main goal of this proposal is to study the statistical properties of decision tree ensembles, including Random Forests. In light of the empirical success of Random Forests in supervised learning problems, especially with high-dimensional data, the project focuses on the theoretical behavior of decision tree ensembles in terms of their model selection properties. In particular, we aim to leverage insights into Random Forests to develop novel methods that contribute to meaningful scientific discoveries. The project consists of several research thrusts. The first concentrates on developing a set of models of locally low-dimensional inhomogeneous functions and studying their minimax risk. Based on the proposed models, the second thrust examines the statistical behavior of Random Forests, with a focus on feature importance scores, sample proximity measures, and model selection consistency. The third thrust is to design a novel decision tree ensemble, the iterative Random Forest (iRF), that achieves prediction accuracy comparable to that of Random Forests while producing more interpretable tree ensembles. Overall, the proposal addresses both theoretical and practical problems related to tree ensembles, and iRF in particular. We will use techniques from high-dimensional statistics, non-asymptotic probability, and statistical learning theory to extract high-order feature interactions.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 14, 2019
- Source ID: W911NF1710005
Entities
People
- Bin Yu
Organizations
- Army Contracting Command
- United States Army
- University of California, Berkeley