Estimation and Model Selection in Heterogeneous High-dimensional Settings with Random Forests

Abstract

The main goal of this proposal is to study the statistical properties of decision tree ensembles, including Random Forests. In light of the empirical success of Random Forests in supervised learning problems, especially with high-dimensional data, the project focuses on the theoretical behavior of decision tree ensembles with respect to their model selection properties. In particular, we aim to leverage insights into Random Forests to develop novel methods that contribute to meaningful scientific discoveries. The project consists of several research thrusts. The first develops a set of models of locally low-dimensional, inhomogeneous functions and studies their minimax risk. Building on these models, the second thrust examines the statistical behavior of Random Forests, focusing on feature importance scores, sample proximity measures, and model selection consistency. The third thrust designs a novel decision tree ensemble, the iterative Random Forest (iRF), which can achieve prediction accuracy comparable to that of Random Forests while producing more interpretable tree ensembles. Overall, the proposal addresses both theoretical and practical problems related to tree ensembles, and to iRF in particular. We will use techniques from high-dimensional statistics, non-asymptotic probability, and statistical learning theory to extract high-order feature interactions.

Document Details

Document Type: DoD Grant Award
Publication Date: Feb 14, 2019
Source ID: W911NF1710005

Entities

People

Bin Yu

Organizations

Army Contracting Command
United States Army
University of California, Berkeley