High Dimensional Causal Model Search

Abstract

A basic task in any multivariate data analysis is to understand the patterns of dependence and independence among the measured variables. In particular, it is important to distinguish the situation in which two observed quantities are simply independent (called marginal independence) from that in which they are independent within levels of a third variable (called conditional independence). The ability to distinguish between these two phenomena is central to many causal structure-learning algorithms. However, standard approaches to statistical testing cannot be readily applied to distinguish these forms of independence, because both patterns of independence can hold simultaneously. In technical terms, the two hypotheses are not nested, whereas most standard hypothesis-testing approaches, e.g. likelihood ratio tests, apply to nested hypotheses. In the absence of a statistical theory for carrying out such tests, ad hoc approaches have been developed that lack statistical performance guarantees. We propose to address this gap by developing new methods that allow marginal and conditional independence to be distinguished.

A second project involves developing methods that allow machine learning techniques to be used to adjust for confounding factors when making causal inferences from observational data. Traditionally, machine learning methods have focused on prediction problems, where performance may be judged by comparing predictions against a test set or via cross-validation. In causal inference problems such a yardstick does not typically exist, so these methods cannot be applied directly. However, machine learning methods can be used to adjust for the presence of observed confounding variables, providing a more flexible approach to removing such biases in causal estimates. Though the use of these machine learning techniques is motivated by reducing bias, there have been no methods for quantifying the size of any bias that remains. As a practical consequence, the confidence intervals for causal effects produced by such machine learning methods may have incorrect coverage. We propose to develop general-purpose techniques for detecting bias in state-of-the-art machine-learning-based causal estimators. In addition, when such bias is detected, we propose to modify the estimator so as to reduce the size of the remaining bias.

A third line of research involves investigating newly discovered forms of sparsity that arise in connection with latent variable models. These new forms of sparsity are generalizations of the marginal and conditional independence constraints that are central to most existing sparse representations. We propose to develop a general non-parametric theory describing these constraints. We also propose to develop learning algorithms that will recover these constraints from finite data samples and allow inference to be made concerning the underlying causal structure. These procedures will apply generalizations of the non-nested hypothesis tests developed under our first aim.

A primary emphasis of the proposed research is on developing new methods that are practical from both a computational and a statistical perspective. We propose to construct procedures that are computationally efficient but at the same time have well-understood statistical properties. Our models will either be fully non-parametric or, if that is not feasible, will make minimal parametric assumptions and be robust to mis-specification. In summary, the proposed research aims to develop theory and methods for high-dimensional causal data analysis, thereby allowing causal knowledge to be extracted from observational data.
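
The abstract's distinction between marginal and conditional independence can be illustrated with a small simulation. The sketch below is not the proposed methodology; it assumes linear Gaussian data, in which case a zero partial correlation corresponds to conditional independence, and it uses two hypothetical three-variable structures (a chain X -> Z -> Y and a collider X -> Z <- Y) chosen only for illustration. The function names `partial_corr` and `simulate` are likewise illustrative.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing each on z.

    Under the linear Gaussian assumption made here, a value near zero
    indicates that x and y are conditionally independent given z.
    """
    design = np.column_stack([np.ones_like(z), z])
    # Residuals of x and y after removing the linear effect of z.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


def simulate(n=20000, seed=0):
    """Return (marginal, partial) correlations of X and Y for two structures."""
    rng = np.random.default_rng(seed)

    # Chain X -> Z -> Y: X and Y are marginally dependent,
    # but conditionally independent given Z.
    x1 = rng.normal(size=n)
    z1 = x1 + rng.normal(size=n)
    y1 = z1 + rng.normal(size=n)

    # Collider X -> Z <- Y: X and Y are marginally independent,
    # but conditionally dependent given Z.
    x2 = rng.normal(size=n)
    y2 = rng.normal(size=n)
    z2 = x2 + y2 + rng.normal(size=n)

    return {
        "chain": (float(np.corrcoef(x1, y1)[0, 1]), partial_corr(x1, y1, z1)),
        "collider": (float(np.corrcoef(x2, y2)[0, 1]), partial_corr(x2, y2, z2)),
    }


if __name__ == "__main__":
    for name, (marginal, partial) in simulate().items():
        print(f"{name}: marginal={marginal:+.3f}, partial given Z={partial:+.3f}")
```

In the chain the marginal correlation is clearly non-zero while the partial correlation is close to zero; in the collider the pattern is reversed. A structure-learning algorithm has to tell these patterns apart from data, and comparing the two estimates informally, as done here, carries none of the statistical guarantees the abstract calls for.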

Document Details

Document Type
DoD Grant Award
Publication Date
Aug 15, 2019
Source ID
N000141912446

Entities

People

  • Thomas S Richardson

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Washington

Tags

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning.
  • Regression Analysis.

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Neural Networks