Statistical Methods for High-Dimensional Causal Inference

Abstract

Distinguishing patterns of dependence that are causal from those that are merely associative is a fundamental challenge in any data analysis. The traditional approach to making this distinction relies on randomized experiments. However, there are many contexts in which experiments cannot be performed because practical, ethical, or economic constraints prevent direct manipulation of the variables of interest. In some scenarios, however, there may be another ‘instrumental’ variable that can be assigned randomly and that, in turn, naturally influences the target variable. Such ‘imperfect’ experiments arise in many situations, for example where we wish to change attitudes or behavior. Although behavior is often not subject to direct experimental control, it is often possible to randomly assign subjects to be ‘encouraged’ (or otherwise incentivized) to change their behavior. Even if not all subjects change their behavior, we may still be able to draw valid causal inferences that can then form the basis for decision-making.
There are other scenarios where it may be impossible to perform experiments at all, or where we are faced with a high-dimensional complex system in which there are simply too many potential causal factors and responses for every pair to be considered in a designed experiment. The advent of ubiquitous data-gathering procedures means that in such situations there may still be an abundance of high-quality observational data on which to base our inferences. This motivates the development of methods that permit causal and other easily interpretable inferences to be drawn from observational data. The proposed research aims to address fundamental inferential problems that arise in the causal interpretation and analysis of statistical data.
Specifically, we will perform research in three distinct but related areas:

  • Development of fundamental statistical tools for analyzing the association between a binary treatment (or exposure) and a binary outcome using relative risks and risk differences.
  • Methods for analyzing randomized experiments that are imperfect in the sense that the experimenter lacks direct control over the treatment or exposure of interest.
  • Non-parametric approaches for inferring multivariate causal structure from observational data using generalizations of conditional independence.

A primary emphasis of the proposed research is on developing methods that are practical from both a computational and a statistical perspective. Computationally, our goal is to build procedures that have clear convergence properties and are efficient in terms of their computational cost. In statistical terms, our proposed approaches will either be fully non-parametric or, if that is not possible, make minimal parametric assumptions and be robust to misspecification. In addition, if causal quantities of interest are not point identified, our proposed methods will be designed to distinguish the resulting uncertainty from that which arises solely due to sampling variability. This is important since the latter, but not the former, will disappear asymptotically. The tools developed by this research will facilitate the extraction of new causal knowledge from observational, experimental, and quasi-experimental data in ways that are currently impossible. Thus this research addresses the fundamental challenge of ensuring that the analysis of ‘big data’ leads to ‘big insights’.
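
For readers unfamiliar with the two association measures named in the first research area, the risk difference and relative risk for a binary treatment and a binary outcome can be computed from a 2x2 table of counts. The following is a minimal sketch only and is not taken from the project itself; the class name `TwoByTwo`, the function names, and the counts are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for a binary treatment and binary outcome (hypothetical container)."""
    treated_events: int   # treated subjects with the outcome
    treated_total: int    # all treated subjects
    control_events: int   # untreated subjects with the outcome
    control_total: int    # all untreated subjects


def _risk(events: int, total: int) -> float:
    """Proportion of subjects in a group who experienced the outcome."""
    if total <= 0:
        raise ValueError("group size must be positive")
    if not 0 <= events <= total:
        raise ValueError("events must lie between 0 and the group size")
    return events / total


def risk_difference(table: TwoByTwo) -> float:
    """Risk in the treated group minus risk in the control group."""
    return (_risk(table.treated_events, table.treated_total)
            - _risk(table.control_events, table.control_total))


def relative_risk(table: TwoByTwo) -> float:
    """Risk in the treated group divided by risk in the control group."""
    control = _risk(table.control_events, table.control_total)
    if control == 0:
        raise ValueError("relative risk is undefined when the control risk is zero")
    return _risk(table.treated_events, table.treated_total) / control


if __name__ == "__main__":
    # Made-up counts: 30 of 100 treated and 20 of 100 controls had the outcome.
    example = TwoByTwo(treated_events=30, treated_total=100,
                       control_events=20, control_total=100)
    print(f"risk difference: {risk_difference(example):.3f}")
    print(f"relative risk:   {relative_risk(example):.3f}")
```

With these made-up counts the risk difference is about 0.10 and the relative risk about 1.5. These are sample associations; whether they carry a causal interpretation depends on how treatment was assigned, which is the question the abstract addresses.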

Document Details

Document Type
DoD Grant Award
Publication Date
Aug 12, 2016
Source ID
N000141512672

Entities

People

  • Thomas S Richardson

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Washington

Tags

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Economics
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference