Methods for Causal Inference and Validation in Large and Small Data
Abstract
The rise of massive datasets that provide fine-grained information about human beings and their behavior provides unprecedented opportunities for evaluating the effectiveness of treatments across the human sciences. New methodological challenges have to be overcome to make the most of these opportunities. Among them are algorithms that do not scale, questions about how to combine experimental studies with massive observational data, and questions about how to estimate heterogeneous treatment effects for subgroups. A problem that connects these challenges is the explosion of false positives that has come along with massive data. There is a profound need for new methods of validation and for validation to play a central role in uncertainty quantification.

The size of randomized experiments has not grown as fast as the size of non-random observational databases. This is the case even at large internet firms such as Facebook and Google. We have massive amounts of data on treatments that have not been randomly assigned, and smaller and less frequent data from credible randomized trials. The old tradeoff between external and internal validity has become even sharper. When we do run large experiments, many experimental designs are difficult or impossible to implement because the underlying algorithms do not scale.

We have also become more interested in fine-grained inferences. Researchers and policy makers are increasingly unsatisfied with estimates of average treatment effects based on experimental samples that are unrepresentative of populations of interest. Instead, they seek to target treatments to particular populations and subgroups. In medicine, this is called personalized or precision medicine, but the push for finer-grained inferences is occurring across fields. These fine-grained inferences lead to small data problems: subgroups where the dimensionality of data is high but the number of observations is small.
Paralleling these trends, in the past few years, causal inference has become an exploding multidisciplinary field with contributions from statistics, computer science, and application domains including the biomedical, natural, and social sciences. New theories, methods, and algorithms are being developed, and it has become easier to obtain data to validate and test our methods. I propose to develop a framework for large-scale social experiments. This includes developing new theories and methods and creating open-source software that makes new methods accessible to researchers. I propose to work on three problems: (1) combining experimental and observational studies to leverage the strengths of each, thereby making it possible to draw credible inferences about the populations of true interest; (2) creating new polynomial or linear time algorithms for designing large-scale experiments; and (3) developing methods and theories for analyzing large-scale experiments. To test the framework, and to develop a software platform that makes analysis simple and easy for researchers, I propose a large-scale experiment in Section 4. I have already done theoretical, computational, and applied work in all three areas; the time is right to make further important progress in each.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 12, 2016
- Source ID: N000141512367
Entities
People
- Jasjeet Sekhon
Organizations
- Office of Naval Research
- United States Navy
- University of California Regents