Causal Inference from Big Data: Theoretical Foundations and the Data-fusion Problem

Abstract

We review concepts, principles, and tools that unify current approaches to causal analysis, and attend to new challenges presented by big data. In particular, we address the problem of data-fusion: piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) so as to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities, since the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted models. We here present a general, non-parametric framework for handling these biases and, ultimately, a theoretical solution to the problem of data-fusion in causal and counterfactual inference.

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2015
Accession Number
ADA623167

Entities

People

  • Elias Bareinboim
  • Judea Pearl

Organizations

  • University of California, Los Angeles

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Big Data
  • Clinical Trials
  • Computer Science
  • Data Analysis
  • Data Fusion
  • Educational Psychology
  • Genetics
  • Information Processing
  • Information Science
  • Information Systems
  • Language
  • Probability
  • Psychology
  • Reasoning
  • Sampling
  • Statistics

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Distributed Systems and Data Platform Development
  • Regression Analysis

Technology Areas

  • AI & ML