Discovering and deciphering relationships across disparate data modalities

Abstract

Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, ‘Multiscale Graph Correlation’ (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct.
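The abstract describes MGC only at a high level. The sketch below illustrates the kind of dependence test involved, with some hedging. It implements sample distance correlation (Dcorr) with a permutation test. Dcorr is the global, single-scale statistic that MGC generalizes; MGC itself restricts the computation to k-nearest-neighbor neighborhoods across many scales and selects among them. This is not the authors' MGC implementation. The function names (`centered_distances`, `dcor_from_centered`, `dcorr_test`) and the defaults (`reps=500`, `seed=0`) are illustrative assumptions.

```python
import numpy as np


def centered_distances(z):
    """Euclidean distance matrix of the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # a 1-D array is treated as n samples of dimension 1
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # subtract column and row means, then add back the grand mean
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcor_from_centered(a, b):
    """Biased (V-statistic) distance correlation from double-centered matrices."""
    dcov2 = (a * b).mean()
    dvar_prod = (a * a).mean() * (b * b).mean()
    if dvar_prod <= 0.0:
        return 0.0  # a constant variable carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_prod)))


def dcorr_test(x, y, reps=500, seed=0):
    """Permutation test of independence; returns (statistic, p-value).

    Illustrative global-scale test only, not the multiscale MGC procedure.
    """
    a, b = centered_distances(x), centered_distances(y)
    observed = dcor_from_centered(a, b)
    rng = np.random.default_rng(seed)
    n = a.shape[0]
    exceed = 0
    for _ in range(reps):
        p = rng.permutation(n)
        # relabeling the y samples permutes rows and columns of b together;
        # double centering commutes with that, so b need not be recomputed
        if dcor_from_centered(a, b[np.ix_(p, p)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (reps + 1)


if __name__ == "__main__":
    x = np.random.default_rng(1).uniform(-1, 1, size=100)
    stat, pval = dcorr_test(x, x ** 2)  # nonlinear, non-monotone relationship
    print(f"Dcorr = {stat:.3f}, permutation p-value = {pval:.4f}")
```

Because y = x² is a non-monotone function of x on [-1, 1], Pearson's correlation is near zero in the demo. The distance-based test should still return a small p-value. Permuting the already-centered matrix of y, rather than the raw samples, avoids recomputing distances on every permutation.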

Document Details

Document Type
Pub Defense Publication
Publication Date
Jan 15, 2019
Source ID
10.7554/elife.41690

Entities

People

  • Carey E. Priebe
  • Cencheng Shen
  • Eric W. Bridgeford
  • Joshua T. Vogelstein
  • Mauro Maggioni
  • Qing Wang

Organizations

  • Air Force Office of Scientific Research
  • Child Mind Institute
  • Defense Advanced Research Projects Agency
  • Johns Hopkins University
  • National Science Foundation
  • Office of Naval Research
  • University of Delaware

Tags

Readers

  • Military Science and Technology Research and Modernization
  • Neural Network Machine Learning
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • Biotechnology