Integration of Multimodal Data from Disparate Sources for Identifying Disease Subtypes

Abstract

Studies over the past decade have generated a wealth of molecular data that can be leveraged to better understand cancer risk, progression, and outcomes. However, because of disease heterogeneity, analyzing data from a single modality is not sufficient to assess progression risk or to differentiate long-term from short-term survivors. A deep-learning approach that aggregates information collected from multiple repositories across multiple modalities (e.g., mRNA, DNA methylation, miRNA) could yield more accurate and robust predictions of disease progression. Here, we propose an autoencoder-based multimodal data fusion system in which a fusion encoder flexibly integrates the collective information available from multiple studies with partially coupled data. In a fully controlled simulation study, inferring the missing data through the proposed fusion pipeline produced a predictor that outperformed baseline predictors operating with missing modalities. We further show that short- and long-term survivors of glioblastoma multiforme, acute myeloid leukemia, and pancreatic adenocarcinoma can be differentiated with AUCs of 0.94, 0.75, and 0.96, respectively.
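
The abstract describes the fusion idea only at a high level. The sketch below is a hedged illustration, not the authors' model. It replaces the deep autoencoder with a linear autoencoder. Because the squared-error optimum of a linear autoencoder spans the top principal subspace, it can be fitted in closed form with an SVD. The sketch fits the shared latent space on a simulated "study" in which two modalities are coupled. It then infers the missing modality for a second simulated study where only the first modality was measured, and compares that result with simple mean imputation. The function names (`fit_linear_fusion_autoencoder`, `impute_missing_modality`, `auc`), the dimensions, and the simulated data are all hypothetical. The `auc` helper computes ROC AUC, the metric the abstract reports, through the Mann-Whitney statistic.

```python
import numpy as np


def fit_linear_fusion_autoencoder(a, b, k):
    """Fit a k-dimensional linear autoencoder on coupled samples [a | b].

    Hypothetical helper: the squared-error optimum of a linear autoencoder
    spans the top-k principal subspace, so it is solved here with an SVD.
    Returns the column means and the decoder matrix w (k x (d_a + d_b)).
    """
    x = np.hstack([a, b])
    mean = x.mean(axis=0)
    # Rows of vt are principal directions; the first k rows act as the decoder.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]


def impute_missing_modality(a_only, mean, w, d_a):
    """Infer modality B for samples in which only modality A was measured."""
    w_a, w_b = w[:, :d_a], w[:, d_a:]
    centered = a_only - mean[:d_a]
    # Least squares for the latent codes z in centered ~= z @ w_a,
    # using only the decoder columns that belong to modality A.
    z, *_ = np.linalg.lstsq(w_a.T, centered.T, rcond=None)
    # Decode the shared latent codes into the missing modality B.
    return z.T @ w_b + mean[d_a:]


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Simulated data: both modalities are noisy linear views of a shared latent factor.
rng = np.random.default_rng(0)
n_coupled, n_partial, d_a, d_b, k = 200, 100, 20, 15, 3
n = n_coupled + n_partial
latent = rng.normal(size=(n, k))
a_all = latent @ rng.normal(size=(k, d_a)) + 0.1 * rng.normal(size=(n, d_a))
b_all = latent @ rng.normal(size=(k, d_b)) + 0.1 * rng.normal(size=(n, d_b))

# "Study 1" measured both modalities; "study 2" lacks B (kept only for scoring).
mean, w = fit_linear_fusion_autoencoder(a_all[:n_coupled], b_all[:n_coupled], k)
b_hat = impute_missing_modality(a_all[n_coupled:], mean, w, d_a)
b_true = b_all[n_coupled:]

fusion_mse = np.mean((b_hat - b_true) ** 2)
baseline_mse = np.mean((mean[d_a:] - b_true) ** 2)  # mean imputation
print(f"fusion MSE {fusion_mse:.3f} vs mean-imputation MSE {baseline_mse:.3f}")
print("AUC of a perfect ranking:", auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

The closed-form linear version is used only so the example runs without a deep-learning framework. Swapping the SVD for per-modality nonlinear encoders that feed a shared fusion layer would move the sketch closer to the deep architecture the abstract describes.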

Document Details

Document Type: Pub Defense Publication
Publication Date: Feb 24, 2022
Source ID: 10.3390/biology11030360

Entities

People

  • Bhagya Shree Kottoori
  • Kaiyue Zhou
  • Seeya Awadhut Munj
  • Sorin Draghici
  • Suzan Arslanturk
  • Zhuomin M. Zhang

Organizations

  • National Science Foundation
  • United States Department of Defense

Tags

Readers

  • Neural Network Machine Learning
  • Oncology
  • Oncology and Biomarker-Based Cancer Detection

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks