Integration of Multimodal Data from Disparate Sources for Identifying Disease Subtypes

Abstract

Studies over the past decade have generated a wealth of molecular data that can be leveraged to better understand cancer risk, progression, and outcomes. However, because the disease is heterogeneous, analyzing data from a single modality is not enough to understand progression risk or to differentiate long- and short-term survivors. A developed and tested deep-learning approach that leverages aggregate information collected from multiple repositories across multiple modalities (e.g., mRNA, DNA methylation, miRNA) could lead to more accurate and robust prediction of disease progression. Here, we propose an autoencoder-based multimodal data fusion system in which a fusion encoder flexibly integrates the collective information available through multiple studies with partially coupled data. Results from a fully controlled simulation study show that inferring the missing data through the proposed data fusion pipeline yields a predictor that outperforms baseline predictors operating with missing modalities. Results further show that short- and long-term survivors of glioblastoma multiforme, acute myeloid leukemia, and pancreatic adenocarcinoma can be differentiated with AUCs of 0.94, 0.75, and 0.96, respectively.

Document Details

Document Type: Pub Defense Publication
Publication Date: Feb 24, 2022
Source ID: 10.3390/biology11030360

Entities

People

Bhagya Shree Kottoori
Kaiyue Zhou
Seeya Awadhut Munj
Sorin Draghici
Suzan Arslanturk
Zhuomin M. Zhang

Organizations

National Science Foundation
United States Department of Defense
