Disease Modeling via Large-Scale Network Analysis
Abstract
A central goal of genetics is to learn how the genotype of an organism determines its phenotype. We address the problem implicit in that goal: predicting the association of genes with phenotypes or traits. Our primary goal is to develop pragmatic data-analytic methods for linking specific genes to traits and diseases, especially polygenic traits, which are the most challenging. We are also interested in developing theoretical guarantees for these methods. In past work, we developed predictive methods general enough to apply to potentially any genetic trait, from plant traits relevant to desirable agricultural properties to important human diseases. Our methods for predicting gene-disease associations, Katz on a heterogeneous network and CATAPULT [1], were published in PLOS ONE during the last project period. The biological problem has also led us to pursue a significant problem in machine learning. A fundamental question in classification is whether we can efficiently learn classifiers that provably achieve low misclassification rates when the training data contain certain types of random label noise.
Document Details
- Document Type: Technical Report
- Publication Date: May 20, 2015
- Accession Number: ADA625870
Entities
People
- Edward Marcotte
- Inderjit S. Dhillon
Organizations
- University of Texas at Austin