Essays on Bioinformatics and Social Network Analysis: Statistical and Computational Methods for Complex Systems
Abstract
Although many may disagree, and we are often taught the contrary, life is inherently non-independent. In many situations it is safe and convenient to assume that our units of analysis, whether genes or people, are independent draws from a population, but in many other cases that assumption cannot be made. This dissertation addresses that issue: how can we deal with inherently complex and interconnected data and, using modern computational tools, how can we take advantage of this feature to better understand our world? In this document, I present two problems that, while very different, both belong to the realm of complex interconnected data: phylogenetics and social networks. Understanding the individual role that genes play in life is a key issue in the biomedical sciences. While information regarding gene function is continuously growing, the number of genes with unknown biological purpose is still far greater. Because of this, scientists have dedicated much of their time to designing and building tools that automatically infer gene function. In an effort to contribute to this task, I present a further attempt at doing so. While very simple, our model of gene-function evolution has some key features that have the potential to make a significant impact in the field: (a) compared to other methods, ours is highly scalable, which makes it possible to simultaneously analyze hundreds of so-called gene families comprising thousands of genes; (b) it supports our biological intuition, in the sense that the model's data-driven results agree with existing beliefs regarding how gene functions evolved; (c) its prediction accuracy is comparable to that of other, more complex alternatives; and (d) perhaps most importantly, it can be used both to support new annotations and to flag areas in which existing annotations show inconsistencies that may indicate errors or controversies in the literature.
Document Details
- Document Type: Technical Report
- Publication Date: Aug 02, 2020
- Accession Number: AD1228960
Entities
People
- George G. Vega Yon
Organizations
- University of Southern California