Statistical Methods for Studying Genetic Variation in Populations

Abstract

Genetic variation in populations is of great interest for understanding the evolutionary history of humans and other species. Improvements in sequencing technology have made many large genetic datasets available, and computational methods have therefore become important for analyzing them. Two important problems studied using genetic data are population stratification (modeling individual ancestry with respect to ancestral populations) and genetic association (finding genetic polymorphisms that affect a trait). In this thesis, we develop methods to improve our understanding of these two problems. For the population stratification problem, we develop hierarchical Bayesian models that incorporate the evolutionary processes known to affect genetic variation. By developing mStruct, we show that modeling more evolutionary processes improves the accuracy of the recovered population structure. We demonstrate how nonparametric Bayesian processes can be used to choose the number of ancestral populations that best describes the genetic diversity of a given sample of individuals. We also examine how sampling bias in genotyping study design can affect the results of population structure analysis, and we propose a probabilistic framework for modeling and correcting sample selection bias. For the genetic association problem, genome-wide association studies (GWAS) have vastly improved our understanding of many diseases, yet they have failed to uncover much of the variation responsible for a number of common multifactorial diseases and complex traits. We show how artificial selection experiments on model organisms can be used to better understand the nature of genetic associations, and we demonstrate through simulations that data from such experiments improve the performance of conventional association methods.

Document Details

Document Type: Technical Report
Publication Date: Aug 01, 2012
Accession Number: ADA568143

Entities

People

  • Suyash Shringarpure

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Biomedical
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Bayesian Networks
  • Biological Sciences
  • Birds
  • Computational Science
  • Data Analysis
  • Data Mining
  • Genetic Structures
  • Genetic Variation
  • Genetics
  • Geography
  • Information Processing
  • Information Science
  • Machine Learning
  • Medical Genetics
  • Monte Carlo Method
  • Probabilistic Models
  • Probability Distributions

Fields of Study

  • Biology

Readers

  • Molecular and Genetic Basis of Cancer
  • Neural Network Machine Learning
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • Biotechnology