Inferring the shape of global epistasis

Abstract

How does an organism’s genetic sequence govern its measurable characteristics? New technologies provide libraries of randomized sequences to study this relationship in unprecedented detail for proteins and other molecules. Deriving insight from these data is difficult, though, because the space of possible sequences is enormous, so even the largest experiments sample a tiny minority of sequences. Moreover, the effects of mutations may combine in unexpected ways. We present a statistical framework to analyze such mutagenesis data. The key assumption is that mutations contribute in a simple way to some unobserved trait, which is related to the observed trait by a nonlinear mapping. Analyzing three proteins, we show that this model is easily interpretable and yet fits the data remarkably well.
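The modeling assumption in the abstract can be illustrated with a minimal sketch. The abstract says only that mutations contribute "in a simple way" to an unobserved trait; the sketch below assumes that this means additively, and it assumes a logistic curve for the nonlinear mapping. The effect sizes, the wild-type value, and all function names are hypothetical and are not taken from the paper or its data.

```python
import math

# Hypothetical per-site mutation effects on the unobserved (latent) trait.
# These numbers are made up for illustration, not taken from any data set.
EFFECTS = {
    (0, "A"): -1.2,
    (1, "G"): 0.4,
    (2, "K"): -0.7,
}


def latent_trait(mutations, effects=EFFECTS, wild_type=2.0):
    """Add mutation effects to the wild-type latent value (additivity is assumed)."""
    return wild_type + sum(effects[m] for m in mutations)


def observed_trait(phi):
    """Assumed nonlinear map from latent to observed trait: a logistic curve."""
    return 1.0 / (1.0 + math.exp(-phi))


def predict(mutations):
    """Predicted observed trait for a sequence carrying the given mutations."""
    return observed_trait(latent_trait(mutations))


wt = predict([])
single_a = predict([(0, "A")])
single_k = predict([(2, "K")])
double = predict([(0, "A"), (2, "K")])

# On the latent scale the two effects add exactly; on the observed scale
# the double mutant differs from the sum of the single-mutant changes.
naive_double = wt + (single_a - wt) + (single_k - wt)
print(f"wild type {wt:.3f}, double mutant {double:.3f}, naive additive {naive_double:.3f}")
```

In this toy setting the gap between `double` and `naive_double` comes entirely from the shape of the mapping, which is the kind of apparent interaction between mutations that the framework attributes to a single nonlinearity.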

Document Details

Document Type
Pub Defense Publication
Publication Date
Jul 23, 2018
Source ID
10.1073/pnas.1804015115

Entities

People

  • David M. McCandlish
  • Jakub Otwinowski
  • Joshua B. Plotkin

Organizations

  • Army Research Office
  • National Institutes of Health
  • University of Pennsylvania

Tags

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Molecular Genetics
  • Theoretical Analysis

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • Biotechnology
  • Space