On the sparsity of fitness functions and implications for learning

Abstract

The properties of proteins and other biological molecules are encoded in large part in the sequence of amino acids or nucleotides that defines them. Increasingly, researchers use machine learning and related statistical approaches to estimate functions that map sequences to a particular property. However, an important question remains unanswered: How many experimental measurements are needed to learn these “fitness” functions accurately? We draw on perspectives from biophysics, evolutionary biology, and signal processing to develop a theoretical framework that enables progress on this question. We demonstrate that this framework can be used to make useful calculations on real-world data and suggest how these calculations may be used to guide experiments.

Document Details

Document Type: Pub Defense Publication
Publication Date: Dec 22, 2021
Source ID: 10.1073/pnas.2109649118

Entities

People

  • Amirali Aghazadeh
  • David H. Brookes
  • Jennifer Listgarten

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Molecular and Cellular Biochemistry
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks
  • Biotechnology