On the sparsity of fitness functions and implications for learning

Abstract

The properties of proteins and other biological molecules are encoded in large part in the sequence of amino acids or nucleotides that defines them. Increasingly, researchers estimate functions that map sequences to a particular property using machine learning and related statistical approaches. However, an important question remains unanswered: How many experimental measurements are needed in order to accurately learn these “fitness” functions? We leverage perspectives from the fields of biophysics, evolutionary biology, and signal processing to develop a theoretical framework that enables us to make progress on answering this question. We demonstrate that this framework can be used to make useful calculations on real-world data and suggest how these calculations may be used to guide experiments.

Document Details

Document Type: Pub Defense Publication
Publication Date: Dec 22, 2021
Source ID: 10.1073/pnas.2109649118

Entities

People

Amirali Aghazadeh
David H. Brookes
Jennifer Listgarten
