On the sparsity of fitness functions and implications for learning
Abstract
The properties of proteins and other biological molecules are encoded in large part in the sequence of amino acids or nucleotides that defines them. Increasingly, researchers estimate functions that map sequences to a particular property using machine learning and related statistical approaches. However, an important question remains unanswered: How many experimental measurements are needed to accurately learn these “fitness” functions? We leverage perspectives from biophysics, evolutionary biology, and signal processing to develop a theoretical framework that lets us make progress toward answering this question. We demonstrate that this framework supports useful calculations on real-world data and suggest how these calculations may be used to guide experiments.
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Dec 22, 2021
- Source ID: 10.1073/pnas.2109649118
Entities
People
- Amirali Aghazadeh
- David H. Brookes
- Jennifer Listgarten