A dataset comprised of binding interactions for 104,972 antibodies against a SARS-CoV-2 peptide

Abstract

The dataset presented here contains quantitative binding scores of scFv-format antibodies against a SARS-CoV-2 target peptide, collected via an AlphaSeq assay, and can be used in the development and benchmarking of machine learning models. Starting from three seed sequences identified from a phage display campaign using a human naïve library, four sets of 29,900 antibodies were designed in silico by creating all k = 1 mutations and random k = 2 and k = 3 mutations throughout the complementarity-determining regions (CDRs). Of the 119,600 designs, 104,972 were successfully built into the AlphaSeq library. Target binding was subsequently measured, with 71,384 designs yielding a predicted affinity value for at least one of the triplicate measurements. The data include antibodies with predicted affinities ranging from 37 pM to 22 mM. To our knowledge, this is the largest publicly available dataset that contains antibody sequences, the antigen sequence, and quantitative binding scores, and it can serve as a benchmark for evaluating antibody-specific representation models for machine learning.
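The design scheme in the abstract (every k = 1 mutant plus a random sample of k = 2 and k = 3 mutants within the CDRs) can be illustrated with a minimal sketch. It assumes that mutations are single-residue substitutions drawn from the 20 standard amino acids; the abstract does not describe the sampling procedure, so this is not the authors' implementation. The toy seed sequence, the CDR positions, and the function names are all hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumption)

def single_mutants(seq, positions):
    """Enumerate every k = 1 substitution at the given positions."""
    variants = []
    for i in positions:
        for aa in AMINO_ACIDS:
            if aa != seq[i]:
                variants.append(seq[:i] + aa + seq[i + 1:])
    return variants

def random_mutants(seq, positions, k, n, rng):
    """Sample n distinct variants carrying exactly k substitutions."""
    # Upper bound on distinct variants: choose k positions, 19 options each.
    max_variants = math.comb(len(positions), k) * 19 ** k
    if n > max_variants:
        raise ValueError("requested more variants than exist")
    variants = set()
    while len(variants) < n:
        residues = list(seq)
        for i in rng.sample(positions, k):
            residues[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[i]])
        variants.add("".join(residues))
    return sorted(variants)

# Hypothetical toy seed and CDR positions, for illustration only.
seed = "ACDEFGHIKL"
cdr_positions = [2, 3, 4]
rng = random.Random(0)

singles = single_mutants(seed, cdr_positions)
doubles = random_mutants(seed, cdr_positions, k=2, n=10, rng=rng)
print(len(singles))  # 3 positions x 19 alternatives = 57
```

Exhaustive enumeration is practical only for k = 1; the number of possible variants grows combinatorially with k, which is consistent with the abstract's use of random sampling for k = 2 and k = 3.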

Document Details

Document Type: Pub Defense Publication
Publication Date: Oct 26, 2022
Source ID: 10.1038/s41597-022-01779-4

Entities

People

Charles Lin
Chelsea Lennartz
Daniel Guion
David Younger
Emily Engelhart
Leslie Shing
Mary Kelley
Matthew E. Walsh
Randolph Lopez
Ryan Emerson

Organizations

Defense Threat Reduction Agency
