A dataset comprised of binding interactions for 104,972 antibodies against a SARS-CoV-2 peptide

Abstract

The dataset presented here contains quantitative binding scores of scFv-format antibodies against a SARS-CoV-2 target peptide, collected via an AlphaSeq assay, for use in the development and benchmarking of machine learning models. Starting from three seed sequences identified from a phage display campaign using a human naïve library, four sets of 29,900 antibodies were designed in silico by creating all k = 1 mutations and random k = 2 and k = 3 mutations throughout the complementarity-determining regions (CDRs). Of the 119,600 designs, 104,972 were successfully built into the AlphaSeq library. Target binding was subsequently measured, and 71,384 designs yielded a predicted affinity value for at least one of the triplicate measurements. Data include antibodies with predicted affinity measurements ranging from 37 pM to 22 mM. To our knowledge, this is the largest publicly available dataset that contains antibody sequences, antigen sequence, and quantitative measurements of binding scores, and it can serve as a benchmark for evaluating antibody-specific representation models for machine learning.
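
The design step described in the abstract (all k = 1 mutations plus random k = 2 and k = 3 mutations within the CDRs) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that "mutation" means a single amino-acid substitution, and the seed sequence, the CDR positions, and the function names are hypothetical placeholders chosen only for illustration.

```python
import math
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_mutants(seq, positions):
    """Enumerate every k = 1 substitution variant at the given positions."""
    variants = []
    for pos in positions:
        for aa in AMINO_ACIDS:
            if aa != seq[pos]:
                variants.append(seq[:pos] + aa + seq[pos + 1:])
    return variants


def random_k_mutants(seq, positions, k, n, rng):
    """Sample n distinct variants, each with exactly k substitutions."""
    # Guard against asking for more variants than exist (avoids an endless loop).
    possible = math.comb(len(positions), k) * (len(AMINO_ACIDS) - 1) ** k
    if n > possible:
        raise ValueError(f"only {possible} distinct k={k} variants exist")
    variants = set()
    while len(variants) < n:
        chars = list(seq)
        for pos in rng.sample(positions, k):
            # Always pick a residue different from the seed residue.
            chars[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
        variants.add("".join(chars))
    return sorted(variants)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy seed and CDR positions (assumptions, not from the dataset).
seed = "ARDYW"
cdr = [1, 2, 3]
rng = random.Random(0)  # fixed seed so the sampling is reproducible

singles = all_single_mutants(seed, cdr)
doubles = random_k_mutants(seed, cdr, k=2, n=10, rng=rng)
```

In this toy case the k = 1 set has 3 positions × 19 alternative residues = 57 variants. The same bookkeeping applies to the abstract's totals: four sets of 29,900 designs give 4 × 29,900 = 119,600 designs.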

Document Details

Document Type
Pub Defense Publication
Publication Date
Oct 26, 2022
Source ID
10.1038/s41597-022-01779-4

Entities

People

  • Charles Lin
  • Chelsea Lennartz
  • Daniel Guion
  • David Younger
  • Emily Engelhart
  • Leslie Shing
  • Mary Kelley
  • Matthew E. Walsh
  • Randolph Lopez
  • Ryan Emerson

Organizations

  • Defense Threat Reduction Agency

Tags

Fields of Study

  • Biology

Readers

  • Computational Modeling and Simulation
  • Immunology
  • Molecular Genetics

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks