Molecular machine learning with conformer ensembles

Abstract

Virtual screening can accelerate drug discovery by identifying promising candidates for experimental evaluation. Machine learning is a powerful method for screening, as it can learn complex structure–property relationships from experimental data and make rapid predictions over virtual libraries. Molecules inherently exist as a three-dimensional ensemble and their biological action typically occurs through supramolecular recognition. However, most deep learning approaches to molecular property prediction use a 2D graph representation as input, and in some cases a single 3D conformation. Here we investigate how the 3D information of multiple conformers, traditionally known as 4D information in the cheminformatics community, can improve molecular property prediction in deep learning models. We introduce multiple deep learning models that expand upon key architectures such as ChemProp and SchNet, adding elements such as multiple-conformer inputs and conformer attention. We then benchmark the performance trade-offs of these models on 2D, 3D and 4D representations in the prediction of drug activity using a large training set of geometrically resolved molecules. The new architectures perform significantly better than 2D models, but their performance is often just as strong with a single conformer as with many. We also find that 4D deep learning models learn interpretable attention weights for each conformer.

Document Details

Document Type
Pub Defense Publication
Publication Date
Aug 24, 2023
Source ID
10.1088/2632-2153/acefa7

Entities

People

  • Rafael Gómez-Bombarelli
  • Simon Axelrod

Organizations

  • Defense Advanced Research Projects Agency

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Quantum Chemistry
  • Theoretical Analysis.

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks