Constrained Generative Modeling for Autonomous Molecular Discovery
Abstract
The discovery of functional molecules and materials underpins solutions to pressing societal challenges in sustainability, energy capture and storage, and manufacturing. Conceptually, designing novel molecules and materials can be viewed as a search through chemical space, which comprises all possible arrangements of atoms. The typical discovery paradigm is an iterative cycle of designing candidate compounds, synthesizing them, and testing their performance, where each iteration can take weeks or months. The rate at which this process yields successful compounds can be limited by the quality of the designed candidates and by the ease of their synthesis and evaluation. Computational techniques can accelerate this process by using inexpensive property prediction models as surrogates for physical experiments or high-fidelity simulations. Classically, such surrogate models have been applied in a virtual screening framework, selecting optimal experiments from an enumerated list of candidates on the basis of predicted performance and/or uncertainty. However, this technique is insufficient to explore the vastness of chemical space: the estimated number of stable small molecules comprising only C, H, N, and O atom types is comparable to the number of atoms in the solar system. Generative models based on deep neural networks (e.g., variational autoencoders, generative adversarial networks) have rapidly grown in popularity and have been applied in proof-of-concept studies to the design of organic photovoltaic components, catalysts, alloys, porous materials, and pharmaceutical molecules, among others. Yet, algorithmic techniques for generating molecules and materials exhibit significant limitations that have impeded their use in realistic discovery settings.
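As a minimal illustration of the classical virtual screening step described above, the Python sketch below ranks an enumerated candidate library using surrogate-model predictions. The `Prediction` structure, the `select_candidates` function, the upper-confidence-bound score (mean + beta * std), and the toy numbers are all hypothetical assumptions chosen for illustration; they are not the method of this proposal.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical surrogate-model output for one enumerated candidate."""
    smiles: str
    mean: float  # predicted property value (higher is better)
    std: float   # predictive uncertainty of the surrogate


def select_candidates(preds, k, beta=0.0):
    """Rank an enumerated library by an upper confidence bound, mean + beta * std.

    beta = 0 ranks purely on predicted performance (greedy screening);
    beta > 0 also rewards uncertainty, favoring more exploratory picks.
    Returns the SMILES strings of the top-k candidates.
    """
    ranked = sorted(preds, key=lambda p: p.mean + beta * p.std, reverse=True)
    return [p.smiles for p in ranked[:k]]


# Toy library with made-up predictions (illustrative values only).
library = [
    Prediction("CCO", mean=0.80, std=0.05),
    Prediction("c1ccccc1", mean=0.60, std=0.40),
    Prediction("CC(=O)O", mean=0.70, std=0.10),
]

print(select_candidates(library, k=2))            # greedy: ['CCO', 'CC(=O)O']
print(select_candidates(library, k=2, beta=1.0))  # uncertainty-aware: ['c1ccccc1', 'CCO']
```

The limitation noted in the text is visible here: screening can only choose among molecules already in `library`, whereas a generative model proposes new structures outside any enumerated list.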
The objectives of this proposal are to develop novel algorithms for molecular optimization using variational autoencoders that enable sample-efficient, multi-objective design of molecules with well-defined stereochemistry. Further, because generated candidates that are predicted to be high-performing must be validated experimentally (and manufactured at larger scales if successful), it is essential to consider the actionability of computational hypotheses; this is acutely relevant in the context of autonomous laboratories, where experiments must be planned and executed without human intervention. This proposal therefore also aims to develop algorithmic strategies for constraining variational autoencoders by the synthesizability of candidate small molecules, by generating and optimizing directed acyclic graphs that correspond to complete experimental protocols. The outcomes of the proposed work will be open-source algorithms and software for the automated optimization of molecular structures, demonstrated through application to various domains relevant to the Office of Naval Research. Specific applications we will examine include catalysts for depolymerization, conducting polymers, organic structure-directing agents for zeolite synthesis, energy storage molecules, refrigerants, metal chelators, and catalysts for stereocontrolled polymerization. Beyond these applications, the algorithmic capabilities for computer-aided materials design are broadly applicable to the accelerated design and discovery of novel functional molecules and materials. They are expected to be especially impactful in purely computational workflows (i.e., where candidate performance can be evaluated through high-fidelity theoretical chemistry simulations) and in autonomous experimental workflows (i.e., where candidates are synthesized and evaluated by low- or high-throughput robotic platforms).
Document Details
- Document Type: DoD Grant Award
- Publication Date: May 05, 2021
- Source ID: N000142112195
Entities
People
- Connor W. Coley
Organizations
- Massachusetts Institute of Technology
- Office of Naval Research
- United States Navy