A Semi-Automatic Pipeline for Efficient and Sustained Polymer Data Capture

Abstract

The primary objective of this project will be the creation of a comprehensive and dynamic (i.e., continuously and automatically evolving/growing) polymer database. The Materials Genome Initiative has a growing number of applications, and the materials innovation cycle has been greatly accelerated as a result of insights provided by data-driven materials informatics platforms. High-throughput computational methodologies, data descriptors and machine learning are playing an increasingly valuable role in research and development portfolios across both academia and industry. Polymers, especially, have long suffered from a lack of integrated data on electronic, mechanical, thermal, dielectric, transport, rheological, biodegradability and other properties across large chemical spaces. Available data is scattered across repositories with limited content, handbooks that are outdated, and the continuously growing open literature, which tends to be heterogeneous and defies even painstaking manual data retrieval. An efficient and (semi-)automatic pipeline for polymer data capture from all available sources in a continuous and sustainable manner is urgently needed. Such a capability will lead to a number of advantages and developments that can accelerate polymer discovery, development, optimization and deployment for a number of DoD and civilian applications.
These benefits include:
(1) An easy-to-maintain, up-to-date polymer database that can be directly queried and searched (e.g., during materials selection for a particular application);
(2) Mining of the data can lead to insights on correlations between properties and the limits of property ranges;
(3) Surrogate (machine learning) models may be built based on the data for the rapid prediction of the properties of polymers not already in the database;
(4) Strategies may be created for the direct design of polymers meeting a set of target property requirements using one of many emerging machine learning algorithms trained on the available data.
Progress has been made to a limited extent on items (2)-(4) above, but these developments will be permanently constrained by the dataset on which they are built. Hence the primary goal of this proposal is the creation of an efficient and (semi-)automatic pipeline for polymer data capture in a continuous and sustainable manner. Item (1) above will be the primary deliverable, which will spawn (2)-(4) as secondary outcomes. Enhanced machine learning models to predict the properties of new polymers and techniques to solve the "inverse" problem (i.e., design of polymers meeting a target property requirement) will be developed utilizing this database, thus providing a pathway for converting the data to knowledge.

Document Details

Document Type
DoD Grant Award
Publication Date
Apr 24, 2019
Source ID
N000141912103

Entities

People

  • Ramamurthy Ramprasad

Organizations

  • Georgia Tech Research Corporation
  • Office of Naval Research
  • United States Navy

Tags

Readers

  • Distributed Systems and Data Platform Development
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks
  • Microelectronics
  • Space