Text-mined dataset of inorganic materials synthesis recipes

Abstract

Materials discovery has been significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has shifted the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach to predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database of synthesis processes. To overcome this limitation, we have generated a dataset of “codified recipes” for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs using text mining and natural language processing approaches. Every entry contains information about the target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.
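
As a rough illustration of what one "codified recipe" might look like in code, the sketch below models an entry with the four kinds of information the abstract lists: target material, starting compounds, operations with their conditions, and a balanced reaction. The class and field names (SynthesisEntry, Operation, precursors, and so on) are assumptions made for this example and are not the dataset's actual schema; the sample entry is a textbook solid-state reaction chosen for illustration, not a record taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operation:
    """One synthesis step (hypothetical structure, not the dataset's schema)."""
    kind: str                                   # e.g. "mixing", "heating"
    conditions: Dict[str, str] = field(default_factory=dict)


@dataclass
class SynthesisEntry:
    """One codified recipe, mirroring the fields named in the abstract."""
    target: str                 # target material
    precursors: List[str]       # starting compounds
    operations: List[Operation] # operations used and their conditions
    reaction: str               # balanced chemical equation


def operation_counts(entries: List[SynthesisEntry]) -> Dict[str, int]:
    """Count how often each operation kind appears across entries."""
    counts: Dict[str, int] = {}
    for entry in entries:
        for op in entry.operations:
            counts[op.kind] = counts.get(op.kind, 0) + 1
    return counts


# Illustrative entry only; the condition value is a placeholder.
example = SynthesisEntry(
    target="BaTiO3",
    precursors=["BaCO3", "TiO2"],
    operations=[
        Operation("mixing"),
        Operation("heating", {"atmosphere": "air"}),
    ],
    reaction="BaCO3 + TiO2 -> BaTiO3 + CO2",
)

print(operation_counts([example]))
```

A flat record of this kind would make simple data-mining queries, such as tallying operation types or grouping entries by precursor, straightforward to express.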

Document Details

Document Type
Pub Defense Publication
Publication Date
Oct 15, 2019
Source ID
10.1038/s41597-019-0224-1

Entities

People

  • Gerbrand Ceder
  • Haoyan Huo
  • Olga Kononova
  • Tanjin He
  • Tiago Botari
  • Vahe Tshitoyan
  • Wenhao Sun
  • Ziqin Rong

Organizations

  • National Science Foundation
  • Office of Naval Research

Tags

Fields of Study

  • Chemistry

Readers

  • Computational Linguistics
  • Nanocomposite Materials Science
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation