Merging Applicability Domains for in Silico Assessment of Chemical Mutagenicity

Abstract

Using a benchmark Ames mutagenicity data set, we evaluated molecular fingerprints as descriptors for developing quantitative structure-activity relationship (QSAR) models and for defining applicability domains with two machine-learning methods: random forest (RF) and variable nearest neighbor (v-NN). The two methods focus on complementary aspects of chemical mutagenicity and use different characteristics of the molecular fingerprints to achieve high prediction accuracy. RF flags mutagenic compounds by the presence or absence of small molecular fragments akin to structural alerts. The v-NN method, in contrast, relies on structural similarity between molecules, measured as fingerprint-based Tanimoto distances. We showed that extended connectivity fingerprints can be used intuitively to define and quantify an applicability domain for either method. The importance of applicability domains in QSAR modeling cannot be overstated: compounds outside the applicability domain have no close representative in the training set, so reliable predictions cannot be made for them. With either approach, we developed highly robust models that rival the performance of a state-of-the-art proprietary software package. Importantly, because the two methods are complementary, combining their predictions raised the coverage of the applicability domain from roughly 80% to 90% of compounds. These results indicate that the proposed QSAR protocol constitutes a highly robust model for predicting chemical mutagenicity.
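The similarity-based side of this approach, and the idea of merging two applicability domains, can be illustrated with a minimal Python sketch. It assumes fingerprints are given as sets of on-bit indices (for example, hashed ECFP feature IDs). It uses a Gaussian distance weighting in the v-NN vote and treats a compound with no training neighbor within the distance cutoff `d0` as outside the applicability domain. The cutoff and smoothing values (`d0 = 0.6`, `h = 0.3`), the helper names, and the merge rule (average when both models are in domain, otherwise use whichever one is) are hypothetical choices for illustration, not the report's parameters or its exact procedure.

```python
import math
from typing import Iterable, Optional, Sequence


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Tanimoto distance 1 - |A & B| / |A | B| between binary fingerprints.

    Fingerprints are sets of on-bit indices (an assumption of this sketch).
    Two empty fingerprints are treated as identical (distance 0.0).
    """
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def vnn_predict(query: frozenset,
                train_fps: Sequence[frozenset],
                train_labels: Sequence[float],
                d0: float = 0.6,   # hypothetical distance cutoff
                h: float = 0.3     # hypothetical smoothing factor
                ) -> Optional[float]:
    """Distance-weighted vote over every training compound within d0.

    Returns a score in [0, 1] (1 = mutagenic) or None when no training
    compound lies within d0, i.e. the query is outside the domain.
    """
    num = 0.0
    den = 0.0
    for fp, y in zip(train_fps, train_labels):
        d = tanimoto_distance(query, fp)
        if d <= d0:
            w = math.exp(-(d / h) ** 2)  # closer neighbors weigh more
            num += w * y
            den += w
    if den == 0.0:
        return None  # no neighbor within d0: outside applicability domain
    return num / den


def combined_predict(vnn_score: Optional[float],
                     rf_score: Optional[float]) -> Optional[float]:
    """Hypothetical merge rule: average the in-domain model scores.

    A compound is covered if it lies in either model's domain, so the
    merged domain is the union of the two individual domains.
    """
    scores = [s for s in (vnn_score, rf_score) if s is not None]
    if not scores:
        return None
    return sum(scores) / len(scores)


def coverage(predictions: Iterable[Optional[float]]) -> float:
    """Fraction of compounds that received an in-domain prediction."""
    preds = list(predictions)
    return sum(p is not None for p in preds) / len(preds)


if __name__ == "__main__":
    train = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}),
             frozenset({10, 11, 12})]
    labels = [1, 1, 0]  # 1 = Ames positive, 0 = Ames negative
    print(vnn_predict(frozenset({1, 2, 3}), train, labels))  # 1.0
    print(vnn_predict(frozenset({20, 21}), train, labels))   # None
```

In this toy run, the query `{1, 2, 3}` lies at distance 0.25 from both mutagenic training compounds and outside `d0` of the non-mutagenic one, so it receives a score of 1.0. The query `{20, 21}` shares no bits with any training compound and is reported as out of domain. Passing such a `None` together with an in-domain score from a second model (for instance an RF classifier with its own domain criterion) into `combined_predict` still yields a prediction, which is how merging two domains can raise overall coverage.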


Document Details

Document Type
Technical Report
Publication Date
Feb 04, 2014
Accession Number
ADA602163

Entities

People

  • Anders Wallqvist
  • Ruifeng Liu

Organizations

  • Biotechnology High Performance Computing Software Applications Institute

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Accuracy
  • Application Software
  • Chemical Bonds
  • Data Sets
  • Experimental Data
  • Fingerprints
  • Information Science
  • Learning
  • Machine Learning
  • Molecules
  • Mutagens
  • Reliability
  • Small Molecules
  • Standards
  • Statistical Analysis
  • Toxicity
  • Training

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Molecular Genetics
  • Toxicology/Environmental Toxicology

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks