Identifying Antimicrobial Peptides Using Word Embedding With Deep Recurrent Neural Networks

Abstract

Motivation: Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are antimicrobial peptides produced by bacteria, are candidates for broadening the available choices of antimicrobials. However, the discovery of new bacteriocins by genomic mining is hampered by their sequences' low complexity and high variance, which frustrate sequence similarity-based searches.

Results: Here we represent bacteriocins with word embeddings of protein sequences, and apply a word embedding method that accounts for amino acid order to predict novel bacteriocins from protein sequences without using sequence similarity. Our method predicts, with high probability, six previously unknown putative bacteriocins in Lactobacillus. More generally, sequence representations based on word embeddings that preserve sequence order information can be applied to peptide and protein classification problems for which sequence similarity cannot be used.
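
As a rough illustration of the idea of order-preserving word embeddings for protein sequences, the sketch below splits a peptide into overlapping k-mer "words" and looks up one vector per word, keeping the vectors in sequence order so that a recurrent network could read them left to right. The abstract does not state how sequences are tokenized or how the embeddings are trained, so the choice of overlapping 3-mers, the randomly initialized toy embedding table, the example peptide, and all function names here are assumptions made for illustration only.

```python
import random


def to_kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words"), in order.

    k=3 is an assumed word size, not a value taken from the abstract.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embedding_table(words, dim=4, seed=0):
    """Assign each distinct word a random vector.

    A stand-in for embeddings that would normally be learned from a large
    corpus of protein sequences.
    """
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(dim)]
            for w in sorted(set(words))}


def embed_in_order(sequence, table, k=3):
    """Return one vector per k-mer, preserving the order of the sequence."""
    return [table[w] for w in to_kmers(sequence, k)]


peptide = "MKTAYIAK"  # made-up example sequence, not a real bacteriocin
words = to_kmers(peptide)
table = toy_embedding_table(words)
vectors = embed_in_order(peptide, table)

print(words)                          # the ordered k-mer "words"
print(len(vectors), len(vectors[0]))  # number of words, embedding dimension
```

Because the vectors are kept as an ordered list rather than summed or averaged, the positional information in the sequence is retained for whatever sequence model consumes them.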

Document Details

Document Type
Technical Report
Publication Date
Nov 10, 2018
Accession Number
AD1099673

Entities

People

  • Iddo Friedberg
  • Md-Nafiz Hamid

Organizations

  • Iowa State University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Anti-Infective Agents
  • Artificial Intelligence Software
  • Computational Biology
  • Computational Science
  • Computer Languages
  • Databases
  • Hidden Markov Models
  • Information Science
  • Machine Learning
  • Natural Language Processing
  • Natural Languages
  • Neural Networks
  • Nucleic Acids
  • Ontologies
  • Probability
  • Recurrent Neural Networks
  • Supervised Machine Learning

Readers

  • Microbial Pathology
  • Molecular Genetics
  • Neural Network Machine Learning

Technology Areas

  • AI & ML