FIEFDom: A Transparent Domain Boundary Recognition System using a Fuzzy Mean Operator

Abstract

Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method that uses a fuzzy mean operator to predict the domain boundaries of a multidomain protein from its amino acid sequence. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator assigns each residue of the query sequence a likelihood of belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for one-domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS, and on a database that contains domain definitions different from those used to train the models, our method consistently yielded the same accuracy, while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.
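
The abstract does not give the exact form of the fuzzy mean operator, so the following is only an illustrative sketch of the general idea, not the report's implementation. It assumes that each query residue has a list of evidence pairs (membership, label) drawn from reference proteins, where the membership is a hypothetical similarity weight and the label is 1 for a known boundary position and 0 otherwise. The function names, the weighting scheme, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fuzzy_mean(memberships: Sequence[float], labels: Sequence[float]) -> float:
    """Membership-weighted mean of labels (assumed form of the operator).

    Returns 0.0 when there is no evidence (total membership is zero).
    """
    total = sum(memberships)
    if total == 0:
        return 0.0
    return sum(m * y for m, y in zip(memberships, labels)) / total


def boundary_likelihoods(
    evidence: List[List[Tuple[float, float]]]
) -> List[float]:
    """One likelihood per residue; evidence[i] holds (membership, label) pairs."""
    return [
        fuzzy_mean([m for m, _ in pairs], [y for _, y in pairs])
        for pairs in evidence
    ]


def contiguous_regions(
    scores: Sequence[float], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Merge consecutive residues scoring >= threshold into (start, end) spans.

    Indices are 0-based and inclusive. The threshold is a hypothetical choice.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(scores) - 1))
    return regions


# Toy evidence for a five-residue query (made-up numbers, not real data).
toy_evidence = [
    [(1.0, 0.0), (0.5, 0.0)],
    [(1.0, 1.0), (1.0, 0.0)],
    [(0.5, 1.0), (0.5, 1.0)],
    [],
    [(2.0, 1.0), (1.0, 0.0)],
]
toy_scores = boundary_likelihoods(toy_evidence)
toy_regions = contiguous_regions(toy_scores)
```

Because the per-residue score in this sketch depends only on the evidence currently available, adding new reference proteins changes the evidence lists without requiring any refitting, which is consistent in spirit with the abstract's claim about RPS updates.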

Document Details

Document Type
Technical Report
Publication Date
Dec 04, 2008
Accession Number
ADA518413

Entities

People

  • Anders Wallqvist
  • Michael S. Lee
  • Rajkumar Bondugula

Organizations

  • United States Army Medical Research and Development Command

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Abstracts
  • Accuracy
  • Amino Acids
  • Application Software
  • Biomedical Research
  • Boundaries
  • Data Sets
  • Databases
  • High Performance Computing
  • Identities
  • Information Science
  • Machine Learning
  • Neural Networks
  • Nucleic Acids
  • Recognition
  • Supervised Machine Learning
  • Training

Fields of Study

  • Computer science

Readers

  • Computational Fluid Dynamics (CFD)
  • Computer Vision
  • Molecular Genetics