Can Computationally Designed Protein Sequences Improve Secondary Structure Prediction?

Abstract

Computational sequence design methods are used to engineer proteins with desired properties such as increased thermal stability and novel function. These algorithms can also be used to identify an envelope of sequences that may be compatible with a particular protein fold topology. We therefore hypothesized that sequence-property prediction, specifically secondary structure prediction, could be significantly enhanced by using a large database of computationally designed sequences. We performed a large-scale test of this hypothesis with 6511 diverse protein domains and 50 designed sequences per domain. After analyzing the inherent accuracy of the designed-sequence database, we found it necessary to constrain the fraction of the native sequence that was allowed to change. With mutational constraints, accuracy improved relative to unconstrained design, but the diversity of the designed sequences, and hence the effective size of the database, was moderately reduced. Overall, the best three-state prediction accuracy (Q3) that we achieved was nearly one percentage point higher than that obtained with a natural sequence database alone, well below the theoretical maximum improvement of 8-10 percentage points. Furthermore, augmenting the state-of-the-art PSIPRED program with our nascent method improved its accuracy by a percentage point.
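
The three-state accuracy (Q3) cited in the abstract is conventionally the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. The following is a minimal sketch of that standard definition, not the report's own evaluation code; the function name q3_accuracy and the example strings are hypothetical and chosen only for illustration.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return Q3: the percentage of residues whose three-state label matches.

    Both arguments are per-residue label strings over H (helix),
    E (strand), and C (coil). Illustrative helper, not from the report.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up ten-residue example: one residue (position 6) is mispredicted.
observed_states = "CCHHHHEEEC"
predicted_states = "CCHHHCEEEC"
example_q3 = q3_accuracy(predicted_states, observed_states)
print(f"Q3 = {example_q3:.1f}%")
```

On this scale, a gain of one percentage point corresponds to one additional correctly assigned residue per hundred residues.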

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2011
Accession Number
ADA546384

Entities

People

  • Anders Wallqvist
  • Michael S. Lee
  • Rajkumar Bondugula

Organizations

  • United States Army Medical Research and Development Command

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Amino Acids
  • Application Software
  • Biomedical Research
  • Computational Complexity
  • Databases
  • Engineering
  • Engineers
  • High Performance Computing
  • Information Science
  • Molecular Dynamics
  • Neural Networks
  • Nucleic Acids
  • Protein Engineering
  • Thermal Stability
  • Topology

Readers

  • Computational Modeling and Simulation
  • Molecular Genetics