The Relationship between Protein Structure and Function: a Comprehensive Survey with Application to the Yeast Genome

Abstract

For most proteins in the genome databases, function is predicted via sequence comparison. In spite of the popularity of this approach, the extent to which it can be reliably applied is unknown. We address this issue by systematically investigating the relationship between protein function and structure. We focus initially on enzymes classified by the Enzyme Commission (EC) and relate these to structurally classified proteins in the SCOP database. We find that the major SCOP fold classes have different propensities to carry out certain broad categories of functions. For instance, alpha/beta folds are disproportionately associated with enzymes, especially transferases and hydrolases, and all-alpha and small folds with non-enzymes, while alpha+beta folds have an equal tendency either way. These observations for the database overall are largely true for specific genomes. We focus, in particular, on yeast, analyzing it with many classifications in addition to SCOP and EC (i.e. COGs, CATH, MIPS), and find clear tendencies for fold-function association across a broad spectrum of functions. Analysis with the COGs scheme also suggests that the functions of the most ancient proteins are more evenly distributed among different structural classes than those of more modern ones. For the database overall, we identify both the most versatile functions, i.e. those that are associated with the most folds, and the most versatile folds, i.e. those associated with the most functions. The two most versatile enzymatic functions (hydro-lyases and O-glycosyl glucosidases) are associated with 7 folds each. The five most versatile folds (TIM-barrel, Rossmann, ferredoxin, alpha-beta hydrolase, and P-loop NTP hydrolase) are all mixed alpha-beta structures. They stand out as generic scaffolds, accommodating from 6 to as many as 16 functions (for the exceptional TIM-barrel).
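
The notion of versatility used in the abstract (a function is versatile if it is associated with many folds, and a fold is versatile if it is associated with many functions) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `versatility` and the fold/function labels are hypothetical placeholders, and the pairs are toy values, not data from the report. In practice, the pairs would presumably come from cross-referencing a structural classification such as SCOP with a functional one such as EC.

```python
from collections import defaultdict

# Hypothetical (fold, function) pairs for illustration only;
# these are NOT data from the report.
PAIRS = [
    ("fold_A", "func_1"),
    ("fold_A", "func_2"),
    ("fold_A", "func_3"),
    ("fold_B", "func_1"),
    ("fold_C", "func_1"),
    ("fold_C", "func_2"),
]


def versatility(pairs):
    """Count distinct functions per fold and distinct folds per function."""
    functions_per_fold = defaultdict(set)
    folds_per_function = defaultdict(set)
    for fold, function in pairs:
        functions_per_fold[fold].add(function)
        folds_per_function[function].add(fold)
    fold_counts = {f: len(s) for f, s in functions_per_fold.items()}
    function_counts = {f: len(s) for f, s in folds_per_function.items()}
    return fold_counts, function_counts


fold_counts, function_counts = versatility(PAIRS)

# The most versatile fold hosts the most functions;
# the most versatile function occurs on the most folds.
most_versatile_fold = max(fold_counts, key=fold_counts.get)
most_versatile_function = max(function_counts, key=function_counts.get)
```

Sets are used so that a fold-function pair observed in many proteins is counted only once, which matches counting associations rather than individual proteins.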

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1999
Accession Number
ADA472211

Entities

People

  • Hedi Hegyi
  • Mark Gerstein

Organizations

  • Yale University

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Albumins
  • Bacterial Proteins
  • Biomedical And Dental Materials
  • Chemical Synthesis
  • Chemistry
  • Classification
  • Databases
  • Enzymes
  • Fish
  • Hydrolases
  • Microbial Genome
  • Microbiology
  • Molecular Biology
  • Polymer Chemistry
  • Polymeric Films
  • Proteins
  • Proteomics

Fields of Study

  • Biology

Readers

  • Marine Ecological Systems Migration
  • Molecular and Cellular Biochemistry
  • Systems Analysis and Design