Large-Scale Orthology Predictions for Inferring Gene Functions Across Multiple Species

Abstract

An effective approach to inferring gene function is to use the concept of gene orthology. Because orthologous genes are likely to share similar functions, the functions of genes in an unstudied species can be inferred from the functions of their orthologs in a well-studied model species. To infer gene functions for a multitude of species, we developed a high-throughput orthology prediction method, termed PhyloTrace. PhyloTrace is both highly accurate and computationally efficient for large-scale applications and can infer orthologous genes across thousands of species. This is accomplished through three major steps: 1) all-against-all gene comparisons for every pair of genes, 2) pairwise orthology predictions for every two genomes, and 3) the generation of orthologous clusters that contain orthologous genes across multiple genomes. We employed the previously developed Pipeman parallelization program to break down a set of millions of input sequences into small chunks and process them in parallel. We successfully predicted orthologs for over 900 bacterial genomes, achieving a false-positive prediction rate of 2.0%, a significant improvement over the widely used bidirectional best-hit method, which yielded a false-positive rate of 5.5%.
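
For orientation, the bidirectional best-hit baseline and the clustering step named in the abstract can be sketched as follows. This is a minimal illustration, not PhyloTrace itself: the abstract does not describe its pairwise prediction rule or its clustering procedure. The function names, the score dictionaries, and the use of connected components for clustering are all assumptions made for this example.

```python
from collections import defaultdict


def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    `scores` maps (query_gene, target_gene) to a similarity score
    (hypothetical input; any all-against-all comparison could supply it).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(scores_ab):
    """Return gene pairs that are each other's best hit across two genomes."""
    forward = best_hits(scores_ab)
    # Assumption: scores are symmetric, so the reverse direction reuses them.
    reverse = best_hits({(b, a): s for (a, b), s in scores_ab.items()})
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def cluster(pairs):
    """Group genes linked by predicted ortholog pairs (connected components).

    Assumption: simple union-find clustering, chosen only for illustration.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(group) for group in groups.values())
```

Given pairwise scores for genomes A and B and for genomes B and C, the pairs returned by `bidirectional_best_hits` for each genome pair can be concatenated and passed to `cluster` to obtain multi-genome groups. A gene whose best hit is not reciprocated yields no pair, which is one source of the missed or spurious predictions that more elaborate methods try to reduce.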

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2010
Accession Number
ADA548931

Entities

People

  • Chenggang Yu
  • Jaques Reifman
  • Nela Zavaljevski
  • Valmik Desai

Organizations

  • United States Army Medical Research and Development Command

Tags

DTIC Thesaurus Topics

  • Application Software
  • Biomedical Research
  • Biotechnology
  • Clustering
  • Computers
  • Databases
  • Demographic Cohorts
  • Department of Defense
  • DNA Sequence Analysis
  • Genetics
  • Graphical User Interface
  • High Performance Computing
  • Infectious Diseases
  • Prokaryotes
  • Sequences
  • Throughput
  • User Interface

Fields of Study

  • Biology

Readers

  • Computational Modeling and Simulation
  • Molecular Genetics

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks