PIPA: A High-Throughput Pipeline for Protein Function Annotation

Abstract

Traditional experimental methods to determine the functions of proteins encoded in genomic sequences cannot keep pace with the avalanche of sequence data produced by new high-throughput sequencing technologies. This has prompted the development of numerous bioinformatics approaches for automated protein function annotation. However, these approaches frequently use different function classification terminologies, which precludes the integration of multisource predictions. We developed the Pipeline for Protein Annotation (PIPA), a genome-wide protein function annotation pipeline that runs in a high-performance computing environment. PIPA integrates different tools and employs the Gene Ontology (GO) to provide consistent annotation and resolve prediction conflicts. PIPA has three modules that allow for easy development of specialized databases and integration of various bioinformatics tools. The first module, the pipeline execution module, consists of programs that give the user access to, and control over, the pipeline's parallel execution of multiple jobs, each searching a particular database for a chunk of the input data. The execution module wraps the second module, the core pipeline module, whose main components are the integrated resources, the program for terminology conversion to GO, and the consensus annotation program. The third module, the preprocessing module, contains the program for customized generation of protein function databases and the GO-mapping generation program, which creates GO mappings for the terminology conversion program. The current implementation of PIPA annotates protein functions by combining the results of an in-house-developed database for enzyme catalytic function prediction (CatFam) and the results of multiple integrated resources.
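
The data flow described in the abstract (splitting the input into chunks, converting each resource's native terms to GO, then forming a consensus) can be illustrated with a minimal Python sketch. This is not PIPA's code: the function names, the placeholder identifiers, and in particular the vote-counting consensus rule are assumptions made for illustration only, since the abstract does not state how PIPA resolves prediction conflicts.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def split_into_chunks(sequence_ids: List[str], chunk_size: int) -> List[List[str]]:
    """Split the input into fixed-size chunks, one per parallel search job."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [sequence_ids[i:i + chunk_size]
            for i in range(0, len(sequence_ids), chunk_size)]


def to_go_terms(native_terms: Iterable[str],
                mapping: Dict[str, Set[str]]) -> Set[str]:
    """Convert one resource's native terms to GO terms via a lookup table.

    Terms without a mapping are dropped (an assumption of this sketch).
    """
    go_terms: Set[str] = set()
    for term in native_terms:
        go_terms.update(mapping.get(term, set()))
    return go_terms


def consensus(predictions: List[Set[str]], min_votes: int = 2) -> Set[str]:
    """Keep GO terms predicted by at least min_votes resources.

    Simple vote counting is a stand-in, not PIPA's actual consensus method.
    """
    votes = Counter(term for prediction in predictions for term in prediction)
    return {term for term, count in votes.items() if count >= min_votes}


if __name__ == "__main__":
    # Placeholder identifiers; none of these are real database entries.
    chunks = split_into_chunks(["p1", "p2", "p3", "p4", "p5"], chunk_size=2)
    print(chunks)

    mapping = {"term_a": {"GO:EX1", "GO:EX2"}, "term_b": {"GO:EX1"}}
    per_resource = [
        to_go_terms(["term_a"], mapping),
        to_go_terms(["term_b", "unmapped"], mapping),
    ]
    print(consensus(per_resource, min_votes=2))
```

Converting every prediction to GO before comparing is what makes the results from different resources comparable at all, which is the integration problem the abstract identifies.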

Document Details

Document Type: Technical Report
Publication Date: Jul 01, 2008
Accession Number: ADA526660

Entities

People

Chenggang Yu
Jaques Reifman
Nela Zavaljevski
Valmik Desai
