Large-scale protein function prediction using heterogeneous ensembles

Abstract

Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor is unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but whether they can improve PFP at a large scale has not been established. The overall goal of this study is to critically assess that ability for a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
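To make the idea of Stacking using Logistic Regression concrete, here is a minimal sketch in plain Python. It is not the LargeGOPred implementation: the function names (`fit_logistic_stacker`, `stack_predict`), the gradient-descent settings and the toy scores are all assumptions made for illustration. The sketch assumes each protein already has one score per base predictor for a single Gene Ontology term, and fits a logistic regression on those scores to produce the ensemble prediction.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_stacker(base_scores, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on base-predictor scores.

    base_scores: list of rows, one score per base predictor (hypothetical
                 inputs; in practice these should come from held-out folds
                 so the meta-learner does not overfit).
    labels:      list of 0/1 annotations for one GO term.
    """
    n_features = len(base_scores[0])
    n = len(labels)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(base_scores, labels):
            logit = sum(w * x for w, x in zip(weights, row)) + bias
            err = sigmoid(logit) - y  # gradient of the log loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        # Batch gradient-descent step on the mean log loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stack_predict(weights, bias, row):
    # Ensemble probability for one protein from its base-predictor scores.
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


# Toy data (assumed): three base predictors, the third is mostly noise.
train_scores = [
    [0.9, 0.8, 0.5],
    [0.8, 0.7, 0.1],
    [0.7, 0.9, 0.9],
    [0.2, 0.3, 0.6],
    [0.1, 0.2, 0.2],
    [0.3, 0.1, 0.8],
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic_stacker(train_scores, train_labels)
p_pos = stack_predict(weights, bias, [0.85, 0.75, 0.4])
p_neg = stack_predict(weights, bias, [0.15, 0.25, 0.4])
```

The meta-learner weights each base predictor by how well its scores separate annotated from unannotated proteins, which is what lets a stacked ensemble discount uninformative predictors rather than averaging them in equally.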

Document Details

Document Type
Pub Defense Publication
Publication Date
Sep 28, 2018
Source ID
10.12688/f1000research.16415.1

Entities

People

  • Gaurav Pandey
  • Jeffrey N Law
  • Linhua Wang
  • Shiv D. Kale
  • T M Murali

Organizations

  • Intelligence Advanced Research Projects Activity
  • International Business Machines Corporation (Armonk, NY)
  • National Institutes of Health

Tags

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning