Flnc: Machine Learning Improves the Identification of Novel Long Noncoding RNAs from Stand-Alone RNA-Seq Data

Abstract

Long noncoding RNAs (lncRNAs) play critical regulatory roles in human development and disease. Although there are over 100,000 samples with available RNA sequencing (RNA-seq) data, many lncRNAs have yet to be annotated. The conventional approach to identifying novel lncRNAs from RNA-seq data is to find transcripts without coding potential but this approach has a false discovery rate of 30–75%. Other existing methods either identify only multi-exon lncRNAs, missing single-exon lncRNAs, or require transcriptional initiation profiling data (such as H3K4me3 ChIP-seq data), which is unavailable for many samples with RNA-seq data. Because of these limitations, current methods cannot accurately identify novel lncRNAs from existing RNA-seq data. To address this problem, we have developed software, Flnc, to accurately identify both novel and annotated full-length lncRNAs, including single-exon lncRNAs, directly from RNA-seq data without requiring transcriptional initiation profiles. Flnc integrates machine learning models built by incorporating four types of features: transcript length, promoter signature, multiple exons, and genomic location. Flnc achieves state-of-the-art prediction power with an AUROC score over 0.92. Flnc significantly improves the prediction accuracy from less than 50% using the conventional approach to over 85%. Flnc is available via GitHub platform.

Document Details

Document Type
Pub Defense Publication
Publication Date
Oct 13, 2022
Source ID
10.3390/ncrna8050070

Entities

People

  • Chan Zhou
  • Euijin Kwon
  • Katherine A. Fitzgerald
  • Peng Zhou
  • Zhiping Weng
  • Zixiu Li

Organizations

  • Defense Advanced Research Projects Agency
  • National Institutes of Health

Tags

Fields of Study

  • Biology

Readers

  • Distributed Systems and Data Platform Development
  • Molecular Genetics
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks