Stochastic Optimization for Tropical Principal Component Analysis Over Tree Spaces

Abstract

A known challenge in the rapidly growing area of phylogenomics is the lack of tools to analyze the large volume of genome data. Genomic data includes information on the evolution, structure, and mapping of genomes. Phylogenetic trees are branching diagrams that show the evolutionary history of species and their genes; a gene tree shows the evolutionary history of a particular gene. To analyze evolutionary history from genomic data, we reduce the dimensionality of gene trees, overcoming the analytical challenges of high-dimensional data. We vectorize each phylogenetic tree as the pairwise distances between every pair of its leaves and apply a tropical principal component analysis: a principal component analysis (PCA) in terms of a tropical metric. We project gene trees onto a two-dimensional space using a tropical PCA, defined as a tropical convex hull that minimizes the sum of residuals between each gene tree in the dataset and its projection onto the tropical convex hull over the tree space, which is the set of all possible gene trees. Since computing a tropical PCA for a given dataset is computationally time-intensive, we implement a Markov chain Monte Carlo Metropolis-Hastings algorithm to estimate the tropical PCA effectively and efficiently. Using simulated and real-world data, we apply our tropical PCA algorithm and visualize the results in two-dimensional plots; the results look promising and demonstrate our algorithm's strengths.
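The objective described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation; it rests on several assumptions. Each gene tree is taken to be already vectorized as its pairwise leaf distances. The tropical metric and the projection onto a tropical convex hull use the standard max-plus formulas. Candidate hull vertices are restricted to points of the dataset, and the sampler targets a density proportional to exp(-cost / temperature), neither of which the abstract specifies. All function and parameter names are hypothetical.

```python
import math
import random

def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)

def project(x, vertices):
    """Project x onto the max-plus tropical convex hull of `vertices`.

    For each vertex v_k, lambda_k = min_i(x_i - v_k[i]); the projection is
    the componentwise maximum of lambda_k + v_k over all k.
    """
    lambdas = [min(xi - vi for xi, vi in zip(x, v)) for v in vertices]
    return [max(lam + v[i] for lam, v in zip(lambdas, vertices))
            for i in range(len(x))]

def residual_sum(data, vertices):
    """Sum of tropical distances between each point and its projection."""
    return sum(tropical_distance(x, project(x, vertices)) for x in data)

def metropolis_search(data, n_vertices=3, steps=500, temperature=1.0, seed=0):
    """Metropolis-Hastings style search over vertex sets drawn from `data`.

    Assumption: the proposal swaps one vertex for a uniformly chosen data
    point. That proposal is symmetric, so the acceptance ratio reduces to
    exp((cost - new_cost) / temperature).
    """
    rng = random.Random(seed)
    current = rng.sample(range(len(data)), n_vertices)
    cost = residual_sum(data, [data[i] for i in current])
    best, best_cost = list(current), cost
    for _ in range(steps):
        proposal = list(current)
        proposal[rng.randrange(n_vertices)] = rng.randrange(len(data))
        new_cost = residual_sum(data, [data[i] for i in proposal])
        accept = new_cost <= cost or \
            rng.random() < math.exp((cost - new_cost) / temperature)
        if accept:
            current, cost = proposal, new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
    return [data[i] for i in best], best_cost

# Toy vectors standing in for vectorized gene trees (made-up numbers).
toy_data = [[0, 2, 4], [0, 3, 3], [0, 1, 5], [0, 4, 1], [0, 2, 2]]
best_vertices, best_cost = metropolis_search(toy_data, steps=200)
```

With three vertices the hull is a tropical triangle, which matches the two-dimensional projection the abstract describes; the projected points could then be plotted in the coordinates of that triangle.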

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2019
Accession Number
AD1080360

Entities

People

  • Robert L. Page

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Algorithms
  • Cells
  • Computations
  • Deoxyribonucleic Acids
  • Dimensionality Reduction
  • Factor Analysis
  • Fish
  • Genetics
  • Geometry
  • Heuristic Methods
  • Information Science
  • Markov Chains
  • Mathematical Analysis
  • Monte Carlo Method
  • Probability
  • Simulations
  • Two Dimensional

Fields of Study

  • Biology

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems
  • Oncology and Biomarker-Based Cancer Detection
  • Virology (or Medical Virology)

Technology Areas

  • Space