GrigoraSNPs: Optimized Analysis of SNPs for DNA Forensics

Abstract

High‐throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) enables additional DNA forensic capabilities not attainable using traditional STR panels. However, the inclusion of sets of loci selected for mixture analysis, extended kinship, phenotype, biogeographic ancestry prediction, etc., can result in large panel sizes that are difficult to analyze rapidly. GrigoraSNPs was developed to address the allele‐calling bottleneck that was encountered when analyzing SNP panels with more than 5000 loci using HTS. GrigoraSNPs uses MapReduce parallel data processing across multiple computational threads, plus a novel locus‐identification hashing strategy that leverages target sequence tags. This tool optimizes the SNP calling module of the DNA analysis pipeline, with runtimes that scale linearly with the number of HTS reads. Results are compared with SNP analysis pipelines implemented with SAMtools and GATK. GrigoraSNPs removes a computational bottleneck for processing forensic samples with large HTS SNP panels.
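The general idea named in the abstract, a hash lookup from sequence tags to loci combined with a map step over chunks of reads and a reduce step that merges counts, can be illustrated with a minimal sketch. This is not the GrigoraSNPs implementation: the tag length, the tag-to-locus table, the locus names, the assumption that each tag sits immediately upstream of the SNP base, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

TAG_LEN = 8  # hypothetical tag length, chosen for the example

# Hypothetical panel: each tag is assumed to lie immediately upstream
# of the SNP base, so the allele is the base that follows the tag.
TAG_TO_LOCUS = {
    "ACGTACGT": "rs0001",
    "TTGACCGA": "rs0002",
}


def map_chunk(reads):
    """Map step: count (locus, allele) observations in one chunk of reads."""
    counts = Counter()
    for read in reads:
        # Slide a fixed-width window over the read; each window costs one
        # dictionary lookup, so the work per read is bounded by its length.
        for i in range(len(read) - TAG_LEN):
            locus = TAG_TO_LOCUS.get(read[i:i + TAG_LEN])
            if locus is not None:
                counts[(locus, read[i + TAG_LEN])] += 1
                break  # assume at most one panel locus per read
    return counts


def merge_counts(total, partial):
    """Reduce step: fold one partial count table into the running total."""
    total.update(partial)
    return total


def count_alleles(reads, n_workers=4):
    """Split reads into chunks, map them on a thread pool, then reduce."""
    chunks = [reads[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    example_reads = [
        "GGACGTACGTAAA",
        "CCACGTACGTGTT",
        "TTGACCGACGG",
        "NNNNNNNNNNNN",
    ]
    for (locus, allele), n in sorted(count_alleles(example_reads).items()):
        print(locus, allele, n)
```

Because every read is handled independently and each window costs one constant-time lookup, total work in this sketch grows in proportion to the number of reads. Python threads are used here only to mirror the structure; in CPython they will not give a real speedup for this CPU-bound loop.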

Document Details

Document Type
Pub Defense Publication
Publication Date
Apr 16, 2018
Source ID
10.1111/1556-4029.13794

Entities

People

  • Adam Michaleas
  • Anna Shcherbina
  • Darrell O Ricke
  • Philip Fremont‐Smith

Organizations

  • Massachusetts Institute of Technology
  • United States Air Force

Tags

Fields of Study

  • Biology

Readers

  • Distributed Systems and Data Platform Development
  • Parallel and Distributed Computing
  • Women's Health and Cancer Risk Research: African American Women and Pregnancy Outcomes