Analysis and Implementation of Particle-to-Particle (P2P) Graphics Processor Unit (GPU) Kernel for Black-Box Adaptive Fast Multipole Method

Abstract

The Black-Box Adaptive Fast Multipole Method (bbAFMM) has been attracting interest within the high-performance computing community as a tractable solution to the well-known n-body problem. The bbAFMM approximates the n-body solution using a series of independent functions, or kernels, that are well suited to high-performance code development on one or more graphics processor unit (GPU) devices. This work presents the analysis and implementation of the direct-interaction kernel, called the particle-to-particle (P2P) kernel, for a shared-memory single GPU device using the Compute Unified Device Architecture (CUDA), revealing a performance boost of greater than 500 times over the corresponding serial central processing unit (CPU) implementation. The objective of this work is both to document the implementation of the GPU kernel and to provide a better understanding of the observed performance through an algorithmic analysis that focuses on arithmetic intensity, GPU memory bandwidth, GPU peak performance, and the defined Peripheral Component Interconnect Express (PCIe) bandwidth.
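
As a rough illustration of the two ideas named in the abstract, the following Python sketch shows a serial direct P2P summation and a simple roofline-style bound built from arithmetic intensity, memory bandwidth, and peak performance. It is not the report's implementation: the 1/r interaction kernel, the function names p2p_direct and attainable_flops, and all parameter names are assumptions chosen for illustration. In a GPU version, each iteration of the outer loop over targets would typically be assigned to its own thread.

```python
import math


def p2p_direct(targets, sources, charges):
    """Direct particle-to-particle sum (illustrative, assumed 1/r kernel).

    Computes phi_i = sum_j q_j / |x_i - y_j| for every target point x_i.
    """
    potentials = []
    for tx, ty, tz in targets:              # one GPU thread per target, in a parallel version
        acc = 0.0
        for (sx, sy, sz), q in zip(sources, charges):
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            if r > 0.0:                     # skip coincident points (self-interaction)
                acc += q / r
        potentials.append(acc)
    return potentials


def attainable_flops(arithmetic_intensity, peak_flops, mem_bandwidth):
    """Roofline-style bound: min(peak, intensity * bandwidth).

    arithmetic_intensity is in flops per byte, mem_bandwidth in bytes per
    second, and peak_flops in flops per second (hypothetical inputs).
    """
    return min(peak_flops, arithmetic_intensity * mem_bandwidth)
```

With low arithmetic intensity the bound is set by memory bandwidth; once the intensity is high enough, it is capped by peak performance. The same comparison can be repeated with the PCIe bandwidth in place of the device memory bandwidth to estimate the cost of host-to-device transfers.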

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 2015
Accession Number
ADA625090

Entities

People

  • AmirHossein Aminfar
  • Dale Shires
  • Eric F Darve
  • Mohammad P. Ansari
  • Richard H. Haney
  • Rohit Pataki

Organizations

  • United States Army Research Laboratory

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Arithmetic
  • Bandwidth
  • Central Processing Units
  • Data Transmission
  • Floating Point Operations
  • Graphics
  • High Performance Computing
  • Intensity
  • Mechanical Engineering
  • Military Research
  • N Body Problem
  • Near Field
  • Particles

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Parallel and Distributed Computing