A transpose-free in-place SIMD optimized FFT

Abstract

A transpose-free in-place SIMD optimized algorithm for the computation of large FFTs is introduced and implemented on the Cell Broadband Engine. Six different FFT implementations of the algorithm using six different data movement methods are described. Their relative performance is compared for input sizes from 2^17 to 2^21 complex floating point samples. Large differences in performance are observed among even theoretically equivalent data movement patterns. All six implementations compare favorably with FFTW and other previous FFT implementations.

Document Details

Document Type
Pub Defense Publication
Publication Date
Sep 01, 2012
Source ID
10.1145/2355585.2355596

Entities

People

  • James R. Geraci
  • Sharon M. Sacco

Organizations

  • MIT Lincoln Laboratory
  • United States Air Force

Tags

Readers

  • Approximation Theory
  • Parallel and Distributed Computing