VkFFT - Vulkan Fast Fourier Transform library
VkFFT is an efficient GPU-accelerated multidimensional Fast Fourier Transform library for Vulkan projects. VkFFT aims to provide the community with an open-source alternative to Nvidia's cuFFT library while achieving better performance.
Currently supported features:
 1D/2D/3D systems
 Forward and inverse directions of FFT
 Maximum dimension size is 4096, 32-bit float
 Radix-2/4/8 FFT, power-of-two systems only
 All transformations are performed in-place with no performance loss
 Complex to complex (C2C), real to complex (R2C) and complex to real (C2R) transformations. R2C and C2R are optimized to run up to 2x faster than C2C (2D and 3D cases only)
 1x1, 2x2, 3x3 convolutions with symmetric or nonsymmetric kernel
 Header-only (+precompiled shaders) library with a Vulkan interface, which allows appending VkFFT directly to the user's command buffer
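
As a point of reference for the in-place, power-of-two, forward/inverse items above, here is a minimal CPU sketch of an in-place radix-2 FFT. It is not VkFFT code and does not reflect how the VkFFT compute shaders are organized; the names `is_power_of_two` and `fft_in_place` are hypothetical and exist only for this illustration. The inverse direction is assumed to be normalized by 1/n.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: true when n is a non-zero power of two.
bool is_power_of_two(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Hypothetical CPU reference: in-place iterative radix-2 FFT.
// inverse == true flips the twiddle sign and divides the result by n.
void fft_in_place(std::vector<std::complex<float>>& data, bool inverse) {
    const std::size_t n = data.size();
    assert(is_power_of_two(n));  // radix-2 only handles power-of-two sizes

    // Bit-reversal permutation, so the butterflies can overwrite their inputs.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }

    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = (inverse ? 2.0 : -2.0) * pi / static_cast<double>(len);
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                const double phase = angle * static_cast<double>(k);
                const std::complex<float> w(static_cast<float>(std::cos(phase)),
                                            static_cast<float>(std::sin(phase)));
                const std::complex<float> u = data[start + k];
                const std::complex<float> v = data[start + k + len / 2] * w;
                data[start + k] = u + v;            // upper butterfly output
                data[start + k + len / 2] = u - v;  // lower butterfly output
            }
        }
    }

    if (inverse) {
        for (auto& x : data) x /= static_cast<float>(n);
    }
}
```

Calling `fft_in_place(v, false)` followed by `fft_in_place(v, true)` on a vector whose length is a power of two should return the original values up to floating-point rounding, without any second buffer being allocated.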
Future release plan

Almost ready:
 Zero padding support
 Double- and half-precision arithmetic
 8192 and 16384 dimension sizes

Planned:
 Publication based on implemented optimizations
 Mobile and integrated GPU support

Ambitious:
 Multiple GPU job splitting
Installation
Include the vkFFT.h file and specify the path to the shaders folder in CMake or from the C interface. The sample CMakeLists.txt file configures a project based on the Vulkan_FFT.cpp file, which contains two examples of how to use VkFFT to perform FFT, iFFT and convolution calculations.
How to use VkFFT
VkFFT.h is a library that can append an FFT, iFFT or convolution calculation to a user-defined command buffer. It operates on storage buffers allocated by the user and doesn't require any additional memory of its own. All computations are based entirely on Vulkan compute shaders, with no CPU usage except for FFT planning. VkFFT creates and optimizes the memory layout by itself and performs the FFT with the best parameters it can choose. For an example application, see the Vulkan_FFT.cpp file, which has comments explaining the VkFFT configuration process.
The picture below shows how data is restructured during the R2C transform depending on the system dimensions. This layout keeps transfers between on-chip memory and graphics card memory to a minimum (one read and one write per FFT axis, plus a transposition if the axis dimension is ≥ 256). If a convolution is performed, it is embedded into the last FFT axis, which reduces memory transfers even further.
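
The reason a convolution can ride along with the FFT is the convolution theorem: a circular convolution becomes an element-wise product in the frequency domain. The sketch below illustrates only that identity on the CPU, using a deliberately naive O(n^2) DFT to stay self-contained. It is not VkFFT code and says nothing about the actual buffer layout; `naive_dft` and `circular_convolve` are hypothetical names, and scalar (1x1) kernels are assumed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Hypothetical helper: naive O(n^2) DFT, with the inverse normalized by 1/n.
std::vector<cd> naive_dft(const std::vector<cd>& in, bool inverse) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<cd> out(n, cd(0.0, 0.0));
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j*k modulo n to keep the angle small and accurate.
            const double angle = sign * 2.0 * pi *
                                 static_cast<double>((j * k) % n) /
                                 static_cast<double>(n);
            out[k] += in[j] * std::polar(1.0, angle);
        }
        if (inverse) out[k] /= static_cast<double>(n);
    }
    return out;
}

// Circular convolution via the frequency domain:
// forward transform both inputs, multiply element-wise, transform back.
std::vector<cd> circular_convolve(const std::vector<cd>& signal,
                                  const std::vector<cd>& kernel) {
    assert(signal.size() == kernel.size());
    std::vector<cd> spectrum = naive_dft(signal, false);
    const std::vector<cd> kernel_spectrum = naive_dft(kernel, false);
    for (std::size_t i = 0; i < spectrum.size(); ++i) {
        spectrum[i] *= kernel_spectrum[i];  // the only convolution-specific step
    }
    return naive_dft(spectrum, true);
}
```

In this sketch the multiplication is a single pass over the spectrum between the forward and inverse transforms, which is why fusing it into the last axis of the forward transform can avoid a separate read and write of the whole buffer.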
Benchmark results in comparison to cuFFT
To measure how the Vulkan FFT implementation performs in comparison to cuFFT, we run a number of 2D and 3D tests. Each test performs an R2C FFT followed by an inverse C2R FFT, repeated multiple times to calculate the average time required. cuFFT uses an out-of-place configuration, while VkFFT works in-place. The results were obtained on an Nvidia 1660 Ti graphics card with no other GPU load. Launching example 0 from Vulkan_FFT.cpp runs the VkFFT benchmark; the benchmark_cuFFT.cu file contains a similar benchmark script for the cuFFT library.
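
The averaging procedure described above can be sketched with a small host-side timing helper. This is an illustration only, not the code from Vulkan_FFT.cpp or benchmark_cuFFT.cu; `average_milliseconds` is a hypothetical name, and the untimed warm-up call is an assumption of this sketch. For GPU work, the callable passed in is assumed to block until the device has finished (for example by waiting on the queue or a fence), otherwise only the submission cost is measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: average wall-clock time of run_once in milliseconds.
// run_once is expected to perform one R2C FFT plus one inverse C2R FFT and
// to return only after the GPU has finished executing them.
double average_milliseconds(const std::function<void()>& run_once,
                            std::size_t iterations) {
    assert(static_cast<bool>(run_once));  // an empty std::function cannot be called
    if (iterations == 0) return 0.0;

    run_once();  // warm-up pass (assumption of this sketch), excluded from timing

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        run_once();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Using the same helper around both libraries keeps the measurement method identical, so any difference in the averages comes from the FFT implementations and their in-place or out-of-place configuration rather than from the timing code.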
Contact information
The initial version of VkFFT was developed by Tolmachev Dmitrii.
Email: [email protected]