Hardware accelerated Cooley-Tukey FFT implementation for the Xtensa architecture.
Image by Virens, licensed under the Creative Commons Attribution 3.0
Unported license.
This is an implementation of the Cooley-Tukey FFT algorithm for one of
Tensilica’s Xtensa processor platforms. The Tensilica Instruction Extension
hardware-description DSL is utilized to customize the core Xtensa core
architecture by means of additional specialized registers and instructions
which make it possible to perform the specialized task of computing a DIT/DIF
FFT at significantly higher speeds than would ordinarily be possible with the
base ISA alone. Hardware acceleration results in a > x25 speedup for moderately
large inputs:
See the report under submodules/report/pdf
for a detailed
description of the design and implementation of this project as well as
performance and chip area analyses.
The algorithm itself is implemented in assembly in project/source/asm/fft.S
,
the instruction set extensions are implemented in project/source/tie/fft.tie
in the aforementioned TIE language. In order to actually simulate this design
you would need a commercial licence for the Xtensa SDK.