A number of algorithms have columns of operators that can be applied in parallel. The FFT is one such example. The operator is commonly called a `butterfly'. The operator composes two operands (which are generally each complex numbers) and delivers two results. The successive passes can be done in place on one array whose values are passed by reference to the butterfly code in the software version.
Our diagram shows a 16-point FFT, but typically applications use hundreds or thousands.
(The code fragment shows a single-threaded implementation that does not attempt to put butterflies in parallel.)
The pattern of data movement is known as a shuffle. It is also used in switching and sorting algorithms.
The downside of shuffle computations for acceleration is that there is no packing of data (structural partitioning of the data array) into spatially-separate memories that works well for all of the passes: what is good at the start is poor at the end. FFT details are not examinable (within this course at least).
30: (C) 2008-18, DJ Greaves, University of Cambridge, Computer Laboratory. |