FAST PARTIAL FOURIER TRANSFORM

Abstract

Given a time-series vector, how can we efficiently compute a specified part of its Fourier coefficients? The fast Fourier transform (FFT) is a widely used algorithm that computes the discrete Fourier transform in many machine learning applications. Despite its pervasive use, FFT algorithms do not let the user specify the desired output: the output size (the number of Fourier coefficients to be computed) is algorithmically determined by the input size. Because many applications do not require the whole spectrum of the frequency domain, this lack of flexibility often means that unused coefficients are simply discarded, which wastes computation. In this paper, we propose a fast Partial Fourier Transform (PFT), an efficient algorithm for computing only a part of the Fourier coefficients. PFT approximates a part of the twiddle factors (trigonometric constants) using polynomials, thereby reducing the computational complexity due to the mixture of many twiddle factors. We derive the asymptotic time complexity of PFT with respect to input and output sizes, as well as its numerical accuracy. Experimental results show that PFT outperforms the current state-of-the-art algorithms, with an order of magnitude of speedup for sufficiently small output sizes without sacrificing accuracy.

1. INTRODUCTION

How can we efficiently compute a specified part of the Fourier coefficients of a given time-series vector? The discrete Fourier transform (DFT) is a crucial tool in several application areas, including anomaly detection (Hou & Zhang (2007); Rasheed et al. (2009); Ren et al. (2019)), data center monitoring (Mueen et al. (2010)), and image processing (Shi et al. (2017)). Notably, in many such applications, it is well known that the DFT results in strong "energy-compaction" or "sparsity" in the frequency domain. That is, the Fourier coefficients of the data are mostly small or equal to zero, so their support is much smaller than the input size. Moreover, the support can often be specified in practice (e.g., a few low-frequency coefficients around the origin). These observations have aroused great interest in efficient algorithms capable of computing only a specified part of the Fourier coefficients. Accordingly, various approaches have been proposed to address the problem, including the Goertzel algorithm (Burrus & Parks (1985)), Subband DFT (Hossen et al. (1995); Shentov et al. (1995)), and Pruned FFT (Markel (1971); Skinner (1976); Nagai (1986); Sorensen & Burrus (1993); Ailon & Liberty (2009)).

In this paper, we propose a fast Partial Fourier Transform (PFT), an efficient algorithm for computing a part of the Fourier coefficients. Specifically, we consider the following problem: given a complex-valued vector a of size N, a non-negative integer M, and an integer µ, estimate the Fourier coefficients of a for the interval [µ - M, µ + M]. The resulting algorithm has a remarkably simple structure, composed of several "smaller" FFTs combined with linear pre- and post-processing steps. Consequently, PFT reduces the number of operations to O(N + M log M), which is, to the best of our knowledge, the lowest arithmetic complexity achieved so far.
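To make the problem statement concrete, the following minimal Python sketch evaluates the requested coefficients directly from the DFT definition. It is a brute-force O(MN) reference and not the PFT algorithm. The function name `partial_dft_reference`, the sign convention exp(-2πi·mn/N), and the wrap-around handling of indices outside 0..N-1 are assumptions made for illustration only.

```python
import cmath

def partial_dft_reference(a, mu, M):
    """Fourier coefficients of `a` at indices mu-M, ..., mu+M (2M+1 values).

    Assumed convention: A[m] = sum_n a[n] * exp(-2j*pi*m*n/N).
    Indices outside 0..N-1 wrap around because the DFT is periodic in m.
    """
    N = len(a)
    coeffs = []
    for m in range(mu - M, mu + M + 1):
        step = -2j * cmath.pi * m / N
        # Direct evaluation of one coefficient costs O(N) operations.
        coeffs.append(sum(a_n * cmath.exp(step * n) for n, a_n in enumerate(a)))
    return coeffs

# A constant signal has all of its energy in the zero-frequency coefficient.
low = partial_dft_reference([1.0] * 8, mu=0, M=1)  # indices -1, 0, 1
```

A reference of this kind is mainly useful for checking faster methods on small inputs.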
In addition, most subroutines of PFT are already highly optimized algorithms (e.g., matrix multiplication and FFT), so the arithmetic gains readily translate into actual run-time improvements. Furthermore, PFT does not require the input size to be a power of 2, unlike many competitors. This is because the idea of PFT derives from a modification of the Cooley-Tukey algorithm (Cooley & Tukey, 1965), which also makes it straightforward to extend the idea to higher dimensions. Through experiments, we show that PFT outperforms the state-of-the-art FFT libraries, FFTW by Frigo & Johnson (2005) and Intel Math Kernel Library (MKL), as well as Pruned FFTW, with an order of magnitude of speedup without sacrificing accuracy.

We describe various existing methods for computing partial Fourier coefficients.

Fast Fourier transform. One may consider simply using the fast Fourier transform (FFT) and discarding the unnecessary coefficients. FFT efficiently computes the full DFT, reducing the arithmetic cost from the naïve O(N^2) to O(N log N). Such an approach has two major advantages: (1) it is straightforward to implement, and (2) it often outperforms its competitors because it directly employs FFT, which has been highly optimized over decades. Therefore, we provide extensive comparisons of PFT and FFT, both theoretically and through run-time evaluations. Experimental results in Section 4.2 show that PFT outperforms FFT when the output size is small enough (< 10%) compared to the input size.

Goertzel algorithm. The Goertzel algorithm (Burrus & Parks (1985)) is one of the first methods devised for computing only a part of the Fourier coefficients. The technique is essentially the same as computing the individual coefficients of the DFT, thus requiring O(MN) operations for M coefficients of an input of size N. Specifically, theoretical analysis shows that the Goertzel algorithm is advantageous over FFT only when M < 2 log_2 N (Sysel & Rajmic (2012)).
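The per-coefficient cost of the Goertzel algorithm can be seen in a minimal Python sketch of its standard second-order recurrence. The function names `goertzel` and `goertzel_range` are hypothetical, and the DFT sign convention exp(-2πi·kn/N) is an assumption. The sketch is meant as an illustration and not as the implementation evaluated in the cited works.

```python
import cmath
import math

def goertzel(x, k):
    """Return the k-th DFT coefficient of x using the Goertzel recurrence."""
    N = len(x)
    omega = 2.0 * math.pi * k / N
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        # s[n] = x[n] + 2*cos(omega)*s[n-1] - s[n-2]
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # After N samples: X[k] = exp(j*omega) * s[N-1] - s[N-2].
    return cmath.exp(1j * omega) * s_prev - s_prev2

def goertzel_range(x, mu, M):
    """Coefficients mu-M..mu+M, one O(N) pass each, so O(M*N) in total."""
    return [goertzel(x, m) for m in range(mu - M, mu + M + 1)]

coeffs = goertzel_range([1.0, 2.0, 3.0, 4.0], mu=1, M=1)  # indices 0, 1, 2
```

Each requested coefficient needs its own pass over the input, which is where the O(MN) total comes from.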
For example, with N = 2^22, the Goertzel algorithm becomes faster than FFT only when M < 44, while PFT outperforms FFT for M < 2^19 = 524288 (Figure 1b). A few variants that improve the Goertzel algorithm have been proposed (e.g., Boncelet (1986)). Nevertheless, the performance gain is only a small constant factor, so they remain limited to rare scenarios where very few coefficients are required.

Subband DFT. Subband DFT (Hossen et al. (1995); Shentov et al. (1995)) consists of two stages: a Hadamard transform that decomposes the input sequence into a set of smaller subsequences, and a correction stage for recombination. The algorithm approximates a part of the coefficients by eliminating subsequences with small energy contribution, and manages to reduce the number of operations to O(N + M log N). Despite the arithmetic gain, however, Subband DFT suffers from low accuracy. Indeed, experimental results in Hossen et al. (1995) show that the relative approximation error of the method is around 10^-1 (only one significant figure) regardless of output size. In contrast, PFT can evaluate the Fourier coefficients to arbitrary numerical precision, which is not the case for Subband DFT. Such limitations often preclude the use of Subband DFT in applications that require a certain degree of accuracy.

Pruned FFT. FFT pruning (Markel (1971); Skinner (1976); Nagai (1986); Sorensen & Burrus (1993); Ailon & Liberty (2009)) is another technique for the efficient computation of partial Fourier coefficients. The method is a modification of the standard split-radix FFT, where the edges (operations) in a flow graph are pruned away if they do not affect the specified range of the frequency domain. Besides being almost as optimized as FFT itself (it uses FFT as a subroutine), the FFT pruning algorithm is exact and reduces the arithmetic cost to O(N log M).
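One common way to obtain an O(N log M) cost for a block of outputs is to split the input into N/K interleaved subsequences, take a size-K FFT of each, and combine the results with twiddle factors. The Python sketch below illustrates that idea for the first K coefficients only. It does not reproduce the split-radix flow-graph pruning of the cited works. The function names `fft` and `pruned_fft_first`, the restriction to a power-of-two K that divides N, and the sign convention are assumptions made for the sketch.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def pruned_fft_first(x, K):
    """First K DFT coefficients of x using N/K FFTs of size K.

    Assumes K is a power of two that divides N = len(x).
    Cost: (N/K) FFTs of size K, i.e. O(N log K), plus N twiddle products.
    """
    N = len(x)
    L = N // K
    out = [0j] * K
    for r in range(L):
        sub = fft(x[r::L])  # size-K FFT of samples r, r+L, r+2L, ...
        for k in range(K):
            # Twiddle factor that accounts for the offset r of this subsequence.
            out[k] += cmath.exp(-2j * cmath.pi * r * k / N) * sub[k]
    return out

first4 = pruned_fft_first([float(n) for n in range(16)], K=4)
```

The result is exact up to floating-point rounding, because the decomposition only reorders the DFT sum and drops the outputs that were not requested.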
Being exact and well optimized, the pruned FFT is, along with the full FFT, arguably the most appropriate competitor of PFT. Through experiments (Section 4.2), we show that PFT consistently outperforms the pruned FFT, significantly extending the range of output sizes for which the partial Fourier transform becomes practical.

Finally, we mention that there have been other approaches with different settings. For example, Hassanieh et al. (2012a; b) and Indyk et al. (2014) propose the Sparse Fourier transform, which estimates the top-k (the k largest in magnitude) Fourier coefficients of a given vector. The algorithm is especially useful when there is prior knowledge of the number of non-zero coefficients in the frequency domain. Note that our setting does not require any prior knowledge of the given data.

Applications of FFT. We outline various applications of the fast Fourier transform to which the partial Fourier transform can potentially be applied.



FFT has been widely used for anomaly detection (Hou & Zhang (2007); Rasheed et al. (2009); Ren et al. (2019)). Hou & Zhang (2007) and Ren et al. (2019) detect anomalous points in given data by extracting a compact representation with FFT. Rasheed et al. (2009) use FFT to detect local spatial outliers, which have similar patterns within a region but different patterns from the outside. Several works (Pagh (2013); Pham & Pagh (2013); Malik & Becker (2018)) exploit FFT for efficient operations. Pagh (2013) leverages FFT to efficiently compute a polynomial kernel used with support vector machines (SVMs). Malik & Becker (2018) propose an efficient Tucker decomposition method using FFT. In addition, FFT has been used for fast training of convolutional neural networks (Mathieu et al. (2014); Rippel et al. (2015)) and for an efficient recommendation model on a heterogeneous graph (Jin et al. (2020)).

