NEURALLY AUGMENTED ALISTA

Abstract

It is well established that many iterative sparse reconstruction algorithms can be unrolled to yield learnable neural networks with improved empirical performance. A prime example is Learned ISTA (LISTA), where weights, step sizes, and thresholds are learned from training data. Recently, Analytic LISTA (ALISTA) was introduced, combining the strong empirical performance of a fully learned approach like LISTA with the theoretical guarantees of classical compressed sensing algorithms, while significantly reducing the number of parameters to learn. However, these parameters are trained to work well in expectation, often leading to suboptimal reconstruction of individual targets. In this work we therefore introduce Neurally Augmented ALISTA (NA-ALISTA), in which an LSTM network computes step sizes and thresholds individually for each target vector during reconstruction. This adaptive approach is theoretically motivated by revisiting the recovery guarantees of ALISTA. We show that our approach further improves empirical performance in sparse reconstruction, in particular outperforming existing algorithms by an increasing margin as the compression ratio becomes more challenging.

1. INTRODUCTION AND RELATED WORK

Compressed sensing deals with the problem of recovering a sparse vector from very few compressive linear observations, far fewer than its ambient dimension. Fundamental works of Candès et al. (2006) and Donoho (2006) show that this can be achieved in a robust and stable manner with computationally tractable algorithms, provided the observation matrix fulfills certain conditions; for an overview see Foucart & Rauhut (2017). Formally, consider the set of s-sparse vectors in R^N, i.e.

\Sigma_s^N := \{ x \in R^N : \|x\|_0 \le s \},

where the size of the support of x is denoted by \|x\|_0 := |\mathrm{supp}(x)| = |\{i : x_i \neq 0\}|. Furthermore, let \Phi \in R^{M \times N} be the measurement matrix, with typically M \ll N. For a given noiseless observation y = \Phi x^* of an unknown but s-sparse x^* \in \Sigma_s^N we therefore wish to solve

\mathrm{argmin}_x \|x\|_0 \quad \text{s.t.} \quad y = \Phi x. \quad (1)

Candès et al. (2006) showed that, under certain assumptions on \Phi, the solution to the combinatorial problem in (1) can also be obtained by a convex relaxation in which one instead minimizes the \ell_1-norm of x. The Lagrangian formalism then yields an unconstrained optimization problem, also known as LASSO (Tibshirani, 1996), which penalizes the \ell_1-norm via the hyperparameter \lambda \in R:

\hat{x} = \mathrm{argmin}_x \; \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda \|x\|_1. \quad (2)

A very popular approach for solving this problem is the iterative shrinkage-thresholding algorithm (ISTA) (Daubechies et al., 2003), in which a reconstruction x^{(k)} is obtained after k iterations from the initialization x^{(0)} = 0 via the iteration

x^{(k+1)} = \eta_{\lambda/L}\left( x^{(k)} + \tfrac{1}{L} \Phi^T (y - \Phi x^{(k)}) \right), \quad (3)

where \eta_\theta is the soft-thresholding function given by \eta_\theta(x) = \mathrm{sign}(x) \max(0, |x| - \theta) (applied coordinate-wise) and L is the Lipschitz constant (i.e. the largest eigenvalue) of \Phi^T \Phi.
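The ISTA iteration above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; `Phi`, `y`, `lam`, and `num_iters` are placeholder names, and the Lipschitz constant is computed directly from the spectral norm of `Phi`.

```python
import numpy as np

def soft_threshold(x, theta):
    """Coordinate-wise soft thresholding: eta_theta(x) = sign(x) * max(0, |x| - theta)."""
    return np.sign(x) * np.maximum(0.0, np.abs(x) - theta)

def ista(Phi, y, lam, num_iters=1000):
    """Plain ISTA for the LASSO objective 0.5*||y - Phi x||_2^2 + lam*||x||_1.

    L is the Lipschitz constant of Phi^T Phi, i.e. the squared
    largest singular value of Phi.
    """
    L = np.linalg.norm(Phi, ord=2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(num_iters):
        # Gradient step on the quadratic term, then soft thresholding.
        x = soft_threshold(x + Phi.T @ (y - Phi @ x) / L, lam / L)
    return x
```

For a well-conditioned random Gaussian `Phi` and a sufficiently sparse target, this converges to a close approximation of the sparse vector, although typically only after hundreds of iterations, which is exactly the slowness that motivates unrolled, learned variants.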
Famously, the computational graph of ISTA with K iterations can be unrolled to yield Learned ISTA (LISTA) (Gregor & LeCun, 2010), a K-layer neural network in which all parameters (each layer k has an individual threshold parameter and individual or shared matrix weights) are trained using backpropagation and gradient descent. LISTA achieves impressive empirical reconstruction performance on many sparse datasets but loses the theoretical guarantees of ISTA. Bridging the gap between LISTA's strong reconstruction quality and the theoretical guarantees of ISTA, ALISTA (Liu et al., 2019) was introduced. ALISTA introduces a matrix W, related to the measurement matrix \Phi in (3), which is computed by optimizing the generalized coherence

\mu(W, \Phi) = \inf_{W \in R^{M \times N}} \max_{i \neq j} \left| W_{:,i}^T \Phi_{:,j} \right| \quad \text{s.t.} \quad W_{:,i}^T \Phi_{:,i} = 1 \;\; \forall i \in \{1, \dots, N\}. \quad (4)

Then, contrary to LISTA, all matrices are excluded from learning in order to retain desirable properties such as low coherence. For each layer of ALISTA, only a scalar step size \gamma^{(k)} and a scalar threshold \theta^{(k)} are learned from the data, yielding the iteration

x^{(k+1)} = \eta_{\theta^{(k)}}\left( x^{(k)} - \gamma^{(k)} W^T (\Phi x^{(k)} - y) \right). \quad (5)

As in LISTA, the parameters of ALISTA are learned end-to-end using backpropagation and stochastic gradient descent by empirically minimizing the reconstruction error:

\min_{\theta^{(1)}, \dots, \theta^{(K)}, \gamma^{(1)}, \dots, \gamma^{(K)}} \; E_{x^*} \left\| x^{(K)} - x^* \right\|_2^2. \quad (6)

The authors rigorously upper-bound the reconstruction error of ALISTA in the noiseless case and demonstrate strong empirical reconstruction quality even in the noisy case. The empirical performance similar to LISTA, the retained theoretical guarantees, and the reduction of the number of parameters to train, from either O(KM^2 + NM) in vanilla LISTA or O(MNK) in the LISTA-CPSS variant (Chen et al., 2018) to just O(K), make ALISTA an appealing algorithm to study and extend.
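The unrolled ALISTA forward pass in (5) can be sketched as follows. This is an illustrative NumPy sketch only: `W` is assumed to have been precomputed by solving (4) (in the test below it is simply set to `Phi`, which is a crude stand-in, not the analytic weight matrix), and `gammas`/`thetas` are placeholder constants rather than learned values.

```python
import numpy as np

def alista_forward(W, Phi, y, gammas, thetas):
    """Unrolled ALISTA forward pass, one soft-thresholding layer per entry of
    gammas/thetas.

    W      : weight matrix with low generalized coherence w.r.t. Phi
             (assumed precomputed; excluded from learning in ALISTA)
    gammas : per-layer scalar step sizes  gamma^{(k)}
    thetas : per-layer scalar thresholds theta^{(k)}
    """
    x = np.zeros(Phi.shape[1])
    for gamma, theta in zip(gammas, thetas):
        # Gradient-like step through W^T on the residual, then soft thresholding.
        v = x - gamma * (W.T @ (Phi @ x - y))
        x = np.sign(v) * np.maximum(0.0, np.abs(v) - theta)
    return x
```

In the learned setting, each `gamma`/`theta` pair would come from minimizing (6) over training data; here the point is only the shape of the computation: K cheap layers, each with two scalars.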
In Ablin et al. (2019), instead of directly focusing on the reconstruction problem, where \lambda is not known a priori, analytical conditions for optimal step sizes in ISTA are derived for LASSO, yielding Stepsize-ISTA. Stepsize-ISTA is a variant of LISTA in which the measurement matrices are exempt from training, as in ALISTA, and it outperforms existing approaches to directly solving LASSO. Thresholds adaptive to the current target vector have been explored in ALISTA-AT (Kim & Park, 2020). Following the majorization-minimization method, component-wise thresholds are computed from previous iterations. In a particular case this yields \theta_i^{(k)} = 1/(1 + |x_i^{(k-1)}| / \epsilon) for some \epsilon > 0, known as iteratively reweighted \ell_1-minimization. By unrolling this algorithm, the authors demonstrate superior recovery over ALISTA for a specific setting of M, N and s. In a related approach, Wu et al. (2020) identify undershooting, meaning that reconstructed components are smaller than the corresponding target components, as a shortcoming of LISTA and propose Gated-LISTA to address this issue. The authors add gain and overshoot gates to LISTA, which can amplify the reconstruction before and after thresholding in each iteration, yielding an architecture resembling GRU cells (Cho et al., 2014). They demonstrate better sparse reconstruction than previous LISTA variants and also show that by adding their proposed gates to ALISTA, named AGLISTA, it is possible to improve its performance in the same setting of M, N and s as ALISTA-AT. In this paper, motivated by essential proof steps of ALISTA's recovery guarantee, we propose an alternative method for adaptively choosing thresholds and step sizes during reconstruction. Our method directly extends ALISTA by using a recurrent neural network to predict thresholds and step sizes depending on an estimate of the \ell_1-error between the reconstruction and the unknown target vector after each iteration.
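The component-wise reweighted thresholds from the ALISTA-AT discussion above are simple to state in code. This is a sketch of that one formula only, with an illustrative `eps`; it is not the full ALISTA-AT algorithm.

```python
import numpy as np

def reweighted_thresholds(x_prev, eps=0.1):
    """Component-wise thresholds theta_i = 1 / (1 + |x_i| / eps).

    Components that were large in the previous iterate receive a small
    threshold (they are shrunk less), while components near zero receive a
    threshold close to 1 (they are suppressed more aggressively).
    """
    return 1.0 / (1.0 + np.abs(x_prev) / eps)
```

In contrast, the neurally augmented approach proposed in this paper does not fix such a closed-form rule but lets a recurrent network choose the thresholds and step sizes per iteration and per target vector.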
We refer to our method as Neurally Augmented ALISTA (NA-ALISTA), as the method falls into the general framework of neural augmentation of unrolled algorithms (Welling, 2020; Monga et al., 2019; Diamond et al., 2017) . The rest of the paper is structured as follows: we provide theoretical motivation for NA-ALISTA in Section 2, before describing our method in detail in Section 3. In Section 4, we demonstrate experimentally that NA-ALISTA achieves state-of-the-art performance in all evaluated settings. To summarize, our main contributions are:
1) We introduce Neurally Augmented ALISTA (NA-ALISTA), an algorithm which learns to adaptively compute thresholds and step sizes for individual target vectors during recovery. The number of parameters added does not scale with the problem size.

2) We provide theoretical motivation for this adaptive approach by revisiting the recovery guarantees of ALISTA (Section 2).

3) We demonstrate experimentally that NA-ALISTA achieves state-of-the-art performance in sparse reconstruction in all evaluated settings (Section 4).

* equal contribution
† The work is partially funded by DFG grant JU 2795/3 and the German Federal Ministry of Education and Research (BMBF) in the framework of the international future AI lab "AI4EO - Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond" (Grant number: 01DD20001).

