LEARNED ISTA WITH ERROR-BASED THRESHOLDING FOR ADAPTIVE SPARSE CODING

Anonymous

Abstract

The learned iterative shrinkage thresholding algorithm (LISTA) introduces deep unfolding models with learnable thresholds in the shrinkage functions for sparse coding. Drawing on theoretical insights, we advocate an error-based thresholding (EBT) mechanism for LISTA, which leverages a function of the layer-wise reconstruction error to suggest an appropriate threshold value for each observation at each layer. We show that the EBT mechanism disentangles the learnable parameters in the shrinkage functions from the reconstruction errors, making them more adaptive to varying observations. With rigorous theoretical analyses, we show that, in addition to its higher adaptivity, the proposed EBT leads to faster convergence on the basis of LISTA and its variants. Extensive experimental results confirm our theoretical analyses and verify the effectiveness of our method.

1. INTRODUCTION

Sparse coding is widely used in many machine learning applications (Xu et al., 2012; Dabov et al., 2007; Yang et al., 2010; Ikehata et al., 2012), and its core problem is to deduce a high-dimensional sparse code from a low-dimensional observation, e.g., under the assumption y = A x_s + ε, where y ∈ R^m is the observation corrupted by inevitable noise ε ∈ R^m, x_s ∈ R^n is the sparse code to be estimated, and A ∈ R^{m×n} is an over-complete dictionary matrix. Recovering x_s purely from y is called the sparse linear inverse problem (SLIP). The main challenge in solving a SLIP is its ill-posed nature due to the over-complete modeling, i.e., m < n. A possible solution to a SLIP can be obtained by solving a LASSO problem with ℓ_1 regularization:

min_x (1/2) ‖y − A x‖_2^2 + λ ‖x‖_1.  (1)

Possible solvers for Eq. (1) include the iterative shrinkage thresholding algorithm (ISTA) (Daubechies et al., 2004) and its variants, e.g., fast ISTA (FISTA) (Beck & Teboulle, 2009). Despite their simplicity, these traditional optimization algorithms suffer from slow convergence in large-scale problems. Therefore, Gregor & LeCun (2010) proposed learned ISTA (LISTA), a deep neural network (DNN) whose architecture follows the iterative process of ISTA, with the thresholding mechanism realized by shrinkage functions with learnable thresholds. LISTA achieved superior performance in sparse coding, and many theoretical analyses and modifications of LISTA have since been proposed to further improve its performance (Chen et al., 2018; Liu et al., 2019; Zhou et al., 2018; Ablin et al., 2019; Wu et al., 2020). Yet, LISTA and many deep networks based on it suffer from two issues. (a) Although the thresholds of the shrinkage functions in LISTA are learnable, their values are shared among all training samples and thus lack adaptability to the variety of training samples and robustness to outliers.
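To make the baseline concrete, the ISTA iteration for Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustrative implementation, not the paper's code; the function names `soft_threshold` and `ista` are our own, and γ is set to the largest eigenvalue of AᵀA as the text prescribes.

```python
import numpy as np

def soft_threshold(x, b):
    """Shrinkage function sh_b(x) = sign(x) * (|x| - b)_+ with threshold b >= 0."""
    return np.sign(x) * np.maximum(np.abs(x) - b, 0.0)

def ista(y, A, lam, n_iters=100):
    """Plain ISTA for min_x 0.5 * ||y - A x||_2^2 + lam * ||x||_1.

    gamma is the largest eigenvalue of A^T A (the squared spectral norm of A),
    which satisfies the step-size condition stated in the text.
    """
    gamma = np.linalg.norm(A, ord=2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        # Equivalent to sh_{lam/gamma}((I - A^T A / gamma) x + A^T y / gamma)
        x = soft_threshold(x + A.T @ (y - A @ x) / gamma, lam / gamma)
    return x
```

The loop body is algebraically identical to the update rule given in Section 2: a gradient step on the quadratic term followed by soft thresholding with threshold λ/γ.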
According to prior work (Chen et al., 2018; Liu et al., 2019), the thresholds should be proportional to an upper bound on the norm of the current estimation error to guarantee fast convergence in LISTA. However, outliers with drastically higher estimation errors affect the thresholds more, making the learned thresholds less suitable for the other (training) samples. (b) For the same reason, it may also lead to poor generalization to test data whose distribution (or sparsity (Chen et al., 2018)) differs from that of the training data. For instance, in practice we may only be given some synthetic sparse codes, but not the real ones, for training, and current LISTA models may fail to generalize under such circumstances. In this paper, we propose an error-based thresholding (EBT) mechanism to address the aforementioned issues of LISTA-based models and improve their performance.

• With rigorous theoretical analyses, we show that the proposed EBT leads to faster convergence on the basis of LISTA and its variants, respectively. In addition, the parameters introduced in our EBT are well disentangled from the reconstruction errors and only need to be correlated with the dictionary matrix to ensure convergence. These results guarantee the superiority of our EBT in theory.
• We demonstrate the effectiveness of our EBT in the original LISTA and several of its variants in simulation experiments. We also show that it can be applied to practical applications (e.g., photometric stereo analysis) and achieves superior performance as well.

The organization of this paper is as follows. In Section 2, we review some preliminary knowledge for our study. In Section 3, we introduce a basic form of our EBT and several of its improved versions. Section 4 provides a theoretical study of the convergence of EBT-LISTA. Experimental results in Section 5 validate the effectiveness of our method in practice. Section 6 summarizes the paper.

2. BACKGROUND AND PRELIMINARY KNOWLEDGE

As mentioned in Section 1, ISTA is an iterative algorithm for solving the LASSO in Eq. (1). Its update rule is x^(0) = 0 and

x^(t+1) = sh_{λ/γ}((I − AᵀA/γ) x^(t) + Aᵀy/γ), ∀t ≥ 0,  (2)

where sh_b(x) = sign(x)(|x| − b)_+ is a shrinkage function with threshold b ≥ 0, (·)_+ = max{0, ·}, and γ is a positive constant no smaller than the largest eigenvalue of the symmetric matrix AᵀA. LISTA keeps the update rule of ISTA but learns the parameters via end-to-end training. Its inference process can be formulated as x^(0) = 0 and

x^(t+1) = sh_{b^(t)}(W^(t) x^(t) + U^(t) y), t = 0, ..., d,  (3)

where Θ = {W^(t), U^(t), b^(t)}_{t=0,...,d} is the set of learnable parameters and, specifically, b^(t) is the layer-wise threshold, which is learnable but shared among all samples. LISTA achieves lower reconstruction error between its output and the ground truth x_s than ISTA, and it is proved to converge linearly (Chen et al., 2018) when W^(t) = I − U^(t)A holds for every layer t. Thus, Eq. (3) can be written as

x^(t+1) = sh_{b^(t)}((I − U^(t)A) x^(t) + U^(t) y), t = 0, ..., d.  (4)

Chen et al. (2018) further proposed support selection for LISTA, which introduces the function shp_{(b^(t), p)}(x), whose elements are defined as

(shp_{(b, p)}(x))_i = sign(x_i)(|x_i| − b), if |x_i| > b and i ∉ S_p; x_i, if |x_i| > b and i ∈ S_p; 0, otherwise,  (5)

to substitute the original shrinkage function sh_{b^(t)}(x), where S_p is the index set of the largest p% of elements (in absolute value) of the vector x. Formally, the update rule of LISTA with support selection is x^(0) = 0 and

x^(t+1) = shp_{(b^(t), p^(t))}((I − U^(t)A) x^(t) + U^(t) y), t = 0, ..., d,  (6)

where p^(t) is a hyper-parameter that increases from earlier layers to later ones. LISTA with support selection achieves faster convergence than LISTA (Chen et al., 2018).
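The coupled LISTA update with support selection can be sketched as follows. This is an illustrative NumPy forward pass, not a trainable network: `Us`, `bs`, and `ps` stand in for the learned per-layer matrices U^(t), thresholds b^(t), and support-selection percentages p^(t), and the function names are our own.

```python
import numpy as np

def support_selection_shrink(x, b, p):
    """Shrinkage with support selection (Chen et al., 2018): the largest p% of
    entries by magnitude bypass thresholding; the rest are soft-thresholded."""
    k = int(np.ceil(p / 100.0 * x.size))
    out = np.sign(x) * np.maximum(np.abs(x) - b, 0.0)  # ordinary soft threshold
    if k > 0:
        idx = np.argsort(-np.abs(x))[:k]  # indices of the k largest-magnitude entries
        out[idx] = x[idx]                 # pass these entries through untouched
    return out

def lista_forward(y, A, Us, bs, ps):
    """Forward pass through a LISTA-style network with the coupling
    W^(t) = I - U^(t) A, i.e., x <- shp((I - U A) x + U y)."""
    x = np.zeros(A.shape[1])
    for U, b, p in zip(Us, bs, ps):
        x = support_selection_shrink(x - U @ (A @ x - y), b, p)
    return x
```

Setting every U^(t) = Aᵀ/γ, b^(t) = λ/γ, and p^(t) = 0 recovers plain ISTA, which is a convenient sanity check: the learned network strictly generalizes the classical iteration.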



Drawing on theoretical insights, EBT introduces a function of the evolving estimation error to provide each threshold in the shrinkage functions. It has no extra parameters to learn compared with the original LISTA-based models, yet shows significantly better performance. The main contributions of our paper are listed as follows: • The EBT mechanism can be readily incorporated into popular sparse coding DNNs (e.g., LISTA and its variants).
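To give a feel for the mechanism before Section 3 defines it precisely, one layer of an EBT-style update can be sketched as below. The specific functional form here, b = θ · ‖y − A x^(t)‖_2 with a per-layer scalar θ, is an illustrative assumption on our part (the paper's exact formulation is given in Section 3); the point is that the threshold becomes a function of the current observation's residual rather than a value shared across all samples.

```python
import numpy as np

def ebt_shrink_step(x, y, A, U, theta):
    """One hypothetical EBT-style layer. The threshold is set per observation
    from the current reconstruction error, here (for illustration only) as
    b = theta * ||y - A x||_2, instead of a fixed learned scalar b^(t)."""
    residual = y - A @ x
    b = theta * np.linalg.norm(residual)  # error-based, observation-adaptive threshold
    pre = x + U @ residual                # same pre-activation as LISTA: (I - U A) x + U y
    return np.sign(pre) * np.maximum(np.abs(pre) - b, 0.0)
```

Two consequences are immediate from this form: as the estimate improves the threshold shrinks toward zero automatically, and an outlier with a large residual only inflates its own threshold, leaving the learned scalar θ unaffected by that sample's error magnitude.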

