FREQUENCY REGULARIZED DEEP CONVOLUTIONAL DICTIONARY LEARNING AND APPLICATION TO BLIND DENOISING

Abstract

Sparse representation via a learned dictionary is a powerful prior for natural images. In recent years, unrolled sparse coding algorithms (e.g. LISTA) have proven to be useful for constructing interpretable deep-learning networks that perform on par with state-of-the-art models on image-restoration tasks. In this study we are concerned with extending such convolutional dictionary learning (CDL) models. We propose to construct strided convolutional dictionaries with a single analytic low-pass filter and a set of learned filters regularized to occupy the complementary frequency space. By doing so, we address the necessary modeling assumptions of natural images with respect to convolutional sparse coding and reduce the mutual coherence and redundancy of the learned filters. We show improved denoising performance at reduced computational complexity when compared to other CDL methods, and competitive results when compared to popular deep-learning models. We further propose to parameterize the thresholds in the soft-thresholding operator of LISTA to be proportional to the noise variance estimated from the input image. We demonstrate that this parameterization enhances robustness to noise-level mismatch between training and inference.

1. INTRODUCTION

Sparsity in a transform domain is an important and widely applicable property of natural images. This property can be exploited in a variety of tasks such as signal representation, feature extraction, and image processing. For instance, consider restoring an image from a degraded version (noisy, blurry, or missing pixels). These inverse problems are generally ill-posed and require utilizing adequate prior knowledge, for which sparsity has proven extremely effective (Mairal et al., 2014). In recent years, such problems have been tackled with deep neural network architectures that achieve superior performance but are not well understood in terms of their building blocks. In this study, we are interested in utilizing knowledge from the classical signal processing and sparse coding literature to introduce a learned framework that is interpretable and can perform on par with state-of-the-art deep-learning methods. We choose to explore this method on the task of natural image denoising, in line with much of the recent literature (Sreter & Giryes, 2018; Simon & Elad, 2019; Lecouat et al., 2020). As a benefit of this interpretability, we are able to extend the framework to a blind-denoising setting using ideas from signal processing.

In sparse representation we seek to approximate a signal as a linear combination of a few vectors from a set of vectors (usually called dictionary atoms). Olshausen & Field (1996), following a neuroscientific perspective, proposed to adapt the dictionary to a set of training data. Later, dictionary learning combined with sparse coding was investigated in numerous applications (Mairal et al., 2009a; Protter & Elad, 2008). More specifically, for a set of $N$ image patches (reshaped into column vectors) $X = [x_1, \cdots, x_N] \in \mathbb{R}^{m \times N}$, we seek to find the dictionary $D^* \in \mathbb{R}^{m \times k}$ and the sparse representation $Z^* = [z_1^*, \cdots, z_N^*] \in \mathbb{R}^{k \times N}$ such that

$$D^*, Z^* = \arg\min_{D, Z} \sum_{i=1}^{N} \|z_i\|_0 \quad \text{subject to: } D z_i = x_i, \quad \forall i = 1, \cdots, N. \qquad (1)$$
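As a toy illustration of the synthesis model above (a sketch with illustrative names and sizes, not code from this work): a signal built as a combination of two atoms of a random unit-norm dictionary is, by construction, exactly 2-sparse in that dictionary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 16, 32                       # patch dimension m, number of atoms k
D = rng.standard_normal((m, k))
D /= np.linalg.norm(D, axis=0)      # normalize atoms to unit l2-norm

z_true = np.zeros(k)
z_true[[3, 17]] = [1.5, -2.0]       # only 2 of the 32 coefficients are active
x = D @ z_true                      # x = D z: a linear combination of 2 atoms
```

Dictionary learning asks the reverse question: given only many such signals $x_i$, find a $D$ under which they all admit representations this sparse.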
This formulation is not tractable for large signals since minimizing the $\ell_0$ pseudo-norm involves a combinatorial optimization (Natarajan, 1995). To address this complication, a popular technique is to relax the problem by using the $\ell_1$-norm as a surrogate (Sreter & Giryes, 2018). When dealing with inverse problems such as denoising, learning the dictionary from the degraded signal has proven effective (Tošić & Frossard, 2011). Let $y_i = x_i + n_i \in \mathbb{R}^m$ represent the noisy signal, where $n_i$ follows an additive white Gaussian distribution, $\mathcal{N}(0, \sigma_n^2 I)$. Then, the relaxed formulation can be written as

$$\min_{D, Z} \sum_{i=1}^{N} \|z_i\|_1 \ \text{ s.t. } \ \sum_{i=1}^{N} \frac{1}{2} \|D z_i - y_i\|_2^2 \leq \epsilon \quad \text{or} \quad \min_{D, Z} \sum_{i=1}^{N} \frac{1}{2} \|D z_i - y_i\|_2^2 + \lambda \|z_i\|_1 \qquad (2)$$

where $\lambda$ is a regularization parameter that is nontrivially related to the representation error $\epsilon$. We will refer to this as the basis-pursuit denoising (BPDN) formulation of dictionary learning. Many iterative algorithms have been proposed in the literature to solve this problem (Mairal et al., 2014). A majority of these algorithms split the problem into a step updating the dictionary followed by a step solving for the sparse codes.

Note that learning a dictionary over independent image patches neglects the dependencies between these patches. As a result, models involving patch processing are inherently sub-optimal (Batenkov et al., 2017; Simon & Elad, 2019). Although enforcing local priors on merged images (Sulam & Elad, 2015) and utilizing self-similarity between patches (Mairal et al., 2009b) have been proposed as ideas to mitigate this flaw, ideally a global shift-invariant model is more appropriate. By constraining the dictionary to have a Toeplitz structure, the Convolutional Sparse Coding (CSC) model has been introduced, which replaces local patch processing with a global convolution (Grosse et al., 2007; Papyan et al., 2017). Algorithms for solving the CSC model are also discussed in (Moreau & Gramfort, 2019; Wohlberg, 2017).

In this study, we are interested in interpretable CSC-based deep-learning models. A metric known as the mutual coherence is well known to be related to the representation capability of the dictionary and is of special concern when using the CSC model with natural images (Simon & Elad, 2019). We take an alternative route to Simon & Elad (2019) in addressing the mutual coherence of CSC-based deep-learning models, one which is both less computationally expensive and improves the denoising performance.
We continue the discussion about CSC-based deep-learning models in Sec. 1.1. Another important aspect of the sparse representation is the sparse coding algorithm. For a given signal $y \in \mathbb{R}^m$ and dictionary $D$, the iterative soft-thresholding algorithm (ISTA) (Beck & Teboulle, 2009) finds the solution to the BPDN functional, $z^* = \arg\min_z \frac{1}{2}\|D z - y\|_2^2 + \lambda \|z\|_1$, by repeating the following iteration until a convergence criterion is reached:

$$z^{(k+1)} = S_{\lambda \eta^{(k)}}\left( z^{(k)} - \eta^{(k)} D^T \left( D z^{(k)} - y \right) \right) \quad \text{where} \quad S_\theta(x) = \mathrm{sgn}(x)\,(|x| - \theta)_+, \quad \theta \geq 0. \qquad (3)$$

Here, $\eta^{(k)}$ is the step size of the descent algorithm at iteration $k$. Note that performing sparse coding with an iterative method like ISTA for all patches is computationally exhausting and slow. To resolve this issue, Gregor & LeCun (2010) proposed to approximate the sparse coding via a learned differentiable encoder, dubbed LISTA. Further extensions of LISTA, both in practice and in theory, have been studied in the literature (Wu et al., 2019; Chen et al., 2018). More recently, using LISTA combined with dictionary learning has been a research highlight (Sreter & Giryes, 2018; Simon & Elad, 2019; Lecouat et al., 2020). We refer to this type of model, which leverages LISTA for convolutional dictionary learning, as a CDL model.
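The ISTA iteration in Eq. (3) can be sketched in a few lines of NumPy (a minimal illustration with a constant step size $\eta = 1/L$, where $L = \|D\|_2^2$; the toy problem below, including all names and sizes, is illustrative and not from this work):

```python
import numpy as np

def soft_threshold(x, theta):
    # S_theta(x) = sgn(x) * max(|x| - theta, 0): proximal operator of the l1-norm
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def ista(D, y, lam, n_iter=500):
    # constant step size eta = 1/L with L = ||D||_2^2, the Lipschitz
    # constant of the gradient of the data-fidelity term
    eta = 1.0 / np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # gradient step on 0.5*||Dz - y||^2, then soft-threshold with lam * eta
        z = soft_threshold(z - eta * D.T @ (D @ z - y), lam * eta)
    return z

# toy problem: recover a 2-sparse code from a noisy measurement
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
z_true = np.zeros(32)
z_true[[3, 17]] = [1.5, -2.0]
y = D @ z_true + 0.01 * rng.standard_normal(16)
z_hat = ista(D, y, lam=0.05)
```

LISTA replaces the fixed matrices and thresholds in this loop with a small number of unrolled, learned iterations, trading per-signal iteration count for a one-time training cost.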

1.1. RELATED WORKS

In this study, we are interested in the CDL model that concatenates a LISTA network with a linear convolutional synthesis dictionary. Let $D$ be a convolutional dictionary with $M$ filters (and their integer shifts). We denote the filters in $D$ by $d_j$ where $j \in \{1, \cdots, M\}$. Let $Z_i$ denote the sparse code for the data sample $y_i = x_i + n_i$ where $i \in \{1, 2, \cdots, N\}$ and $n_i \sim \mathcal{N}(0, \sigma_n^2 I)$. The subband signal corresponding to $d_j$ in $Z_i$ is denoted by $z_i^j$. Then the convolutional dictionary learning problem is written as

$$\min_{D, Z} \sum_{i=1}^{N} \frac{1}{2} \Big\| \sum_{j=1}^{M} d_j * z_i^j - y_i \Big\|_2^2 + \lambda \sum_{j=1}^{M} \| z_i^j \|_1. \qquad (4)$$
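The synthesis term $\sum_j d_j * z_i^j$ above can be sketched for a 1-D signal as follows (a minimal illustration with made-up sizes, not the paper's implementation): each filter is convolved with its own sparse subband code, and the subbands are summed into one globally shift-invariant reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, f = 64, 3, 5                          # signal length, number of filters, filter size
filters = rng.standard_normal((M, f))

# sparse subband codes z^j: one code map per filter, same length as the signal
Z = np.zeros((M, n))
Z[0, 10] = 1.0                              # one active coefficient in subband 0
Z[2, 40] = -0.5                             # one active coefficient in subband 2

# global synthesis: x = sum_j d_j * z^j  (convolution => shift-invariant model)
x = sum(np.convolve(Z[j], filters[j], mode="same") for j in range(M))
```

Because the dictionary acts by convolution, shifting every code map shifts the reconstruction by the same amount (away from the boundaries), which is exactly the global shift-invariance that patch-based models lack.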




