A SIMPLE SPARSE DENOISING LAYER FOR ROBUST DEEP LEARNING

Abstract

Deep models have achieved great success in many applications. However, vanilla deep models are not robust to input perturbations. In this work, we take an initial step toward designing a simple robust layer as a lightweight plug-in for vanilla deep models. To achieve this goal, we first propose a fast sparse coding and dictionary learning algorithm for the sparse coding problem with an exact k-sparse constraint or l0-norm regularization. Our method comes with a closed-form approximation for the sparse coding phase, obtained by taking advantage of a novel structured dictionary. With this handy approximation, we propose a simple sparse denoising layer (SDL) as a lightweight robust plug-in. Extensive experiments on both classification and reinforcement learning tasks demonstrate the effectiveness of our method.

1. INTRODUCTION

Deep neural networks have achieved great success in many applications, including computer vision, reinforcement learning (RL), and natural language processing. However, vanilla deep models are not robust to noise perturbations of the input: even a small perturbation of the input data can dramatically harm prediction performance (Goodfellow et al., 2015). To address this issue, there are three main streams of strategies: data augmentation based learning methods (Zheng et al., 2016; Ratner et al., 2017; Madry et al., 2018; Cubuk et al., 2020), loss functions/regularization techniques (Elsayed et al., 2018; Zhang et al., 2019), and the design of network architectures that are robust to noisy input perturbations. Su et al. (2018) empirically investigated 18 deep classification models; their study found that model architecture is a more critical factor for robustness than model size. Most recently, Guo et al. (2020) employed a neural architecture search (NAS) method to investigate robust architectures. However, NAS-based methods are still very computationally expensive, and the resultant models cannot be easily adopted as a plug-in for other vanilla deep models. A handy robust plug-in for backbone models thus remains highly desirable. In this work, we take an initial step toward designing a simple robust layer as a lightweight plug-in for vanilla deep models. To achieve this goal, we first propose a novel fast sparse coding and dictionary learning algorithm. Our algorithm has a closed-form approximation for the sparse coding phase, which is cheap to compute compared with the iterative methods in the literature. The closed-form update is handy for settings that require fast computation, especially deep learning. Based on this, we design a very simple sparse denoising layer for deep models. Our SDL is very flexible and enables end-to-end training.
Our SDL can be used as a lightweight plug-in for many modern deep architectures (e.g., ResNet and DenseNet for classification, and deep PPO models for RL). Our contributions are summarized as follows:
• We propose simple sparse coding and dictionary learning algorithms for both the k-sparse constrained sparse coding problem and the l0-norm regularized problem. Our algorithms have a simple approximation form for the sparse coding phase.
• We introduce a simple sparse denoising layer (SDL) based on our handy update. SDL involves only simple operations, making it a fast plug-in layer for end-to-end training.
• Extensive experiments on both classification and reinforcement learning tasks show the effectiveness of our SDL.
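As a rough illustration of the kind of operation such a denoising layer performs (this is a generic k-sparse reconstruction sketch, not the authors' exact SDL; the dictionary `D`, the top-k selection, and the shapes are our own assumptions), a k-sparse denoising step can be written as correlation with a dictionary followed by hard top-k thresholding of the codes:

```python
import numpy as np

def sparse_denoise(x, D, k):
    """Denoise x by keeping only the k largest-magnitude dictionary responses.

    x : (d,) noisy input vector
    D : (d, M) dictionary with unit-norm columns
    k : number of nonzero codes to keep
    Returns the reconstruction D @ y, where y is a k-sparse code.
    """
    codes = D.T @ x                       # correlate input with every atom
    y = np.zeros_like(codes)
    top = np.argsort(np.abs(codes))[-k:]  # indices of the k strongest responses
    y[top] = codes[top]                   # hard-threshold: exactly k-sparse code
    return D @ y                          # reconstruct from the surviving atoms

# Toy usage: with an identity dictionary, denoising keeps the 2 largest entries.
D = np.eye(4)
x = np.array([3.0, 0.1, -2.0, 0.05])
print(sparse_denoise(x, D, k=2))  # -> [ 3.  0. -2.  0.]
```

Because the sparse coding step here is a single matrix product plus a top-k selection, such a layer is differentiable almost everywhere with respect to `D` and cheap enough to insert in front of a backbone network.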

2. RELATED WORKS

Sparse Coding and Dictionary Learning: Sparse coding and dictionary learning are widely studied in computer vision and image processing. One related popular method is K-SVD (Elad & Aharon, 2006; Rubinstein et al., 2008), which jointly learns an over-complete dictionary and the sparse representations by minimizing an l0-norm regularized reconstruction problem. Specifically, K-SVD alternates between a sparse coding phase and a dictionary updating phase, both based on heuristic greedy methods. Despite its good performance, K-SVD is very computationally demanding. Moreover, as pointed out by Bao et al. (2013), both the sparse coding phase and the dictionary updating phase of K-SVD use greedy approaches that lack rigorous theoretical guarantees on optimality and convergence. Bao et al. (2013) proposed to learn an orthogonal dictionary instead of an over-complete one. The idea is to concatenate the free parameters with predefined filters to form an orthogonal dictionary. This trick reduces the time complexity compared with K-SVD. However, their algorithm relies on the predefined filters. Furthermore, their alternating descent method heavily relies on SVD, which is not easy to extend to deep models. In contrast, our method learns a structured over-complete dictionary, which has a simple form as a layer for deep learning. Recently, some works (Venkatakrishnan et al., 2013) employed deep neural networks to approximate the alternating direction method of multipliers (ADMM) or other proximal algorithms for image denoising tasks. In (Wei et al., 2020), reinforcement learning is used to learn the hyperparameters of these deep iterative models. However, this kind of method requires training a complex deep model in its own right. Thus, these methods are computationally expensive, and too heavy or inflexible to serve as a plug-in layer for backbone models in tasks other than image denoising, e.g., reinforcement learning and multi-class classification.
Robust Deep Learning: Most prior works improve robustness to input perturbations through data augmentation based training or loss/regularization techniques (Zheng et al., 2016; Ratner et al., 2017; Cubuk et al., 2020; Elsayed et al., 2018; Zhang et al., 2019). However, the network architecture itself remains less explored as a means of addressing robustness to input perturbations. Guo et al. (2020) employed NAS methods to search for robust architectures. However, the search-based method is very computationally expensive, and the resultant architectures cannot be easily used as a plug-in for other popular networks. In contrast, our SDL is based on a closed-form approximation of sparse coding, which can be used as a handy plug-in for many backbone models.

3. FAST SPARSE CODING AND DICTIONARY LEARNING

In this section, we present our fast sparse coding and dictionary learning algorithms for the k-sparse problem and the l0-norm regularized problem in Section 3.1 and Section 3.2, respectively. Both algorithms belong to the alternating descent optimization framework.

3.1. K-SPARSE CODING

We first introduce the optimization problem for sparse coding with a k-sparse constraint. Mathematically, we aim to optimize the following objective:

min_{D,Y} \sum_{i=1}^{N} \|x_i - D y_i\|_2^2
subject to \|y_i\|_0 \le k, \forall i \in \{1, \dots, N\},   (1)
\mu(D) \le \lambda,
\|d_j\|_2 = 1, \forall j \in \{1, \dots, M\},

where X = [x_1, \dots, x_N] are the input data, Y = [y_1, \dots, y_N] are the sparse codes, D = [d_1, \dots, d_M] is the dictionary with unit-norm columns d_j, and \mu(D) denotes the mutual coherence of the dictionary.
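As a concrete reading of problem (1), the following numpy sketch evaluates the reconstruction objective and checks its three constraints (the function names and shapes are our own; \mu(D) is taken as the usual mutual coherence, i.e., the largest absolute inner product between distinct unit-norm atoms — the paper's structured dictionary and closed-form solver are not reproduced here):

```python
import numpy as np

def mutual_coherence(D):
    """mu(D): largest |<d_i, d_j>| over distinct unit-norm columns of D."""
    G = D.T @ D                    # Gram matrix of the atoms
    np.fill_diagonal(G, 0.0)       # ignore self-correlations on the diagonal
    return np.abs(G).max()

def objective_and_feasibility(X, D, Y, k, lam):
    """Reconstruction loss of problem (1) plus checks of its three constraints."""
    loss = np.sum((X - D @ Y) ** 2)                       # sum_i ||x_i - D y_i||^2
    k_sparse_ok = all(np.count_nonzero(Y[:, i]) <= k for i in range(Y.shape[1]))
    coherence_ok = mutual_coherence(D) <= lam             # mu(D) <= lambda
    unit_norm_ok = np.allclose(np.linalg.norm(D, axis=0), 1.0)
    return loss, k_sparse_ok and coherence_ok and unit_norm_ok

# Toy usage with an orthonormal (square) dictionary, for which mu(D) = 0.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # orthonormal columns
X = rng.standard_normal((8, 5))
Y = D.T @ X                                       # exact codes (dense, k = 8)
loss, feasible = objective_and_feasibility(X, D, Y, k=8, lam=0.5)
print(round(loss, 6), feasible)                   # loss ~ 0 for exact codes
```

For an over-complete dictionary (M > d) the codes cannot all be exact, and keeping only the k largest-magnitude entries of each column of Y trades reconstruction error for the k-sparse constraint.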

The numbers of parameters of SDL, DnCNN (Zhang et al., 2017), and PnP (Wei et al., 2020) are shown in Table 1. SDL has far fewer parameters and a simpler structure than DnCNN and PnP, and it can serve as a lightweight plug-in for other tasks, e.g., RL.

