Variational Learning ISTA

Abstract

Compressed sensing combines convex optimization techniques with a sparsity-inducing prior on the signal space to solve an underdetermined system of equations. For many problems, the sparsifying dictionary is not directly given, nor can its existence be assumed. Moreover, the sensing matrix can change across scenarios. Addressing these issues requires solving a sparse representation learning problem, namely dictionary learning, while accounting for the epistemic uncertainty of the learned dictionaries and, finally, jointly learning sparse representations and reconstructions under varying sensing matrix conditions. We propose a variant of the LISTA architecture that incorporates the sensing matrix into the architecture. In particular, we propose to learn a distribution over dictionaries via a variational approach, dubbed Variational Learning ISTA (VLISTA), which approximates a posterior distribution over the dictionaries as part of an unfolded LISTA-based recovery network. This variational posterior is updated after each iteration, thereby adapting the dictionary to the optimization dynamics. As a result, VLISTA provides a probabilistic way to jointly learn the dictionary distribution and the reconstruction algorithm under varying sensing matrices. We provide theoretical and experimental support for our architecture and show that it learns calibrated uncertainties.

1. Introduction

Compressed sensing methods aim to solve under-determined inverse problems by imposing a prior on the signal structure. Sparsity priors and linear sensing mediums (modelled by a linear transformation Φ) are the canonical examples of signal structure and measurement model. Many recent works have focused on improving the performance and complexity of compressed sensing solvers for a given dataset. A typical approach unfolds iterative algorithms as layers of a neural network and learns the parameters end-to-end, starting with the learned iterative soft-thresholding algorithm (LISTA) of Gregor & LeCun (2010) and its many follow-up works. Varying sensing matrices and unknown sparsifying dictionaries remain two of the main challenges for data-driven approaches. The works in Aberdam et al. (2021); Schnoor et al. (2022) address these issues by learning a dictionary and including it in the optimization iterations. However, the data samples might not admit any exact sparse representation, in which case there is no ground-truth dictionary. The issue is even more severe for heterogeneous datasets, where the best choice of dictionary may vary from one sample to another. A principled way to handle this is to take a Bayesian approach and define a distribution over the learned dictionaries with proper uncertainty quantification. In this work, we first formulate an augmented LISTA-like model, termed Augmented Dictionary Learning ISTA (A-DLISTA), that can adapt its parameters to the current data instance. We theoretically motivate this design and empirically show that it outperforms other LISTA-like models in a non-static measurement scenario, i.e., with sensing matrices that vary across data samples. An augmented version of LISTA, named Neurally Augmented ALISTA (NALISTA), was already proposed in Behrens et al. (2021); however, there are fundamental differences between NALISTA and A-DLISTA.
First, our model takes as input the per-sample sensing matrix and the dictionary at the current layer, so A-DLISTA adapts its parameters both to the current measurement setup and to the learned dictionaries. In contrast, NALISTA assumes a fixed sensing matrix in order to evaluate its weight matrix W analytically. NALISTA could in principle handle varying sensing matrices, but at the price of solving the inner optimization step for W separately for every data sample. Moreover, the architectures of the augmentation networks differ profoundly: NALISTA uses an LSTM, whereas A-DLISTA employs a convolutional neural network shared across all layers. This choice reflects the different types of dependencies between layers and input data that the networks model. We report a detailed discussion of the theoretical motivation and architectural design of A-DLISTA in subsection 3.3; the detailed architecture is described in Appendix A. Finally, we introduce Variational Learning ISTA (VLISTA), in which we learn a distribution over dictionaries and update it after each iteration based on the outcome of the previous layer. In this sense, our model learns an adaptive iterative optimization algorithm in which the dictionary is iteratively refined for the best performance. In addition, the uncertainty estimates provide an indicator for detecting Out-Of-Distribution (OOD) samples. Intuitively, our model can be understood as a form of recurrent variational autoencoder, e.g., Chung et al. (2015), where at each iteration of the optimization algorithm we have an approximate posterior distribution over the dictionaries, conditioned on the outcome of the last iteration. The main contributions of our work are as follows.
• We design an augmented version of LISTA, dubbed A-DLISTA, that can handle non-static measurement setups, i.e., per-sample sensing matrices, and that can adapt its parameters to the current data instance.
• We propose Variational Learning ISTA (VLISTA), which learns a distribution over sparsifying dictionaries. The model can be interpreted as a Bayesian LISTA model that leverages A-DLISTA as the likelihood model.
• VLISTA adapts the dictionary to the optimization dynamics and can therefore be interpreted as a hierarchical representation learning approach, in which the dictionary atoms gradually permit more refined signal recovery.
• The dictionary distributions can be used for out-of-distribution detection.

The remainder of the paper is organized as follows. In section 2 we briefly review works relevant to the current research, while in section 3 the model formulation is detailed. The datasets and experimental results are reported in section 4. Finally, we present our conclusions in section 5.

Bayesian Compressed Sensing (BCS) and dictionary learning. A non-parametric Bayesian approach to dictionary learning was introduced in Zhou et al. (2009; 2012), where the authors consider fully Bayesian joint compressed sensing inversion and dictionary learning; their atoms, however, are drawn and fixed a priori. Bayesian compressed sensing Ji et al. (2008) leverages relevance vector machines (RVMs) Tipping (2001) and uses a hierarchical prior to model the distribution of each entry. This line of work quantifies the uncertainty of the recovered entries while assuming a fixed dictionary. In contrast, in our work the source of uncertainty is the unknown dictionary, over which we define a distribution.
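As background for the unfolded models discussed above, the following is a minimal NumPy sketch (an illustration of ours, not the authors' implementation) of an unrolled ISTA recovery with a per-sample sensing matrix Phi and one dictionary per layer, in the spirit of A-DLISTA. The function names are hypothetical, and the fixed scalar step size `gamma` and threshold `theta` are placeholders for the per-sample quantities that an augmentation network would predict.

```python
import numpy as np

def soft_threshold(v, theta):
    # Proximal operator of the l1 norm (elementwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def unrolled_ista(y, Phi, dictionaries, gamma=0.1, theta=0.01):
    """Run one ISTA-style update per layer.

    y            : measurements, shape (m,)
    Phi          : per-sample sensing matrix, shape (m, d)
    dictionaries : list of per-layer dictionaries Psi, each of shape (d, n)
    gamma, theta : step size and threshold (fixed scalars here for
                   illustration; adapted per sample/layer in A-DLISTA)
    """
    z = np.zeros(dictionaries[0].shape[1])      # sparse code estimate
    for Psi in dictionaries:
        A = Phi @ Psi                           # effective forward operator
        z = soft_threshold(z + gamma * A.T @ (y - A @ z), theta)
    return dictionaries[-1] @ z                 # signal reconstruction
```

With a single dictionary repeated across layers this reduces to classical ISTA; letting each layer carry its own learned dictionary, or a dictionary sampled from a layer-wise posterior, is what A-DLISTA and VLISTA, respectively, build on.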



Preprint. Under review.



LISTA models. Learning ISTA was first introduced in Gregor & LeCun (2010) with many follow-up variations. The follow-up works in Behrens et al. (2021); Liu et al. (2019); Chen et al. (2021); Wu

