WHERE PRIOR LEARNING CAN AND CAN'T WORK IN UNSUPERVISED INVERSE PROBLEMS

Abstract

Linear inverse problems consist of recovering a signal from its noisy observation in a lower-dimensional space. Many popular reconstruction methods rely on data-driven algorithms that learn a prior from pairs of signals and observations to overcome the loss of information. However, these approaches are difficult, if not impossible, to adapt to unsupervised contexts, where no ground-truth data are available, due to the need to learn from clean signals. This paper studies situations that do or do not allow learning a prior in unsupervised inverse problems. First, we focus on dictionary learning and point out that recovering the dictionary is infeasible without constraints when the signal is observed through only one measurement operator. It can, however, be learned with multiple operators, provided that they are diverse enough to span the whole signal space. Then, we study methods where weak priors are made available either through optimization constraints or deep learning architectures. We empirically show that they perform better than hand-crafted priors only if they are adapted to the inverse problem.

1. INTRODUCTION

Linear inverse problems are ubiquitous in observational sciences such as imaging (Ribes & Schmitt, 2008), neuroscience (Gramfort et al., 2012), or astrophysics (Starck, 2016). They consist of reconstructing signals X ∈ R n×N from remote and noisy measurements Y ∈ R m×N, obtained as a linear transformation A ∈ R m×n of X corrupted with noise B ∈ R m×N: Y = AX + B. As the dimension m of Y is usually much smaller than the dimension n of X, these problems are ill-posed, and several solutions can lead to a given set of observations. Measurement noise further increases the number of potential solutions. Therefore, practitioners rely on prior knowledge of the data to select a plausible solution among all possible ones. On the one hand, hand-crafted priors relying on sparsity in a basis produce satisfactory results on specific data, such as wavelets in imaging or Gaborlets in audio (Mallat, 2008). However, the complexity and variability of the signals often make ad hoc priors inadequate. On the other hand, the prior can be learned from ground-truth data when available. For instance, frameworks based on Plug-and-Play (Brifman et al., 2016) and Deep Learning (Chan et al., 2016; Romano et al., 2017; Rick Chang et al., 2017) integrate a pre-trained denoiser into an iterative algorithm to solve the problem. Supervised methods leveraging sparsity also make it possible to summarize the structure of the signal (Elad, 2010). In particular, dictionary learning (Olshausen & Field, 1997; Aharon et al., 2006; Mairal et al., 2009) is efficient on pattern learning tasks such as blood cell detection or MEG signal analysis (Yellin et al., 2017; Dupré la Tour et al., 2018). Nevertheless, these methods require clean data, which are sometimes available in audio and imaging but not in fields like neuroimaging or astrophysics.
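As a toy illustration of the observation model Y = AX + B, consider the following minimal numpy sketch (dimensions and variable names are illustrative, not taken from the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 64, 16, 100                    # signal dim, measurement dim, number of samples
X = rng.standard_normal((n, N))          # ground-truth signals (unknown in practice)
A = rng.standard_normal((m, n))          # linear measurement operator, m < n
B = 0.01 * rng.standard_normal((m, N))   # additive measurement noise
Y = A @ X + B                            # observations in a lower-dimensional space

# The observations have fewer rows than the signals: information is lost.
assert Y.shape == (m, N) and m < n
```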
While data-driven methods have been extensively studied in the context of supervised inverse problems, recent works have focused on unsupervised scenarios and provided new algorithms to learn from corrupted data only (Lehtinen et al., 2018; Bora et al., 2018; Liu et al., 2020). Chen et al. (2021) and Tachella et al. (2022) demonstrate that a necessary condition to learn expressive priors from degraded signals is either to measure them with multiple operators which together span the whole space, or to introduce weak prior knowledge, such as group structures and equivariance, into the model when only one operator is available. Other works based on Deep Learning have leveraged successful architectures to recover images without access to any ground-truth data. In particular, Deep Image Prior shows that CNNs contain enough prior information to recover an image in several inverse problems, such as denoising or inpainting (Ulyanov et al., 2018). Finally, a few works have demonstrated that it is possible to learn dictionaries from incomplete data, especially in the context of missing values or inpainting in imaging (Szabó et al., 2011; Studer & Baraniuk, 2012; Naumova & Schnass, 2017). Another line of work studied online factorization of large matrices by aggregating partial information randomly selected from the data at each iteration (Mensch et al., 2016; 2017). This is equivalent to learning a dictionary from incomplete data, except that one sample can be observed multiple times from different angles, which is hardly possible in an inverse problem context.

Contributions

In this paper, we demonstrate practical limitations of prior learning methods for unsupervised inverse problems. We first provide an analysis of dictionary learning when the data are measured with a single operator or with multiple operators. As mentioned by Tachella et al. (2022), "seeing the whole space" is a necessary condition to learn a good prior from the data, as nothing can be recovered in the kernel of the operator A. However, we point out that this is not sufficient in the case of dictionary learning. Indeed, the problem is made harder by the measurement operators and is sometimes infeasible even with access to the whole space. Then we study the practical behavior of methods heavily relying on convolutions in cases where they work well (inpainting) and in cases where they fail because the prior is too weak (deblurring), and provide experiments complementary to the theoretical study of Tachella et al. (2022). We present three examples, namely Convolutional Dictionary Learning, Deep Image Prior, and Plug-and-Play, and train the prior "as is" in the range space without relying on any data augmentation technique or equivariance. Finally, we show that the difficulty runs deeper than the unsupervised setting by studying what happens in a self-supervised setting when training on ground-truth data. In particular, we emphasize that stronger prior information is necessary to link low and high frequencies in deblurring, even in this simpler context.

2. THE MAIN BOTTLENECK OF PRIOR LEARNING IN INVERSE PROBLEMS

For inverse problems, the dimension of the measurements m is often smaller than the dimension of the signal n. This dimension reduction implies that information about the signal contained in the null space of A ∈ R m×n is lost during the observation process and needs to be reconstructed from the observed signal. We first aim to study the impact of this degradation on constraint-free prior learning through the lens of dictionary learning.
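This loss of the null-space component can be made concrete with a small numpy sketch, in which a prior-free least-squares reconstruction stands in for any method that uses no knowledge of the signal (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
A = rng.standard_normal((m, n))          # generic operator with a non-trivial null space
x = rng.standard_normal(n)               # ground-truth signal
y = A @ x                                # noiseless observation

# Without a prior, least squares only recovers the component of x lying
# in the row space of A; the null-space component is gone.
x_hat = np.linalg.pinv(A) @ y
P_row = np.linalg.pinv(A) @ A            # orthogonal projector onto the row space of A
assert np.allclose(x_hat, P_row @ x)     # recovered part: row-space projection only
assert not np.allclose(x_hat, x)         # the null-space component is lost
```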

2.1. DICTIONARY LEARNING WITH A SINGLE MEASUREMENT OPERATOR.

Dictionary learning assumes that the signal can be decomposed into a sparse representation in a redundant basis of patterns, also called atoms. In other words, the goal is to recover the signals X ∈ R n×N as DZ, where Z ∈ R L×N are sparse codes and D ∈ R n×L is a dictionary. Taking the example of Lasso-based dictionary learning, recovering X requires solving a problem of the form

min_{Z ∈ R L×N, D ∈ C} 1/2 ∥ADZ − Y∥₂² + λ∥Z∥₁,

where λ is a regularization hyperparameter and C is a set of constraints, typically set so that the columns of D have norm at most 1. We first aim to see the impact of A on the algorithm's ability to recover a proper dictionary. In Proposition 2.1, we focus on inpainting, where the measurement operator is a binary mask, or equivalently a diagonal matrix with m non-zero elements.

Proposition 2.1. Let A = diag(λ₁, …, λₙ) ∈ R n×n be a diagonal measurement matrix where m < n, λ₁ ≥ … ≥ λₘ > 0 and λₘ₊₁ = … = λₙ = 0. Let D₀ ∈ R n×L and let D′ be equal to D₀ on its first m rows and to 0 on its last n − m rows. Then

min_Z 1/2 ∥AD′Z − Y∥₂² + λ∥Z∥₁ ≤ min_Z 1/2 ∥AD₀Z − Y∥₂² + λ∥Z∥₁.

All proofs are deferred to Appendix C. In this simple case, our proposition shows that the optimal dictionary must be 0 in the null space of A. The core idea behind the proof is that, due to invariances, the optimal solution of dictionary learning is contained in an equivalence class {PSD′ + V}, where P is a permutation, S a scaling, and V a matrix supported in the null space of A.


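The mechanism behind Proposition 2.1 can be checked numerically: with a binary inpainting mask, the rows of the dictionary lying in the null space of A never enter the data-fit term, so they cannot be identified from the observations. A minimal numpy sketch (illustrative dimensions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, L, N = 6, 4, 8, 10
# Inpainting operator: diagonal binary mask keeping the first m coordinates.
A = np.diag(np.r_[np.ones(m), np.zeros(n - m)])

D0 = rng.standard_normal((n, L))         # a candidate dictionary
D_prime = D0.copy()
D_prime[m:] = 0.0                        # zero the atoms' components in ker(A)

Z = rng.standard_normal((L, N))          # arbitrary codes
Y = A @ D0 @ Z                           # synthetic observations

# The data-fit term cannot distinguish D0 from D_prime:
assert np.allclose(A @ D_prime, A @ D0)
obj0 = 0.5 * np.linalg.norm(A @ D0 @ Z - Y) ** 2
obj1 = 0.5 * np.linalg.norm(A @ D_prime @ Z - Y) ** 2
assert np.isclose(obj0, obj1)
```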