ZERO-SHOT IMAGE RESTORATION USING DENOISING DIFFUSION NULL-SPACE MODEL

Abstract

Most existing Image Restoration (IR) models are task-specific and cannot be generalized to different degradation operators. In this work, we propose the Denoising Diffusion Null-Space Model (DDNM), a novel zero-shot framework for arbitrary linear IR problems, including but not limited to image super-resolution, colorization, inpainting, compressed sensing, and deblurring. DDNM only needs a pre-trained off-the-shelf diffusion model as the generative prior, without any extra training or network modifications. By refining only the null-space contents during the reverse diffusion process, we can yield diverse results satisfying both data consistency and realness. We further propose an enhanced and robust version, dubbed DDNM+, to support noisy restoration and to improve restoration quality on hard tasks. Our experiments on several IR tasks show that DDNM outperforms other state-of-the-art zero-shot IR methods. We also demonstrate that DDNM+ can solve complex real-world applications, e.g., old photo restoration.

1. INTRODUCTION

Image Restoration (IR) is a long-standing problem due to its extensive application value and its ill-posed nature (Richardson, 1972; Andrews & Hunt, 1977). IR aims at yielding a high-quality image x̂ from a degraded observation y = Ax + n, where x stands for the original image and n represents additive noise. A is a known linear operator, which may be a bicubic downsampler in image super-resolution, a sampling matrix in compressed sensing, or even a composite type. Traditional IR methods are typically model-based, whose solution can usually be formulated as:

x̂ = argmin_x (1/(2σ²)) ‖Ax − y‖²₂ + λR(x).  (1)

The first, data-fidelity term (1/(2σ²)) ‖Ax − y‖²₂ optimizes the result toward data consistency, while the second, image-prior term λR(x) regularizes the result with formulaic prior knowledge of the natural image distribution, e.g., sparsity or Tikhonov regularization. Though such hand-designed priors may prevent some artifacts, they often fail to bring realistic details. The prevalence of deep neural networks (DNNs) brings new patterns of solving IR tasks (Dong et al., 2015), which typically train an end-to-end DNN D_θ by optimizing the network parameters θ following

argmin_θ Σ_{i=1}^{N} ‖D_θ(y_i) − x_i‖²₂,  (2)

where N pairs of degraded image y_i and ground-truth image x_i are needed to learn the mapping from y to x directly. Although end-to-end learning-based IR methods avoid explicitly modeling the degradation A and the prior term in Eq. 1 and are fast during inference, they usually lack interpretability. Some efforts have been made in exploring interpretable DNN structures (Zhang & Ghanem, 2018; Zhang et al., 2020); however, they still yield poor performance when facing domain shift, since Eq. 2 essentially encourages learning the mapping from y_i to x_i. For the same reason, end-to-end learning-based IR methods usually need to train a dedicated DNN for each specific task, lacking generalizability and flexibility in solving diverse IR tasks.
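For concreteness, here is a toy sketch of Eq. 1 with a Tikhonov prior R(x) = ‖x‖²₂ (one of the regularizers named above), which admits a closed-form minimizer via the normal equations; the dimensions and random operator are assumptions for illustration only:

```python
import numpy as np

# Toy instance of Eq. 1 with R(x) = ||x||_2^2 (Tikhonov). The minimizer of
#   (1/(2*sigma^2)) * ||A x - y||^2 + lam * ||x||^2
# solves the normal equations (A^T A + 2*lam*sigma^2 I) x = A^T y.
rng = np.random.default_rng(0)
d, D = 4, 8                      # degraded dim < image dim: the problem is ill-posed
A = rng.standard_normal((d, D))  # toy linear degradation operator
x_gt = rng.standard_normal(D)    # toy "ground-truth" image, flattened to a vector
y = A @ x_gt                     # noise-free observation

sigma, lam = 1.0, 0.1
x_hat = np.linalg.solve(A.T @ A + 2 * lam * sigma**2 * np.eye(D), A.T @ y)

# The gradient of the objective should vanish at the minimizer.
grad = (A.T @ (A @ x_hat - y)) / sigma**2 + 2 * lam * x_hat
assert np.abs(grad).max() < 1e-8
```

The regularizer pulls x̂ away from exact consistency Ax̂ = y, which is precisely the realness-versus-consistency tension discussed in the remainder of the paper.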
The evolution of generative models (Goodfellow et al., 2014; Bahat & Michaeli, 2014; Van Den Oord et al., 2017; Karras et al., 2019; 2020; 2021) further pushes end-to-end learning-based IR methods toward unprecedented performance in yielding realistic results (Yang et al., 2021; Wang et al., 2021; Chan et al., 2021; Wang et al., 2022). At the same time, some methods (Menon et al., 2020; Pan et al., 2021) have started to leverage the latent space of pretrained generative models to solve IR problems in a zero-shot way. Typically, they optimize the following objective:

argmin_w (1/(2σ²)) ‖AG(w) − y‖²₂ + λR(w),  (3)

where G is the pretrained generative model, w is the latent code, G(w) is the corresponding generative result, and R(w) constrains w to its original distribution, e.g., a Gaussian distribution. However, this type of method often struggles to balance realness and data consistency. The range-null space decomposition (Schwab et al., 2019; Wang et al., 2023) offers a new perspective on the relationship between realness and data consistency: data consistency is only related to the range-space contents, which can be calculated analytically. Hence the data term can be strictly guaranteed, and the key problem is to find proper null-space contents that make the result satisfy realness. We notice that the emerging diffusion models (Ho et al., 2020; Dhariwal & Nichol, 2021) are ideal tools for yielding such null-space contents because they support explicit control over the generation process. In this paper, we propose a novel zero-shot solution for various IR tasks, which we call the Denoising Diffusion Null-Space Model (DDNM). By refining only the null-space contents during reverse diffusion sampling, our solution only requires an off-the-shelf diffusion model to yield realistic and data-consistent results, without any extra training or optimization, nor any modifications to network structures.
Extensive experiments show that DDNM outperforms state-of-the-art zero-shot IR methods on diverse IR tasks, including super-resolution, colorization, compressed sensing, inpainting, and deblurring. We further propose an enhanced version, DDNM+, which significantly elevates the generative quality and supports noisy IR tasks. Our methods are free from domain shifts in degradation modes and thus can flexibly solve complex IR tasks with real-world degradation, such as old photo restoration. Our approaches reveal a promising new path toward solving IR tasks in a zero-shot manner, as data consistency is analytically guaranteed and realness is determined by the pretrained diffusion model used, and such models are rapidly evolving. Fig. 1 provides some typical applications that fully show the superiority and generality of the proposed methods.

Contributions. (1) In theory, we reveal that a pretrained diffusion model can be a zero-shot solver for linear IR problems by refining only the null-space during the reverse diffusion process. Correspondingly, we propose a unified theoretical framework for arbitrary linear IR problems. We further extend our method to support noisy IR tasks and propose a time-travel trick that significantly improves restoration quality. (2) In practice, our solution is the first that can decently solve diverse linear IR tasks with arbitrary noise levels in a zero-shot manner. Furthermore, our solution can handle composite degradations and is robust to noise types, whereby we can tackle challenging real-world applications. Our proposed DDNMs achieve state-of-the-art zero-shot IR results.

2.1. REVIEW OF THE DIFFUSION MODELS

We follow the diffusion model defined in denoising diffusion probabilistic models (DDPM) (Ho et al., 2020). DDPM defines a T-step forward process and a T-step reverse process. The forward process slowly adds random noise to data, while the reverse process constructs desired data samples from the noise. The forward process yields the present state x_t from the previous state x_{t−1}:

q(x_t | x_{t−1}) = N(x_t; √(1 − β_t) x_{t−1}, β_t I), i.e., x_t = √(1 − β_t) x_{t−1} + √(β_t) ϵ, ϵ ∼ N(0, I),  (4)

where x_t is the noised image at time-step t, β_t is the predefined scale factor, and N represents the Gaussian distribution. Using the reparameterization trick, it becomes

q(x_t | x_0) = N(x_t; √(ᾱ_t) x_0, (1 − ᾱ_t) I),  (5)

with α_t = 1 − β_t and ᾱ_t = ∏_{i=0}^{t} α_i. The reverse process aims at yielding the previous state x_{t−1} from x_t using the posterior distribution p(x_{t−1} | x_t, x_0), which can be derived from the Bayes theorem using Eq. 4 and Eq. 5:

p(x_{t−1} | x_t, x_0) = q(x_t | x_{t−1}) q(x_{t−1} | x_0) / q(x_t | x_0) = N(x_{t−1}; µ_t(x_t, x_0), σ_t² I),  (6)

with the closed-form mean µ_t(x_t, x_0) = (1/√(α_t)) (x_t − ϵ (1 − α_t)/√(1 − ᾱ_t)) and variance σ_t² = β_t (1 − ᾱ_{t−1})/(1 − ᾱ_t). Here ϵ represents the noise in x_t and is the only uncertain variable during the reverse process. DDPM uses a neural network Z_θ to predict the noise ϵ for each time-step t, i.e., ϵ_t = Z_θ(x_t, t), where ϵ_t denotes the estimate of ϵ at time-step t. To train Z_θ, DDPM randomly picks a clean image x_0 from the dataset and samples a noise ϵ ∼ N(0, I), then picks a random time-step t and updates the network parameters θ in Z_θ with the following gradient descent step (Ho et al., 2020):

∇_θ ‖ϵ − Z_θ(√(ᾱ_t) x_0 + ϵ √(1 − ᾱ_t), t)‖²₂.  (7)

By iteratively sampling x_{t−1} from p(x_{t−1} | x_t, x_0), DDPM can yield clean images x_0 ∼ q(x) from random noise x_T ∼ N(0, I), where q(x) represents the image distribution of the training dataset.
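A quick numeric check of the reparameterized forward process (Eq. 5) and its inversion, which Sec. 3.1 later uses to estimate the clean image from a noisy state; the linear β schedule and vector sizes are assumptions for this sketch:

```python
import numpy as np

# Forward process q(x_t | x_0): x_t = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) * eps.
T = 1000
beta = np.linspace(1e-4, 0.02, T)   # a common linear schedule (assumption)
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)        # toy clean sample as a flat vector
t = 500
eps = rng.standard_normal(16)
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

# With the true noise eps (i.e., a perfect Z_theta), inverting Eq. 5 recovers x_0:
# x_0 = (x_t - sqrt(1 - a_bar_t) * eps) / sqrt(a_bar_t).
x0_est = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
assert np.abs(x0_est - x0).max() < 1e-6
```

In DDPM itself ϵ is unknown and Z_θ only estimates it, so the recovered x_0 is approximate; the exactness here is an artifact of feeding back the true noise.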

2.2. RANGE-NULL SPACE DECOMPOSITION

For ease of derivation, we represent linear operators in matrix form and images in vector form; note that our derivations hold for all linear operators. Given a linear operator A ∈ R^{d×D}, its pseudo-inverse A† ∈ R^{D×d} satisfies AA†A ≡ A. There are many ways to obtain the pseudo-inverse A†, e.g., the Singular Value Decomposition (SVD) is often used to compute A† in matrix form, and the Fourier transform is often used for the convolutional form of A†. A and A† have some interesting properties. A†A can be seen as the operator that projects samples x ∈ R^{D×1} to the range-space of A, because AA†Ax ≡ Ax. In contrast, (I − A†A) can be seen as the operator that projects samples x to the null-space of A, because A(I − A†A)x ≡ 0. Interestingly, any sample x can be decomposed into two parts: one in the range-space of A and the other in the null-space of A, i.e.,

x ≡ A†Ax + (I − A†A)x.  (8)

This decomposition has profound significance for linear IR problems, to which we return shortly.
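The identities above are easy to verify numerically for a generic A via the Moore-Penrose pseudo-inverse; a minimal sketch (the operator shape is an arbitrary choice):

```python
import numpy as np

# Numeric sanity check of the range-null space decomposition (Eq. 8).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))            # toy wide operator, d < D
Ap = np.linalg.pinv(A)                     # Moore-Penrose pseudo-inverse A^dagger

x = rng.standard_normal(8)
range_part = Ap @ A @ x                    # range-space part A^dagger A x
null_part = (np.eye(8) - Ap @ A) @ x       # null-space part (I - A^dagger A) x

assert np.allclose(A @ Ap @ A, A)          # defining property A A^dagger A == A
assert np.allclose(A @ null_part, 0)       # the null-space part vanishes under A
assert np.allclose(range_part + null_part, x)  # the decomposition is exact
```

The last assertion is Eq. 8 itself: x splits exactly into a part A can "see" and a part A annihilates.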

3.1. DENOISING DIFFUSION NULL-SPACE MODEL

Null-Space Is All We Need. We start with noise-free Image Restoration (IR):

y = Ax,  (9)

where x ∈ R^{D×1}, A ∈ R^{d×D}, and y ∈ R^{d×1} denote the ground-truth (GT) image, the linear degradation operator, and the degraded image, respectively. Given an input y, IR problems essentially aim to yield an image x̂ ∈ R^{D×1} that conforms to the following two constraints:

Consistency: Ax̂ ≡ y,  Realness: x̂ ∼ q(x),

where q(x) denotes the distribution of the GT images. For the Consistency constraint, we can resort to the range-null space decomposition. As discussed in Sec. 2.2, the GT image x can be decomposed into a range-space part A†Ax and a null-space part (I − A†A)x. Interestingly, the range-space part A†Ax becomes exactly y after being operated on by A, while the null-space part (I − A†A)x becomes exactly 0, i.e., Ax ≡ AA†Ax + A(I − A†A)x ≡ Ax + 0 ≡ y. More interestingly, for a degraded image y, we can directly construct a general solution x̂ that satisfies the Consistency constraint Ax̂ ≡ y:

x̂ = A†y + (I − A†A)x̄.  (10)

Whatever x̄ is, it does not affect Consistency at all; but x̄ determines whether x̂ ∼ q(x). Our goal is then to find a proper x̄ that makes x̂ ∼ q(x). We resort to diffusion models to generate a null-space (I − A†A)x̄ that is in harmony with the range-space A†y.

Refine Null-Space Iteratively. The reverse diffusion process iteratively samples x_{t−1} from p(x_{t−1} | x_t, x_0) to yield clean images x_0 ∼ q(x) from random noise x_T ∼ N(0, I). However, this process is completely random, and the intermediate state x_t is noisy. To yield clean intermediate states for the range-null space decomposition, we reparameterize the mean µ_t(x_t, x_0) and variance σ_t² of the distribution p(x_{t−1} | x_t, x_0) as:

µ_t(x_t, x_0) = (√(ᾱ_{t−1}) β_t / (1 − ᾱ_t)) x_0 + (√(α_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t)) x_t,  σ_t² = β_t (1 − ᾱ_{t−1}) / (1 − ᾱ_t),  (11)

where x_0 is unknown, but we can reverse Eq. 5 to estimate an x_0 from x_t and the predicted noise ϵ_t = Z_θ(x_t, t). We denote the estimated x_0 at time-step t as x_{0|t}, which can be formulated as:

x_{0|t} = (1/√(ᾱ_t)) (x_t − Z_θ(x_t, t) √(1 − ᾱ_t)).  (12)

Note that this formulation is equivalent to the original DDPM; we use it because it provides a "clean" image x_{0|t} (rather than the noisy image x_t). To finally yield an x_0 satisfying Ax_0 ≡ y, we fix the range-space as A†y and leave the null-space unchanged, yielding a rectified estimate x̂_{0|t}:

x̂_{0|t} = A†y + (I − A†A)x_{0|t}.  (13)

We then use x̂_{0|t} as the estimate of x_0 in Eq. 11, thereby allowing only the null-space to participate in the reverse diffusion process, and yield x_{t−1} by sampling from p(x_{t−1} | x_t, x̂_{0|t}):

x_{t−1} = (√(ᾱ_{t−1}) β_t / (1 − ᾱ_t)) x̂_{0|t} + (√(α_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t)) x_t + σ_t ϵ, ϵ ∼ N(0, I).  (14)

Roughly speaking, x_{t−1} is a noised version of x̂_{0|t}, and the added noise erases the disharmony between the range-space contents A†y and the null-space contents (I − A†A)x_{0|t}. Therefore, iteratively applying Eq. 12, Eq. 13, and Eq. 14 yields a final result x_0 ∼ q(x). Note that every rectified estimate x̂_{0|t} conforms to Consistency, due to the fact that

Ax̂_{0|t} ≡ AA†y + A(I − A†A)x_{0|t} ≡ AA†Ax + 0 ≡ Ax ≡ y.  (15)

Since x_0 is equal to x̂_{0|1}, the final result x_0 also satisfies Consistency. We call the proposed method the Denoising Diffusion Null-Space Model (DDNM) because it utilizes the denoising diffusion model to fill in the null-space information.

Algorithm 1 Sampling of DDNM
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   x_{0|t} = (1/√(ᾱ_t)) (x_t − Z_θ(x_t, t) √(1 − ᾱ_t))
4:   x̂_{0|t} = A†y + (I − A†A)x_{0|t}
5:   x_{t−1} ∼ p(x_{t−1} | x_t, x̂_{0|t})
6: return x_0

Algorithm 2 Sampling of DDNM+
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   L = min{T − t, l}
4:   x_{t+L} ∼ q(x_{t+L} | x_t)
5:   for j = L, ..., 0 do
6:     x_{0|t+j} = (1/√(ᾱ_{t+j})) (x_{t+j} − Z_θ(x_{t+j}, t + j) √(1 − ᾱ_{t+j}))
7:     x̂_{0|t+j} = x_{0|t+j} − Σ_{t+j} A†(Ax_{0|t+j} − y)
8:     x_{t+j−1} ∼ p(x_{t+j−1} | x_{t+j}, x̂_{0|t+j})
9: return x_0

It is worth noting that our method is compatible with most recent advances in diffusion models; e.g., DDNM can be deployed on score-based models (Song & Ermon, 2019; Song et al., 2020) or combined with DDIM (Song et al., 2021a) to accelerate sampling.

[Figure 2: overview of one DDNM sampling step — the range-space contents A†Ax_{0|t} = A†y are combined with the null-space contents (I − A†A)x_{0|t} to form x̂_{0|t}, which is then noised with µ_t, σ_t to yield x_{t−1}.]
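To make the control flow of Algorithm 1 concrete, here is a toy NumPy sketch on 8-dimensional "images" with a 4× average-pooling operator; the zero-predicting stub for Z_θ is an assumption standing in for a pretrained denoiser, so only the loop structure and the per-step Consistency property (Eq. 15) are meaningful here:

```python
import numpy as np

# Minimal sketch of Algorithm 1 (DDNM sampling) with a stub denoiser.
T = 50
beta = np.linspace(1e-4, 0.02, T)
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)

def Z_theta(x_t, t):                 # stub noise predictor (assumption, untrained)
    return np.zeros_like(x_t)

D, n2 = 8, 4                         # "image" of 8 values, two 4-value patches
A = np.zeros((D // n2, D))
A[0, :n2] = 1.0 / n2                 # average-pool patch 1
A[1, n2:] = 1.0 / n2                 # average-pool patch 2
Ap = np.linalg.pinv(A)               # here A A^dagger == I

rng = np.random.default_rng(0)
y = rng.standard_normal(D // n2)     # degraded observation
x_t = rng.standard_normal(D)         # x_T ~ N(0, I)
for t in range(T - 1, -1, -1):
    x0t = (x_t - Z_theta(x_t, t) * np.sqrt(1 - alpha_bar[t])) / np.sqrt(alpha_bar[t])
    x0t_hat = Ap @ y + (np.eye(D) - Ap @ A) @ x0t    # Eq. 13: null-space rectification
    assert np.allclose(A @ x0t_hat, y)               # Eq. 15: Consistency at every step
    if t > 0:                                        # Eq. 14: sample x_{t-1}
        sigma_t = np.sqrt((1 - alpha_bar[t - 1]) / (1 - alpha_bar[t]) * beta[t])
        mu = (np.sqrt(alpha_bar[t - 1]) * beta[t] / (1 - alpha_bar[t])) * x0t_hat \
           + (np.sqrt(alpha[t]) * (1 - alpha_bar[t - 1]) / (1 - alpha_bar[t])) * x_t
        x_t = mu + sigma_t * rng.standard_normal(D)
    else:
        x_t = x0t_hat
assert np.allclose(A @ x_t, y)       # the final x_0 satisfies A x_0 == y
```

With a real pretrained Z_θ the null-space contents become realistic; the Consistency assertions hold regardless of the denoiser, which is the point of the method.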

3.2. EXAMPLES OF CONSTRUCTING A AND A †

Typical IR tasks usually have simple forms of A and A†, some of which are easy to construct by hand without resorting to the Fourier transform or SVD. Here we introduce three practical examples. Inpainting is the simplest case, where A is the mask operator. Due to the unique property that AAA ≡ A, we can use A itself as A†. For colorization, A can be the pixel-wise operator [1/3 1/3 1/3] that converts each RGB pixel [r g b]⊤ into a grayscale value r/3 + g/3 + b/3. It is easy to construct a pseudo-inverse A† = [1 1 1]⊤ that satisfies AA† ≡ I. The same idea applies to SR with scale n, where we can set A ∈ R^{1×n²} as the average-pooling operator [1/n² ... 1/n²] that averages each n×n patch into a single value. Similarly, we can construct its pseudo-inverse A† = [1 ... 1]⊤ ∈ R^{n²×1}. We provide PyTorch-like code in Appendix E. Considering A as a compound operator that consists of several sub-operators, i.e., A = A_1...A_n, we may still yield its pseudo-inverse as A† = A_n†...A_1†. This provides a flexible solution for complex IR tasks, such as old photo restoration. Specifically, we can decompose the degradation of old photos into three parts, i.e., A = A_1 A_2 A_3, where A_3 is the grayscale operator, A_2 is the average-pooling operator with scale 4, and A_1 is the mask operator defined by the damaged areas of the photo. Hence the pseudo-inverse is A† = A_3† A_2† A_1†. Our experiments show that these hand-designed operators work very well (Fig. 1(a, b, d)).
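The three hand-built operators above, and the identities they rely on, can be checked numerically; a small NumPy sketch (per-pixel and per-patch shapes only, for illustration):

```python
import numpy as np

# Colorization: A maps an RGB pixel to gray; A^dagger copies gray back to 3 channels.
A_color = np.array([[1/3, 1/3, 1/3]])
Ap_color = np.array([[1.0], [1.0], [1.0]])
assert np.allclose(A_color @ Ap_color, np.eye(1))    # A A^dagger == I

# n x SR via average pooling: A averages an n^2 patch; A^dagger copies the value back.
n = 4
A_sr = np.full((1, n * n), 1.0 / (n * n))
Ap_sr = np.ones((n * n, 1))
assert np.allclose(A_sr @ Ap_sr, np.eye(1))

# Inpainting: a binary mask A is idempotent, so A A A == A and A works as its own A^dagger.
mask = np.diag([1.0, 0.0, 1.0, 1.0])
assert np.allclose(mask @ mask @ mask, mask)
```

For full images these act channel-wise per pixel (colorization), per patch (SR), or element-wise (masking), so the same identities hold at scale.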

3.3. ENHANCED VERSION: DDNM+

DDNM can solve noise-free IR tasks well but fails on noisy IR tasks and yields poor Realness for some particular forms of A†. To overcome these two limits, as described in Algo. 2, we propose an enhanced version, dubbed DDNM+, by making two major extensions to DDNM that enable it to handle noisy situations and improve its restoration quality.

Scaling the Range-Space Correction to Support Noisy Image Restoration. We consider noisy IR problems of the form y = Ax + n, where n ∈ R^{d×1} ∼ N(0, σ_y² I) represents additive Gaussian noise and Ax represents the clean measurement. Applying DDNM directly yields

x̂_{0|t} = A†y + (I − A†A)x_{0|t} = x_{0|t} − A†(Ax_{0|t} − Ax) + A†n,

where A†n ∈ R^{D×1} is the extra noise introduced into x̂_{0|t}, which is then further introduced into x_{t−1}. A†(Ax_{0|t} − Ax) is the correction of the range-space contents, which is the key to Consistency. To solve noisy image restoration, we propose to modify DDNM (in Eq. 13 and Eq. 14) as:

x̂_{0|t} = x_{0|t} − Σ_t A†(Ax_{0|t} − y),  p(x_{t−1} | x_t, x̂_{0|t}) = N(x_{t−1}; µ_t(x_t, x̂_{0|t}), Φ_t I).  (16)

Σ_t ∈ R^{D×D} is used to scale the range-space correction A†(Ax_{0|t} − y), and Φ_t ∈ R^{D×D} is used to scale the noise σ_t ϵ added in p(x_{t−1} | x_t, x̂_{0|t}). The choice of Σ_t and Φ_t follows two principles: (i) Σ_t and Φ_t need to ensure that the total noise variance in x_{t−1} conforms to the definition of q(x_{t−1} | x_0) (Eq. 5), so that the total noise can be predicted by Z_θ and removed; (ii) Σ_t should be as close as possible to I, to maximally preserve the range-space correction A†(Ax_{0|t} − y) and thus maximize Consistency. For SR and colorization as defined in Sec. 3.2, A† is a copy operation, so A†n can be approximated as Gaussian noise N(0, σ_y² I), and Σ_t and Φ_t simplify to Σ_t = λ_t I and Φ_t = γ_t I. Since x_{t−1} = (√(ᾱ_{t−1}) β_t / (1 − ᾱ_t)) x̂_{0|t} + (√(α_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t)) x_t + σ_t ϵ, principle (i) is equivalent to (a_t λ_t σ_y)² + γ_t ≡ σ_t², where a_t denotes √(ᾱ_{t−1}) β_t / (1 − ᾱ_t). Considering principle (ii), we set:

γ_t = σ_t² − (a_t λ_t σ_y)²,  λ_t = { 1, if σ_t ≥ a_t σ_y;  σ_t / (a_t σ_y), if σ_t < a_t σ_y }.  (17)

In addition to the simplified version above, we also provide a more accurate version for general forms of A†, where we set Σ_t = V diag{λ_{t1}, ..., λ_{tD}} V⊤ and Φ_t = V diag{γ_{t1}, ..., γ_{tD}} V⊤. Here V is derived from the SVD of the operator A (= UΣV⊤). The calculation of λ_{ti} and γ_{ti} is presented in Appendix I. Note that the only hyperparameter that needs manual setting is σ_y. We can also approximate non-Gaussian noise, such as Poisson, speckle, and real-world noise, as Gaussian noise, thereby estimating a noise level σ_y and resorting to the same solution described above.

Time-Travel for Better Restoration Quality. We find that DDNM yields inferior Realness in particular cases like SR with a large-scale average-pooling downsampler, compressed sensing (CS) with a low sampling ratio, and inpainting with a large mask. In these cases, the range-space contents A†y are too local to guide the reverse diffusion process toward a globally harmonious result. Let us review Eq. 11: the mean µ_t(x_t, x_0) of the posterior distribution p(x_{t−1} | x_t, x_0) relies on an accurate estimate of x_0. DDNM uses x̂_{0|t} as the estimate of x_0 at time-step t, but if the range-space contents A†y are too local or uneven, x̂_{0|t} may have disharmonious null-space contents. How can we salvage the disharmony? We can time-travel back to change the past. Say we travel back to time-step t + l; we can then yield the next state x_{t+l−1} using the "future" estimate x̂_{0|t}, which should be more accurate than x̂_{0|t+l}. By reparameterization, this operation is equivalent to sampling x_{t+l−1} from q(x_{t+l−1} | x_{t−1}). Similar to Lugmayr et al. (2022), who use a "back and forward" strategy for inpainting, we propose a time-travel trick to improve global harmony in general IR tasks: for a chosen time-step t, we sample x_{t+l} from q(x_{t+l} | x_t). Then we travel back to time-step t + l and repeat normal DDNM sampling (Eq. 12, Eq. 13, and Eq. 14) until yielding x_{t−1}; here l is the travel length. Intuitively, the time-travel trick produces a better "past", which in turn produces a better "future". For ease of use, we assign two extra hyperparameters: s controls the interval at which the time-travel trick is applied, and r determines the number of repeats. The time-travel trick in Algo. 2 corresponds to s = 1, r = 1. It is worth emphasizing that although Algo. 1 and Algo. 2 are derived from DDPM, they can easily be extended to other diffusion frameworks, such as DDIM (Song et al., 2021a). Obviously, DDNM+ reduces exactly to DDNM when setting Σ_t = I, Φ_t = σ_t² I, and l = 0.
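The simplified scaling rule for λ_t and γ_t (the scalar case Σ_t = λ_t I, Φ_t = γ_t I) can be sketched in a few lines; the sample values below are arbitrary, chosen only to exercise both branches:

```python
import numpy as np

# Simplified DDNM+ scaling: pick lambda_t as close to 1 as possible while keeping
# gamma_t = sigma_t^2 - (a_t * lambda_t * sigma_y)^2 non-negative, so the total
# noise in x_{t-1} still matches q(x_{t-1} | x_0) and remains removable by Z_theta.
def ddnm_plus_scales(sigma_t, a_t, sigma_y):
    if sigma_t >= a_t * sigma_y:
        lam = 1.0                         # full range-space correction is affordable
    else:
        lam = sigma_t / (a_t * sigma_y)   # shrink the correction to fit the budget
    gamma = sigma_t**2 - (a_t * lam * sigma_y)**2
    return lam, gamma

# In both branches the variance budget (a_t*lam*sigma_y)^2 + gamma == sigma_t^2.
for sigma_t, a_t, sigma_y in [(0.5, 0.8, 0.2), (0.05, 0.8, 0.2)]:
    lam, gamma = ddnm_plus_scales(sigma_t, a_t, sigma_y)
    assert gamma >= 0
    assert np.isclose((a_t * lam * sigma_y)**2 + gamma, sigma_t**2)
```

Early in sampling σ_t is large, so λ_t = 1 and Consistency is fully enforced; near t = 0 the budget shrinks and the correction is attenuated, which is exactly how the measurement noise gets absorbed instead of inherited.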

4. EXPERIMENTS

Our experiments consist of three parts. First, we evaluate the performance of DDNM on five typical IR tasks and compare it with state-of-the-art zero-shot IR methods. Second, we evaluate DDNM+ on three typical IR tasks to verify its improvements over DDNM. Third, we show that DDNM and DDNM+ perform well on challenging real-world applications.

Table 2: Ablation study on denoising improvements (left) and the time-travel trick (right). C represents the colorization task; σ denotes the noise variance on y.

4.1. EVALUATION ON DDNM

To evaluate the performance of DDNM, we compare it with recent state-of-the-art zero-shot IR methods: DGP (Chen & Davies, 2020), PULSE (Menon et al., 2020), ILVR (Choi et al., 2021), RePaint (Lugmayr et al., 2022), and DDRM (Kawar et al., 2022). We experiment on five typical noise-free IR tasks: 4× SR with a bicubic downsampler, deblurring with a Gaussian blur kernel, colorization with the average grayscale operator, compressed sensing (CS) using a Walsh-Hadamard sampling matrix with a 0.25 compression ratio, and inpainting with text masks. For each task, we use the same degradation operator for all methods. We choose the ImageNet 1K and CelebA-HQ 1K datasets at image size 256×256 for validation. For ImageNet 1K, we use as Z_θ the 256×256 denoising network pretrained on ImageNet by Dhariwal & Nichol (2021). For CelebA-HQ 1K, we use the 256×256 denoising network pretrained on CelebA-HQ by Lugmayr et al. (2022). For fair comparison, we use the same pretrained denoising networks for ILVR, RePaint, DDRM, and DDNM. We use DDIM as the base sampling strategy (η = 0.85, 100 steps, without classifier guidance) for all diffusion-based methods. We choose PSNR, SSIM, and FID (Heusel et al., 2017) as the main metrics. Since PSNR and SSIM cannot reflect colorization performance, we use FID and a Consistency metric (calculated as ‖Ax̂_0 − y‖₁ and denoted Cons) for colorization. Tab. 1 shows the quantitative results; tasks that a method does not support are marked "NA". We can see that DDNM far exceeds previous GAN-prior-based zero-shot IR methods (DGP, PULSE). Even with the same pretrained denoising models and sampling steps, DDNM achieves significantly better performance in both Consistency and Realness than ILVR, RePaint, and DDRM. Appendix J shows more quantitative comparisons and qualitative results.

4.2. EVALUATION ON DDNM+

We evaluate the performance of DDNM+ from two aspects: denoising performance and robustness of restoration quality. Denoising Performance. We evaluate DDNM+ on three noisy IR tasks with l = 0, i.e., we disable the time-travel trick so as to evaluate only the denoising performance. Fig. 4(a) and the left part of Tab. 2 show the denoising improvements of DDNM+ over DDNM. We can see that DDNM fully inherits the noise contained in y, while DDNM+ decently removes the noise. Robustness of Restoration Quality. We evaluate DDNM+ on three tasks where DDNM may yield inferior results: 32× SR, colorization, and compressed sensing (CS) using an orthogonalized sampling matrix with a 10% compression ratio. For fair comparison, we set T = 250, l = s = 20, r = 3 for DDNM+ and T = 1000 for DDNM, so that the total sampling steps and computational costs are roughly equal. Fig. 4(b) and the right part of Tab. 2 show the improvements brought by the time-travel trick. We can see that it significantly improves the overall performance, especially Realness (measured by FID). To the best of our knowledge, DDNM+ is the first IR method that can robustly handle arbitrary scales of linear IR tasks. As shown in Fig. 3, we compare DDNM+ (l = s = 10, r = 5) with state-of-the-art zero-shot IR methods on diverse IR tasks; we also crop images from the DIV2K dataset (Agustsson & Timofte, 2017) as a test set. The results show that DDNM+ is remarkably robust across diverse IR tasks, especially considering that it is a zero-shot method. More experiments on DDNM/DDNM+ can be found in Appendices A and B.

4.3. REAL-WORLD APPLICATIONS

Theoretically, we can use DDNM+ to solve real-world IR tasks as long as we can construct an approximate linear degradation A and its pseudo-inverse A†. Here we demonstrate two typical real-world applications using DDNM+ with l = s = 20, r = 3: (1) Real-World Noise. We experiment with DDNM+ on real-world colorization, with A and A† defined in Sec. 3.2.

5. RELATED WORK

5.1. DIFFUSION MODELS FOR IMAGE RESTORATION

Recent methods using diffusion models to solve image restoration can be roughly divided into two categories: supervised methods and zero-shot methods. Supervised Methods. SR3 (Saharia et al., 2021) trains a conditional diffusion model for image super-resolution with synthetic image pairs as the training data. This pattern is further extended to other IR tasks (Saharia et al., 2022). To solve image deblurring, Whang et al. (2022) use a deterministic predictor to estimate the initial result and train a diffusion model to predict the residual. However, these methods all need task-specific training and cannot generalize to different degradation operators or different IR tasks. Zero-Shot Methods. Song & Ermon (2019) first propose a zero-shot image inpainting solution by guiding the reverse diffusion process with the unmasked region. They further propose using gradient guidance to solve general inverse problems in a zero-shot fashion and apply this idea to medical imaging problems (Song et al., 2020; 2021b). ILVR (Choi et al., 2021) applies low-frequency guidance from a reference image to achieve reference-based image generation. RePaint (Lugmayr et al., 2022) solves the inpainting problem by guiding the diffusion process with the unmasked region. DDRM (Kawar et al., 2022) uses SVD to decompose the degradation operators; however, SVD encounters a computational bottleneck when dealing with high-dimensional matrices. In fact, the core guidance functions in ILVR (Choi et al., 2021), RePaint (Lugmayr et al., 2022), and DDRM (Kawar et al., 2022) can be seen as special cases of the range-null space decomposition used in DDNM; a detailed analysis is given in Appendix H.

6. CONCLUSION & DISCUSSION

This paper presents a unified framework for solving linear IR tasks in a zero-shot manner. We believe that our work demonstrates a promising new path for solving general IR tasks, which may also be instructive for general inverse problems. Theoretically, our framework can be easily extended to solve inverse problems of diverse data types, e.g., video, audio, and point cloud, as long as one can collect enough data to train a corresponding diffusion model. More discussions are in Appendix C. 

A TIME & MEMORY CONSUMPTION

Our method has obvious advantages in time and memory consumption among recent zero-shot diffusion-based restoration methods (Kawar et al., 2022; Ho et al., 2022; Chung et al., 2022b; a). These methods are all based on basic diffusion models; the differences lie in how the constraint y = Ax + n is brought into the reverse diffusion process. We summarize our advantages below:
• DDNM has almost the same consumption as the original diffusion models.
• DDNM does not need any optimization toward minimizing ‖y − Ax_{0|t}‖, since we directly yield the optimal solution via the range-null space decomposition (Sec. 3.1) and precise range-space denoising (Sec. 3.3). Some recent works (Ho et al., 2022; Chung et al., 2022b; a) resort to such optimization; e.g., DPS (Chung et al., 2022a) uses x_{t−1} = x_{t−1} − ζ_t ∇_{x_t} ‖y − Ax_{0|t}‖²₂ to update x_{t−1}, which involves costly gradient computation.
• Unlike DDRM (Kawar et al., 2022)

B COMPARING DDNM WITH SUPERVISED METHODS

Our method is superior to existing supervised IR methods (Zhang et al., 2021; Liang et al., 2021) in the following ways:
• DDNM is zero-shot across diverse tasks, while supervised methods need to train a separate model for each task.
• DDNM is robust to degradation modes, while supervised methods generalize poorly.
• DDNM yields significantly better performance on certain datasets and resolutions (e.g., ImageNet at 256×256).

C LIMITATIONS

There remain many limitations that deserve further study.
• Though DDNM brings negligible extra computation, it is still limited by the slow inference speed of existing diffusion models.
• DDNM needs an explicit form of the degradation operator, which may be challenging to acquire for some tasks. Approximations may work well, but they are not optimal.
• In theory, DDNM only supports linear operators. Though nonlinear operators may also have a "pseudo-inverse", they may not conform to the distributive property, e.g., sin(a + b) ≠ sin(a) + sin(b), so they may not have linearly separable null- and range-spaces.
• DDNM inherits the randomness of diffusion models. This property benefits diversity but may sometimes yield undesirable results.
• The restoration capability of DDNM is limited by the performance of the pretrained denoiser, which depends on the network capacity and the training dataset. For example, existing diffusion models do not outperform StyleGANs (Karras et al., 2019; 2020; 2021) in synthesizing FFHQ/AFHQ images at 1024×1024 resolution.

D SOLVING REAL-WORLD DEGRADATION USING DDNM +

DDNM+ can handle real-world degradation well, even where the degradation operator A is unknown and non-linear and the noise is non-Gaussian. We follow these observations:
• In theory, DDNM+ is designed to solve IR tasks at diverse noise levels. As shown in Fig. 5, DDNM+ handles 4× SR well even with strong noise σ_y = 0.9.
• For real-world degraded images, non-linear artifacts can generally be divided into global ones (e.g., the real-world noise in Fig. 1(c)) and local ones (e.g., the scratches in Fig. 1(d)).
• For global non-linear artifacts, we can set a proper σ_y to cover them. As shown in Fig. 6, the input images y suffer from JPEG-like unknown artifacts, but DDNM+ can still remove them decently given a proper σ_y.
• For local non-linear artifacts, we can directly draw a mask to cover them. Hence all we need is to construct A = A_color A_mask and set a proper σ_y. We have shown that A_color and A_mask and their pseudo-inverses can easily be constructed by hand (an additional SR operator may be needed for resizing when y is too blurry).
In Fig. 7 we demonstrate an example. The input image y is a black-and-white photo with unknown noise and scratches. We first manually draw a mask A_mask to cover the scratches, then use a grayscale operator A_color to convert the image into grayscale; the definitions of A_mask and A_color and their pseudo-inverses can be found in Sec. 3.2. We then take A = A_color A_mask and A† = A_mask† A_color† for DDNM+, and set a proper σ_y. From the results in Fig. 7, we can see that when setting σ_y = 0, the noise is fully inherited by the results. Setting σ_y = 0.1 removes the noise while preserving identity well. Setting a higher σ_y = 0.25 yields much smoother results but relatively poor identity consistency. The choice of σ_y is critical to achieving the best balance between realness and consistency, but for now we can only rely on manual estimates.
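The composite construction A = A_color A_mask can be checked numerically on a toy two-pixel RGB "image" where the second pixel is masked out; the 6-dimensional layout is an assumption for illustration only:

```python
import numpy as np

# Composite degradation: per-pixel RGB -> gray, composed with a damage mask.
A_color = np.kron(np.eye(2), np.full((1, 3), 1/3))   # each pixel's RGB averaged to gray
Ap_color = np.kron(np.eye(2), np.ones((3, 1)))       # gray copied back to 3 channels
A_mask = np.diag([1, 1, 1, 0, 0, 0]).astype(float)   # second pixel is damaged (zeroed)

A = A_color @ A_mask                                  # A = A_color A_mask
Ap = A_mask @ Ap_color     # A_mask is idempotent, hence its own pseudo-inverse;
                           # A^dagger = A_mask^dagger A_color^dagger
assert np.allclose(A @ Ap @ A, A)  # the composed A^dagger satisfies A A^dagger A == A
```

The final assertion is the defining pseudo-inverse property the framework relies on, and it holds even though the masked pixel makes A (and hence AA†) rank-deficient.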

E PYTORCH-LIKE CODE IMPLEMENTATION

Here we provide a basic PyTorch-like implementation of DDNM+. Readers can quickly implement a basic DDNM+ in their own projects by referencing Algo. 2, Sec. 3.3, and the code below.

F DETAILS OF THE DEGRADATION OPERATORS

Inpainting. We use text masks, random pixel-wise masks, and hand-drawn masks for the inpainting experiments. Fig. 8(d) demonstrates examples of the different masks.

Deblurring. For the deblurring experiments, we use three typical kernels to implement the blurring operations: a Gaussian blur kernel, a uniform blur kernel, and an anisotropic blur kernel. For the Gaussian blur kernel, the kernel size is 5 and the kernel width is 10; for the uniform blur kernel, the kernel size is 9; for the anisotropic blur kernel, the kernel size is 9 and the kernel widths along the two axes are 20 and 1. Fig. 8(c) demonstrates the effect of these kernels.

Compressed Sensing (CS).

For the CS experiments, we choose two types of sampling matrices: one is based on the Walsh-Hadamard transformation, and the other is an orthogonalized random matrix applied to the original image block-wise. For the Walsh-Hadamard sampling matrix, we choose 50% and 25% as the sampling ratios. For the orthogonalized sampling matrix, we choose ratios from 40% to 5%. Fig. 8 (e) and (f) demonstrate the effects of the Walsh-Hadamard sampling matrix and the orthogonalized sampling matrix with different CS ratios. Colorization. For colorization, we choose the degradation matrix A = [1/3, 1/3, 1/3] for each pixel, as described in Sec. 3.2. Fig. 8 (g) demonstrates an example of the colorization degradation. Solve the Pseudo-Inverse Using SVD. Given a linear operator A, we need to compute its pseudo-inverse A† to implement the algorithm of the proposed DDNM. For some simple degradation operators, e.g., SR based on average pooling, the pseudo-inverse A† can be constructed manually, as discussed in Sec. 3.2. For general cases, we can use the singular value decomposition (SVD) of A = UΣV⊤ to compute the pseudo-inverse A† = VΣ†U⊤, where Σ and Σ† are related by Σ = diag{s_1, s_2, ...}, Σ† = diag{d_1, d_2, ...}, with d_i = 1/s_i if s_i ≠ 0 and d_i = 0 if s_i = 0; here s_i denotes the i-th singular value of A and d_i the i-th diagonal element of Σ†.
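A hedged NumPy sketch of this SVD route (the tolerance choice is ours): A† = VΣ†U⊤, with the reciprocal of any zero singular value replaced by 0.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))                 # a generic d x D operator

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = 1e-10 * s.max()
d = np.array([1 / si if si > tol else 0.0 for si in s])  # zero singular values -> 0
A_pinv = Vt.T @ np.diag(d) @ U.T                # A^dagger = V Sigma^dagger U^T

assert np.allclose(A_pinv, np.linalg.pinv(A))   # matches NumPy's pseudo-inverse
assert np.allclose(A @ A_pinv @ A, A)           # generalized-inverse identity
```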

G VISUALIZATION OF THE INTERMEDIATE RESULTS

In Fig. 9, we visualize the intermediate results of DDNM on 4× SR, 16× SR, and deblurring. Specifically, we show the noisy result x_t, the clean estimation x_{0|t}, and the rectified clean estimation x̂_{0|t}. The total number of diffusion steps is 1000. From Fig. 9(a), we can see that, due to the fixed range-space contents A†y, x̂_{0|t} already contains meaningful content in the early stages, while x_t and x_{0|t} contain limited information. But when t = 0, we can observe that x_{0|0} contains much more detail than A†y. These details are precisely the null-space contents. We may notice a potential speed-up trick here: for example, we can replace x_{0|t=100} with A†y and start DDNM directly from t = 100, which yields 10× faster sampling. We leave this to future work. From Fig. 9(b), we can see that the reverse diffusion process gradually restores images from low-frequency contours to high-frequency details.

H COMPARING DDNM WITH RECENT DIFFUSION-BASED IR METHODS

Here we provide a detailed comparison between DDNM and recent diffusion-based IR methods, including RePaint (Lugmayr et al., 2022), ILVR (Choi et al., 2021), DDRM (Kawar et al., 2022), SR3 (Saharia et al., 2021), and SDE (Song et al., 2020). For easier comparison, we rewrite their algorithms based on DDPM (Ho et al., 2020) and follow the notation used in DDNM. Algo. 3 and Algo. 4 show the reverse diffusion processes of DDPM and DDNM. We mark in blue the parts that are most distinct from DDNM. All the IR problems discussed here can be formulated as y = Ax + n, where y, A, x, and n represent the degraded image, the degradation operator, the original image, and the additive noise, respectively.

H.1 REPAINT AND ILVR

RePaint (Lugmayr et al., 2022) solves noise-free image inpainting problems, where n = 0 and A represents the mask operation. RePaint first creates a noised version y_{t-1} of the masked image y:

y_{t-1} = A(√ᾱ_{t-1} y + √(1−ᾱ_{t-1}) ϵ), ϵ ∼ N(0, I),

and then uses y_{t-1} to fill in the unmasked regions of x_{t-1}:

x_{t-1} = y_{t-1} + (I − A) x_{t-1}.

Besides, RePaint applies a "back and forward" strategy to refine the results. Algo. 5 shows the algorithm of RePaint. ILVR (Choi et al., 2021) focuses on reference-based image generation tasks, where n = 0 and A represents a low-pass filter defined by A = A_1 A_2 (A_1 is a bicubic upsampler and A_2 is a bicubic downsampler). ILVR creates a noised version of the reference image x and uses the low-pass filter A to extract its low-frequency contents:

y_{t-1} = A(√ᾱ_{t-1} x + √(1−ᾱ_{t-1}) ϵ), ϵ ∼ N(0, I),

and then combines the high-frequency part of x_{t-1} with the low-frequency contents of y_{t-1}:

x_{t-1} = y_{t-1} + (I − A) x_{t-1}.

Algo. 6 shows the algorithm of ILVR. Essentially, RePaint and ILVR share the same formulation with different definitions of the degradation operator A. DDNM differs from RePaint and ILVR mainly in two aspects: (i) Operating on Different Domains.
RePaint and ILVR both operate in the noisy x_t domain of diffusion models, which is inaccurate for range-space preservation during the reverse diffusion process. Instead, we operate directly in the noise-free x_{0|t} domain, which needs no extra processing of y, is strictly derived from the theory, and guarantees strict data consistency. (ii) As Special Cases. Aside from the difference in operation domain, Eq. 24 of RePaint is essentially a special case of the range-null space decomposition. Considering A as a mask operator, it satisfies AAA = A, so we can use A itself as the pseudo-inverse A†. Hence the range-null space decomposition becomes

x = A†y + (I − A†A)x = Ay + (I − AA)x = y + (I − A)x,

which is exactly the same as Eq. 24. Similarly, Eq. 26 of ILVR can be seen as a special case of the range-null space decomposition that uses I as the approximation of A†. Note that the final result x_0 of RePaint satisfies Consistency, i.e., Ax_0 ≡ y, while that of ILVR does not, because the pseudo-inverse A† it uses is inaccurate.

Algorithm 3 Reverse Diffusion Process of DDPM
Require: None
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   ϵ ∼ N(0, I) if t > 1, else ϵ = 0
4:   x_{t-1} = (1/√α_t) (x_t − (β_t/√(1−ᾱ_t)) Z_θ(x_t, t)) + σ_t ϵ
5: return x_0

Algorithm 4 Reverse Diffusion Process of DDNM Based on DDPM
Require: the degraded image y, the degradation operator A and its pseudo-inverse A†
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   ϵ ∼ N(0, I) if t > 1, else ϵ = 0
4:   x_{0|t} = (1/√ᾱ_t) (x_t − Z_θ(x_t, t) √(1−ᾱ_t))
5:   x̂_{0|t} = x_{0|t} − A†(A x_{0|t} − y)
6:   x_{t-1} = (√ᾱ_{t-1} β_t/(1−ᾱ_t)) x̂_{0|t} + (√α_t (1−ᾱ_{t-1})/(1−ᾱ_t)) x_t + σ_t ϵ
7: return x_0

Algorithm 5 Reverse Diffusion Process of RePaint
Require: the masked image y, the mask A
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   for s = 1, ..., S_t do
4:     ϵ_1, ϵ_2 ∼ N(0, I) if t > 1, else ϵ_1, ϵ_2 = 0
5:     y_{t-1} = √ᾱ_{t-1} y + √(1−ᾱ_{t-1}) ϵ_1
6:     x_{t-1} = (1/√α_t) (x_t − (β_t/√(1−ᾱ_t)) Z_θ(x_t, t)) + σ_t ϵ_2
7:     x_{t-1} = y_{t-1} + (I − A) x_{t-1}
8:     if t ≠ 0 and s ≠ S_t then
9:       x_t = √(1−β_t) x_{t-1} + √β_t ϵ_2
10: return x_0

Algorithm 6 Reverse Diffusion Process of ILVR
Require: the reference image x, the low-pass filter A
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   ϵ_1, ϵ_2 ∼ N(0, I) if t > 1, else ϵ_1, ϵ_2 = 0
4:   y_{t-1} = A(√ᾱ_{t-1} x + √(1−ᾱ_{t-1}) ϵ_1)
5:   x_{t-1} = (1/√α_t) (x_t − (β_t/√(1−ᾱ_t)) Z_θ(x_t, t)) + σ_t ϵ_2
6:   x_{t-1} = y_{t-1} + (I − A) x_{t-1}
7: return x_0

Algorithm 7 Reverse Diffusion Process of DDRM
Require: the degraded image y with noise level σ_y, the operator A = UΣV⊤, A ∈ R^{d×D}
1: x_T ∼ N(0, I)
2: ȳ = Σ†U⊤ y
3: for t = T, ..., 1 do
4:   ϵ ∼ N(0, I) if t > 1, else ϵ = 0
5:   x̄_{0|t} = V⊤ (1/√ᾱ_t) (x_t − Z_θ(x_t, t) √(1−ᾱ_t))
6:   for i = 1, ..., D do
7:     if s_i = 0 then
8:       x̄^{(i)}_{t-1} = x̄^{(i)}_{0|t} + √(1−η²) σ_{t-1} (x̄^{(i)}_t − x̄^{(i)}_{0|t})/σ_t + η σ_{t-1} ϵ^{(i)}
9:     else if σ_{t-1} < σ_y/s_i then
10:      x̄^{(i)}_{t-1} = x̄^{(i)}_{0|t} + √(1−η²) σ_{t-1} (ȳ^{(i)} − x̄^{(i)}_{0|t})/(σ_y/s_i) + η σ_{t-1} ϵ^{(i)}
11:     else if σ_{t-1} ≥ σ_y/s_i then
12:      x̄^{(i)}_{t-1} = ȳ^{(i)} + √(σ_{t-1}² − σ_y²/s_i²) ϵ^{(i)}
13:   x_{t-1} = V x̄_{t-1}
14: return x_0
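The mask special case discussed above (A serving as its own pseudo-inverse, so the decomposition reduces to y + (I − A)x) can be checked numerically. A hedged NumPy sketch with a diagonal 0/1 mask:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.diag(rng.integers(0, 2, 8).astype(float))   # diagonal 0/1 mask operator
x = rng.standard_normal(8)
y = A @ x                                          # noise-free masked observation

assert np.allclose(A @ A @ A, A)                   # AAA = A, so A works as A^dagger
x_rec = y + (np.eye(8) - A) @ x                    # Eq. 24 form: y + (I - A)x
assert np.allclose(A @ x_rec, y)                   # consistency: A x_rec = y
assert np.allclose(x_rec, x)                       # here the null-space part is exact
```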

H.2 DDRM

The forward diffusion process defined by DDRM is

x_t = x_0 + σ_t ϵ, ϵ ∼ N(0, I).

The original reverse diffusion process of DDRM is based on DDIM:

x_{t-1} = x_0 + √(1−η²) σ_{t-1} (x_t − x_0)/σ_t + η σ_{t-1} ϵ.

For a noisy linear inverse problem y = Ax + n with n ∼ N(0, σ_y²), DDRM first uses SVD to decompose A as UΣV⊤, then uses ȳ = Σ†U⊤y and x̄_{0|t} = V⊤ x_{0|t} for its derivation. Each element of ȳ and x̄_{0|t} corresponds to a singular value in Σ (a nonexistent singular value is defined as 0), so x̄_{0|t} can be modified element-wise according to each singular value. One then yields the final result x_0 by x_0 = V x̄_0. Algo. 7 describes the whole reverse diffusion process of DDRM. In the noise-free (σ_y = 0) situation, the final result x_0 of DDRM is essentially yielded through a special range-null space decomposition. Specifically, when t = 0 and σ_y = 0, we can write the i-th element of x̄_0 as

x̄^{(i)}_0 = x̄^{(i)}_{0|1} if s_i = 0, and x̄^{(i)}_0 = ȳ^{(i)} if s_i ≠ 0. (29)

To simplify the representation, we define a diagonal matrix Σ_1 with Σ_1^{(i)} = 0 if s_i = 0 and Σ_1^{(i)} = 1 if s_i ≠ 0. Then we can rewrite x̄_0 as

x̄_0 = Σ_1 ȳ + (I − Σ_1) x̄_{0|1} (31)

and yield the result x_0 by left-multiplying by V:

x_0 = V x̄_0 = V Σ_1 ȳ + V(I − Σ_1) x̄_{0|1}. (32)

This result is essentially a special range-null space decomposition:

x_0 = V Σ_1 ȳ + V(I − Σ_1) x̄_{0|1} = V Σ_1 Σ† U⊤ y + V(I − Σ_1) V⊤ x_{0|1} = V Σ† U⊤ y + (I − V Σ_1 V⊤) x_{0|1} = A† y + (I − A†A) x_{0|1}.

Now we can clearly see that V Σ_1 ȳ = A†y is the range-space part, while V(I − Σ_1) x̄_{0|1} = (I − A†A) x_{0|1} is the null-space part. For our DDNM, however, A† can be any linear operator as long as it satisfies AA†A ≡ A, of which A† = VΣ†U⊤ is a special case. Due to the computational requirements of SVD, DDRM needs to convert the operator A into matrix form. However, common operators in computer vision take the form of convolutions, let alone compound or high-dimensional ones. For example, DDRM has difficulty handling old photo restoration.
Rather, our DDNM supports any linear forms of operator A and A † , as long as AA † A = A is satisfied. It is worth mentioning that there exist diverse ways of yielding the pseudo-inverse A † , and SVD is just one of them. Besides, DDNM is more concise than DDRM in the formulation and performs better in noise-free IR tasks.

H.3 OTHER DIFFUSION-BASED IR METHODS

SR3 (Saharia et al., 2021) is a task-specific super-resolution method that trains a denoiser with y as an additional input, i.e., Z_θ(x_t, t, y), and then follows a reverse diffusion process similar to that of DDPM (Ho et al., 2020) to implement image super-resolution, as shown in Algo. 8. SR3 needs to modify the network structure to support the extra input y and needs paired data to train the conditional denoiser Z_θ(x_t, t, y), while our DDNM is free from these burdens and is fully zero-shot for diverse IR tasks. Besides, DDNM can also be applied to SR3 to improve its performance. Specifically, we insert the core process of DDNM, the range-null space decomposition, into SR3, yielding Algo. 9. Results are demonstrated in Fig. 10. We can see that the range-null space decomposition improves the restoration quality by ensuring data consistency.

Algorithm 8 Reverse Diffusion Process of SR3
Require: the degraded image y
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   ϵ ∼ N(0, I) if t > 1, else ϵ = 0
4:   x_{t-1} = (1/√α_t) (x_t − (β_t/√(1−ᾱ_t)) Z_θ(x_t, t, y)) + σ_t ϵ
5: return x_0

Algorithm 9 Reverse Diffusion Process of SR3+DDNM
Require: the degraded image y
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   ϵ ∼ N(0, I) if t > 1, else ϵ = 0
4:   x_{0|t} = (1/√ᾱ_t) (x_t − Z_θ(x_t, t, y) √(1−ᾱ_t))
5:   x̂_{0|t} = x_{0|t} − A†(A x_{0|t} − y)
6:   x_{t-1} = (√ᾱ_{t-1} β_t/(1−ᾱ_t)) x̂_{0|t} + (√α_t (1−ᾱ_{t-1})/(1−ᾱ_t)) x_t + σ_t ϵ
7: return x_0

Song et al. (2020) propose a conditional sampling strategy for diffusion models, which we abbreviate as SDE in this paper. Specifically, SDE optimizes each latent variable x_t toward a specific condition f(Ax_t, y) and puts the optimized x_t back into the original reverse diffusion process, as shown in Algo. 10, where y is the condition, A is an operator, and f(·, ·) measures the distance between Ax_t and y.

Algorithm 10 Reverse Diffusion Process of SDE (conditional)
Require: the condition y, the operator A, and the rate λ
1: x_T ∼ N(0, I)
2: for t = T, ..., 1 do
3:   ϵ ∼ N(0, I) if t > 1, else ϵ = 0
4:   x̂_t = x_t + λ ∇_{x_t} f(Ax_t, y)
5:   x_{t-1} = (1/√α_t) (x̂_t − (β_t/√(1−ᾱ_t)) Z_θ(x̂_t, t)) + σ_t ϵ
6: return x_0

It is worth noting that DDNM is compatible with extra sources of constraints in the form of the gradient-guidance step (line 4) in Algo. 10. For example, our results in Fig. 1 and Fig. 3 are generated using the diffusion model pretrained on ImageNet with classifier guidance.
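One DDNM reverse step (the estimate-rectify-sample pattern of Algo. 4, lines 4-6) can be sketched as follows. This is a hedged NumPy toy: the placeholder `Z_theta`, the toy mask operator, and the schedule constants are our assumptions; in practice Z_θ is a pretrained denoising network.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 8
A = np.diag(rng.integers(0, 2, D).astype(float))   # toy mask operator
A_pinv = A                                          # valid, since A A A = A
y = A @ rng.standard_normal(D)                      # degraded observation

# Toy schedule constants (chosen for illustration, kept mutually consistent).
beta_t = 0.01
alpha_t = 1 - beta_t
alpha_bar_t = 0.5
alpha_bar_prev = alpha_bar_t / alpha_t
sigma_t = 0.05
Z_theta = lambda x_t, t: np.zeros_like(x_t)         # placeholder denoiser

x_t = rng.standard_normal(D)
# Line 4: estimate x_{0|t} from x_t.
x0t = (x_t - np.sqrt(1 - alpha_bar_t) * Z_theta(x_t, 0)) / np.sqrt(alpha_bar_t)
# Line 5: range-space rectification; null-space contents are untouched.
x0t_hat = x0t - A_pinv @ (A @ x0t - y)
# Line 6: sample x_{t-1}.
eps = rng.standard_normal(D)
c1 = np.sqrt(alpha_bar_prev) * beta_t / (1 - alpha_bar_t)
c2 = np.sqrt(alpha_t) * (1 - alpha_bar_prev) / (1 - alpha_bar_t)
x_prev = c1 * x0t_hat + c2 * x_t + sigma_t * eps

assert np.allclose(A @ x0t_hat, y)   # the rectified estimate is data-consistent
```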

I SOLVING NOISY IMAGE RESTORATION PRECISELY

For noisy tasks y = Ax + n with n ∼ N(0, σ_y² I), Sec. 3.3 provides a simple solution in which A†n is approximated as N(0, σ_y² I). However, the precise distribution of A†n is N(0, σ_y² A†(A†)⊤), whose covariance matrix is usually non-diagonal. To apply principles similar to Eq. 19, we need to orthodiagonalize this matrix. Below we present the detailed derivation. The solution involves the singular value decomposition (SVD), which decomposes the degradation operator A and yields its pseudo-inverse A†:

A = UΣV⊤, A† = VΣ†U⊤, A ∈ R^{d×D}, A† ∈ R^{D×d}, U ∈ R^{d×d}, V ∈ R^{D×D}, Σ ∈ R^{d×D}, Σ† ∈ R^{D×d},
Σ = diag{s_1, s_2, ..., s_d}, Σ^{(i)} = s_i, Σ†^{(i)} = 1/s_i if s_i ≠ 0, and 0 if s_i = 0.

To find out how much noise is introduced into x̂_{0|t}, we first rewrite Eq. 17 as

x̂_{0|t} = x_{0|t} − Σ_t A†(A x_{0|t} − Ax − n),

where Ax represents the clean measurement before adding noise, and Σ_t = V Λ_t V⊤ is the scaling matrix with Λ_t = diag{λ_{t1}, λ_{t2}, ..., λ_{tD}}. Rewriting the additive noise n as σ_y ϵ_n with ϵ_n ∼ N(0, I), Eq. 37 becomes

x̂_{0|t} = x_{0|t} − Σ_t A†(A x_{0|t} − Ax) + σ_y V Λ_t V⊤ A† ϵ_n,

where x_{0|t} − Σ_t A†(A x_{0|t} − Ax) denotes the clean part of x̂_{0|t} (written as x̂ᶜ_{0|t}). It is clear that the noise introduced into x̂_{0|t} is σ_y V Λ_t V⊤ A† ϵ_n. How this introduced noise is handled depends on the sampling strategy; we discuss the solutions for DDPM and DDIM in turn.

The Situation in DDPM. When using DDPM as the sampling strategy, we yield x_{t-1} by sampling from p(x_{t-1}|x_t, x_0) = N(x_{t-1}; μ_t(x_t, x_0), σ_t² I), i.e.,

x_{t-1} = (√ᾱ_{t-1} β_t/(1−ᾱ_t)) x̂_{0|t} + (√α_t (1−ᾱ_{t-1})/(1−ᾱ_t)) x_t + σ_t ϵ, ϵ ∼ N(0, I).

Accounting for the introduced noise, we modify σ_t ϵ so that the total noise level does not exceed N(0, σ_t² I); hence we construct a new noise term ϵ_new, specified below. Eq. 39 then becomes

x_{t-1} = (√ᾱ_{t-1} β_t/(1−ᾱ_t)) x̂ᶜ_{0|t} + (√α_t (1−ᾱ_{t-1})/(1−ᾱ_t)) x_t + ϵ_intro + ϵ_new,
ϵ_intro = (√ᾱ_{t-1} β_t/(1−ᾱ_t)) σ_y V Λ_t V⊤ A† ϵ_n, ϵ_intro + ϵ_new ∼ N(0, σ_t² I).

Here ϵ_intro denotes the introduced noise, whose distribution can be derived as

ϵ_intro ∼ N(0, (√ᾱ_{t-1} β_t/(1−ᾱ_t))² σ_y² (V Λ_t V⊤ A†) I (V Λ_t V⊤ A†)⊤)
        = N(0, (√ᾱ_{t-1} β_t/(1−ᾱ_t))² σ_y² V Λ_t V⊤ A† (A†)⊤ V Λ_t V⊤)
        = N(0, (√ᾱ_{t-1} β_t/(1−ᾱ_t))² σ_y² V Λ_t V⊤ V Σ† U⊤ U (Σ†)⊤ V⊤ V Λ_t V⊤)
        = N(0, (√ᾱ_{t-1} β_t/(1−ᾱ_t))² σ_y² V Λ_t Σ† (Σ†)⊤ Λ_t V⊤).

The covariance matrix of ϵ_intro can be simplified as V D_t V⊤, with D_t = diag{d_{t1}, d_{t2}, ..., d_{tD}}:

ϵ_intro ∼ N(0, V D_t V⊤), d_{ti} = (√ᾱ_{t-1} β_t/(1−ᾱ_t))² σ_y² λ_{ti}²/s_i² if s_i ≠ 0, and d_{ti} = 0 if s_i = 0.

To construct ϵ_new, we define a new diagonal matrix Γ_t = diag{γ_{t1}, γ_{t2}, ..., γ_{tD}}:

Γ_t = σ_t² I − D_t, γ_{ti} = σ_t² − (√ᾱ_{t-1} β_t/(1−ᾱ_t))² σ_y² λ_{ti}²/s_i² if s_i ≠ 0, and γ_{ti} = σ_t² if s_i = 0.

We can now yield ϵ_new by sampling from N(0, V Γ_t V⊤), which ensures ϵ_intro + ϵ_new ∼ N(0, V(D_t + Γ_t)V⊤) = N(0, σ_t² I). An easier implementation is to first sample ϵ_temp from N(0, Γ_t) and then take ϵ_new = V ϵ_temp. From Eq. 49, we also observe that λ_{ti} must guarantee that the level of the introduced noise does not exceed the predefined noise level σ_t, which yields the formula for λ_{ti} in Σ_t = V Λ_t V⊤, Λ_t = diag{λ_{t1}, ..., λ_{tD}}:

λ_{ti} = 1, if σ_t ≥ (√ᾱ_{t-1} β_t/(1−ᾱ_t)) σ_y/s_i;
λ_{ti} = σ_t s_i/((√ᾱ_{t-1} β_t/(1−ᾱ_t)) σ_y), if σ_t < (√ᾱ_{t-1} β_t/(1−ᾱ_t)) σ_y/s_i;
λ_{ti} = 1, if s_i = 0.

The Situation in DDIM. When using DDIM as the sampling strategy, the step from x_t to x_{t-1} becomes

x_{t-1} = √ᾱ_{t-1} x̂_{0|t} + σ_t √(1−η²) Z_θ(x_t, t) + σ_t η ϵ, ϵ ∼ N(0, I).

J ADDITIONAL RESULTS

We present additional quantitative results in Tab. 5, with the corresponding visual results of DDNM in Fig. 11 and Fig. 12.
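The per-singular-value bookkeeping above can be checked numerically. A hedged toy sketch: the helper name `lambda_gamma` and the shorthand `c_t` (standing for √ᾱ_{t-1} β_t/(1−ᾱ_t)) are ours; it verifies that d_ti + γ_ti = σ_t² for every singular value, so the total noise level is exactly σ_t².

```python
def lambda_gamma(s_i, sigma_t, sigma_y, c_t):
    # c_t stands for sqrt(alpha_bar_{t-1}) * beta_t / (1 - alpha_bar_t).
    if s_i == 0:
        return 1.0, sigma_t ** 2          # no noise introduced in this direction
    if sigma_t >= c_t * sigma_y / s_i:
        lam = 1.0                          # full range-space correction
    else:
        lam = sigma_t * s_i / (c_t * sigma_y)  # clip so introduced noise <= sigma_t
    d = (c_t * sigma_y * lam / s_i) ** 2   # d_ti: variance of introduced noise
    return lam, sigma_t ** 2 - d           # gamma_ti: remaining variance to add

sigma_t, sigma_y, c_t = 0.3, 0.2, 0.8
for s_i in (0.0, 0.1, 1.0, 4.0):
    lam, gamma = lambda_gamma(s_i, sigma_t, sigma_y, c_t)
    d = 0.0 if s_i == 0 else (c_t * sigma_y * lam / s_i) ** 2
    assert gamma >= -1e-12                          # never negative variance
    assert abs(d + gamma - sigma_t ** 2) < 1e-12    # d_ti + gamma_ti = sigma_t^2
```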
Additional visual results of DDNM+ are shown in Fig. 13 and Fig. 14. Additional results for real-world photo restoration are presented in Fig. 15. Note that none of the additional results presented here use the time-travel trick.



def color2gray(x):
    coef = 1 / 3
    x = x[:, 0:1, :, :] * coef + x[:, 1:2, :, :] * coef + x[:, 2:3, :, :] * coef
    return x.repeat(1, 3, 1, 1)

def gray2color(x):
    x = x[:, 0, :, :]
    coef = 1 / 3
    base = coef ** 2 + coef ** 2 + coef ** 2
    return th.stack((x * coef / base, x * coef / base, x * coef / base), 1)

# Core of DDNM+ with the simplified denoising solution (Sec. 3.3);
# returns the rectified estimate and the remaining noise variance.
def ddnm_plus_core(x0t, y, sigma_y, sigma_t, a_t):
    if sigma_t >= a_t * sigma_y:            # Eq. 19
        lambda_t = 1
        gamma_t = sigma_t ** 2 - (a_t * lambda_t * sigma_y) ** 2
    else:
        lambda_t = sigma_t / (a_t * sigma_y)
        gamma_t = 0
    x0t = x0t + lambda_t * Ap(y - A(x0t))   # Eq. 17
    return x0t, gamma_t



Figure 1: We use DDNM + to solve various image restoration tasks in a zero-shot way. Here we show some of the results that best characterize our method, where y is the input degraded image and x 0 represents the restoration result. Part (a) shows the results of DDNM + on image super-resolution (SR) from scale 2× to extreme scale 256×. Note that DDNM + assures strict data consistency. Part (b) shows multiple results of DDNM + on inpainting and colorization. Part (c) shows the results of DDNM + on SR with synthetic noise and colorization with real-world noise. Part (d) shows the results of DDNM + on old photo restoration. All the results here are yielded in a zero-shot way.

Figure 2: Illustration of (a) DDNM and (b) the time-travel trick. Algo. 1 and Fig. 2(a) show the whole reverse diffusion process of DDNM. For ease of understanding, we visualize the intermediate results of DDNM in Appendix G. By using a denoising network Z_θ pre-trained for general generative purposes, DDNM can solve IR tasks with arbitrary forms of linear degradation operator A. It needs no task-specific training or optimization and forms a zero-shot solution for diverse IR tasks.

Fig. 2(b) illustrates the basic time-travel trick.

Fig. 4(b)   and the right part in Tab. 4 demonstrate the improvements that the time-travel trick brings.

Figure 3: Qualitative results of zero-shot IR methods.

3.2. We set σ_y by observing the noise level of y. The results are shown in Fig. 6, Fig. 7, and Fig. 1(c). (2) Old Photo Restoration. For old photos, we construct A and A† as described in Sec. 3.2, where we manually draw a mask over the damaged areas of the photo. The results are shown in Fig. 1(d) and Fig. 15.

RANGE-NULL SPACE DECOMPOSITION IN IMAGE INVERSE PROBLEMS

Schwab et al. (2019) first propose using a DNN to learn the missing null-space contents in image inverse problems and provide a detailed theoretical analysis. Chen & Davies (2020) propose learning the range space and the null space separately. Bahat & Michaeli (2020) achieve editable super-resolution by exploring the null-space contents. Wang et al. (2023) apply range-null space decomposition to existing GAN-prior-based SR methods to improve their performance and convergence speed.

Figure5: DDNM + can well handle 4× SR even with a strong noise σ y =0.9.

Figure 6: Solving JPEG-like artifacts using DDNM+. Here we set A = I to perform pure denoising. y denotes the input degraded image. When we set σ_y = 0.1, the artifacts are decently removed. When we set σ_y = 0.2, the results become smoother but yield relatively poor identity consistency.

def PatchUpsample(x, scale):
    n, c, h, w = x.shape
    x = torch.zeros(n, c, h, scale, w, scale) + x.view(n, c, h, 1, w, 1)
    return x.view(n, c, scale * h, scale * w)

# Implementation of A and its pseudo-inverse Ap
if IR_mode == "colorization":
    A = color2gray
    Ap = gray2color
elif IR_mode == "inpainting":
    A = lambda z: z * mask
    Ap = A
elif IR_mode == "super resolution":
    A = torch.nn.AdaptiveAvgPool2d((256 // scale, 256 // scale))
    Ap = lambda z: PatchUpsample(z, scale)
elif IR_mode == "old photo restoration":
    A1 = lambda z: z * mask
    A1p = A1
    A2 = color2gray
    A2p = gray2color
    A3 = torch.nn.AdaptiveAvgPool2d((256 // scale, 256 // scale))
    A3p = lambda z: PatchUpsample(z, scale)
    A = lambda z: A3(A2(A1(z)))
    Ap = lambda z: A1p(A2p(A3p(z)))

F DETAILS OF THE DEGRADATION OPERATORS

Super Resolution (SR). For the SR experiments in Tab. 1, we use the bicubic downsampler as the degradation operator to ensure fair comparisons. For the other cases in this paper, we use the average-pooling downsampler as the degradation operator, whose pseudo-inverse is easy to obtain as described in Sec. 3.2. Fig. 8(a) and Fig. 8(b) show examples of the bicubic operation and the average-pooling operation.
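A hedged NumPy analogue of the average-pooling A and its PatchUpsample pseudo-inverse (the function names `avg_pool` and `patch_upsample` are ours), checking that A A† acts as the identity on the range of A, which implies A A† A = A:

```python
import numpy as np

def avg_pool(x, scale):                  # A: (h, w) -> (h//scale, w//scale)
    h, w = x.shape
    return x.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def patch_upsample(x, scale):            # A^dagger: constant-block expansion
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)

rng = np.random.default_rng(4)
img = rng.standard_normal((8, 8))
down = avg_pool(img, 2)
assert down.shape == (4, 4)
# Pooling a constant-block upsampling recovers the pooled image exactly.
assert np.allclose(avg_pool(patch_upsample(down, 2), 2), down)
```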

Figure 8: Visualization of different degradation operators. (a) Bicubic downsampler. The scale factors from left to right are ×4, ×8, ×16, ×32; (b) Average-pooling downsampler. The scale factors from left to right are ×4, ×8, ×16, ×32; (c) Blur operators. The type of kernels from left to right are Gaussian, uniform, and anisotropic; (d) Masks; (e) Walsh-Hadamard sampling matrix. The sampling ratios from left to right are 0.5 and 0.25; (f) Block-based sampling matrix. The sampling ratios from left to right are 0.4, 0.3, 0.2, 0.1, 0.05; (g) Grayscale operator.

Panel titles of Fig. 9: (a) visualization of DDNM on 4× SR, with DDPM as the sampling strategy; (b) visualization of DDNM on 16× SR and deblurring, with DDIM as the sampling strategy.

Figure 9: Visualization of the intermediate results in DDNM. Zoom-in for the best view.

Figure 10: DDNM can be applied to SR3 to improve the restoration performance. Here we experiment on 8× SR (from image size 16×16 to 128×128), the metrics are PSNR/Consistency.




Unlike DDRM, our DDNM does not necessarily need SVD. As presented in Sec. 3.2, we construct A and A† for colorization, inpainting, and super-resolution problems by hand, which brings negligible computation and memory consumption. In contrast, SVD-based methods incur heavy memory and computation costs if A is high-dimensional (e.g., 128× SR). The experiments in Tab. 3 support these claims.

These claims are well supported by experiments in Tab. 4.

Comprehensive quantitative comparisons between DDNM and DDRM.

funding

* Equal contribution. This work was supported in part by Shenzhen Research Project under Grant JCYJ20220531093215035 and Grant JSGGZD20220822095800001.

annex

where σ_t = √(1−ᾱ_{t-1}) is the noise level of the t-th time step, Z_θ is the denoiser that estimates the additive noise from x_t, and η controls the randomness of this sampling process. Since the noise part follows a normal distribution, i.e., σ_t √(1−η²) Z_θ(x_t, t) + σ_t η ϵ ∼ N(0, σ_t² I), the equation can be rewritten as

x_{t-1} = √ᾱ_{t-1} x̂_{0|t} + ϵ_orig, ϵ_orig ∼ N(0, σ_t² I).

Accounting for the introduced noise, we modify ϵ_orig so that the total noise level does not exceed N(0, σ_t² I); hence we construct a new noise term ϵ_new:

x_{t-1} = √ᾱ_{t-1} x̂ᶜ_{0|t} + ϵ_intro + ϵ_new, ϵ_intro + ϵ_new ∼ N(0, σ_t² I).

Here ϵ_intro denotes the introduced noise, which can be further written as

ϵ_intro = √ᾱ_{t-1} σ_y V Λ_t V⊤ A† ϵ_n ∼ N(0, ᾱ_{t-1} σ_y² V Λ_t Σ†(Σ†)⊤ Λ_t V⊤).

The covariance matrix of ϵ_intro can be simplified as V D_t V⊤, with d_{ti} = ᾱ_{t-1} σ_y² λ_{ti}²/s_i² if s_i ≠ 0, and d_{ti} = 0 if s_i = 0. To construct ϵ_new, we define a new diagonal matrix Γ_t = diag{γ_{t1}, ..., γ_{tD}} with Γ_t = σ_t² I − D_t. Now we can construct ϵ_new by sampling from N(0, V Γ_t V⊤), which ensures ϵ_intro + ϵ_new ∼ N(0, V(D_t + Γ_t)V⊤) = N(0, σ_t² I). An easier implementation is to first sample ϵ_temp from N(0, Γ_t) and then take ϵ_new = V ϵ_temp. From Eq. 62, we also observe that λ_{ti} must guarantee that the level of the introduced noise does not exceed the predefined noise level σ_t, which yields

λ_{ti} = 1, if σ_t ≥ √ᾱ_{t-1} σ_y/s_i; λ_{ti} = σ_t s_i/(√ᾱ_{t-1} σ_y), if σ_t < √ᾱ_{t-1} σ_y/s_i; λ_{ti} = 1, if s_i = 0.

In the actual implementation, we adopt the following formula for ϵ_temp, whose distribution can be shown to be N(0, Γ_t):

ϵ_temp^{(i)} = √γ_{ti} ϵ^{(i)}, ϵ ∼ N(0, I),

where ϵ_temp^{(i)} denotes the i-th element of the vector ϵ_temp. Note that the blue η is not strictly needed: by our theory in Sec. 3.3, η should be 0 to maximize the preservation of the range-space correction. Inspired by DDRM (Kawar et al., 2022), however, we find that involving η helps improve robustness, though it sacrifices some range-space information.

