PSEUDOINVERSE-GUIDED DIFFUSION MODELS FOR INVERSE PROBLEMS

Abstract

Diffusion models have become competitive candidates for solving various inverse problems. Models trained for specific inverse problems work well but are limited to their particular use cases, whereas methods that use problem-agnostic models are general but often perform worse empirically. To address this dilemma, we introduce Pseudoinverse-guided Diffusion Models (ΠGDM), an approach that uses problem-agnostic models to close the gap in performance. ΠGDM directly estimates conditional scores from the measurement model of the inverse problem without additional training. It can address inverse problems with noisy, non-linear, or even non-differentiable measurements, in contrast to many existing approaches that are limited to noiseless linear ones. We illustrate the empirical effectiveness of ΠGDM on several image restoration tasks, including super-resolution, inpainting and JPEG restoration. On ImageNet, ΠGDM is competitive with state-of-the-art diffusion models trained on specific tasks, and is the first to achieve this with problem-agnostic diffusion models. ΠGDM can also solve a wider set of inverse problems where the measurement processes are composed of several simpler ones.



Figure 1: ΠG converts the problem-agnostic score function into a problem-specific one, using information about the measurements y and the measurement model, denoted h (in this figure, h is JPEG compression + masking; best viewed zoomed in). The additional guidance term is a vector-Jacobian product (VJP) that encourages consistency between the denoising result and the measurements after a pseudoinverse transformation h†. ΠGDM applies the denoising process from ΠG iteratively to generate valid solutions to the inverse problem.

1. INTRODUCTION

Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al., 2021c) have been successfully applied to various applications such as text-to-image generation (Rombach et al., 2022; Saharia et al., 2022b), natural language generation (Li et al., 2022), audio synthesis (Kong et al., 2020), and time series modeling (Tashiro et al., 2021). The ability to model complex, high-dimensional distributions also makes diffusion models strong candidates for solving inverse problems, where the goal is to infer the underlying signal from measurements (Bora et al., 2017; Daras et al., 2021; Ongie et al., 2020; Kadkhodaie & Simoncelli, 2021). Most methods that solve inverse problems with diffusion models fall into one of two paradigms. In the first paradigm, one trains a problem-specific, conditional diffusion model that is limited to particular inverse problems, such as super-resolution (Saharia et al., 2021; Whang et al., 2021; Saharia et al., 2022a).
In the second paradigm, one uses problem-agnostic diffusion models that are trained for generative modeling but not for any specific inverse problem; solutions are obtained via a "plug-and-play" approach that combines the diffusion model and the measurement process, e.g., via Bayes' rule (Venkatakrishnan et al., 2013a; Bardsley, 2012; Laumont et al., 2022; Choi et al., 2021; Song et al., 2021b; Jalal et al., 2021; Chung et al., 2021; Kawar et al., 2021; 2022a; Chung et al., 2022b; Daras et al., 2022a). These methods easily adapt to different tasks without re-training the diffusion model but tend to perform worse than problem-specific diffusion models. To achieve the best of both worlds, we introduce pseudoinverse guidance (ΠG), which uses problem-agnostic diffusion models to reach the empirical performance of problem-specific ones. Conditioned on the measurements and an explicit measurement model, ΠG estimates the problem-specific score function via Bayes' rule and uses these scores to draw samples. However, unlike classifier/classifier-free guidance (Dhariwal & Nichol, 2021; Ho & Salimans, 2022), ΠG obtains the problem-specific score directly from the known measurement model, without training additional models. Intuitively, ΠG guides the diffusion process by matching the one-step denoising solution and the ground-truth measurements, after transforming both via a "pseudoinverse" of the measurement model (see Fig. 1). This perspective makes ΠG the first guidance-based approach for inverse problem solving that handles measurements with Gaussian noise, as well as some non-linear, non-differentiable measurement models, such as JPEG compression (Kawar et al., 2022b).
We evaluate our method, termed Pseudoinverse-Guided Diffusion Models (ΠGDM), on various inverse problems, such as super-resolution, inpainting, and JPEG restoration over ImageNet validation images, and show that it performs comparably to state-of-the-art task-specific diffusion models (Saharia et al., 2021; Dhariwal & Nichol, 2021; Saharia et al., 2022a). To the best of our knowledge, ΠGDM is the first approach based on problem-agnostic models to achieve this quality on ImageNet. We further apply ΠGDM to a wider range of inverse problems, where the measurement process is composed of different types of measurements. This allows us to easily solve a much wider set of problems, including ones that have never been solved with diffusion models (see Fig. 2), such as low-resolution + JPEG compression + masking.

2. PRELIMINARIES: DIFFUSION MODELS

Let us denote the data distribution as p_0(x_0) and define a family of distributions p_t(x_t) by injecting i.i.d. Gaussian noise of standard deviation σ_t into samples of p_0(x_0), i.e., p_t(x_t | x_0) = N(x_0, σ_t² I). The standard deviation σ_t is monotonically increasing with respect to time t ∈ [0, T], with σ_0 = 0 and σ_T much larger than the standard deviation of the data. Samples from p_t(x) can be simulated by the following family of stochastic differential equations (SDEs), solved from t = T to t = 0 (Grenander & Miller, 1994; Karras et al., 2022; Zhang et al., 2022):

    dx = −σ̇_t σ_t ∇_x log p_t(x) dt − β_t σ_t² ∇_x log p_t(x) dt + √(2β_t) σ_t dω_t,   (1)

where the first term is the probability flow ODE part and the remaining terms form a Langevin process; ∇_x log p_t(x) is the score function, ω_t is the standard Wiener process, and β_t is a function that controls the amount of stochastic noise injected into the process. If β_t = 0 for all t, then Eq. 1 becomes an ordinary differential equation (ODE) (Anderson, 1982). A common choice of β_t is η σ̇_t/σ_t, where η = 1 corresponds to the variance-exploding SDE (VE-SDE, Song et al. (2021c)) and η = 0 corresponds to a version of denoising diffusion implicit models (DDIM, Song et al. (2021a)). Various forms of SDEs used by diffusion models in the literature can be described with Eq. 1 under particular σ_t and β_t functions, up to a time-dependent scaling factor over x. Diffusion models, a.k.a. score-based generative models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al., 2021c), solve Eq. 1 with two key approximations: the distribution with the highest noise level, p_T(x), is approximated with N(0, σ_T² I), and the score function is approximated with a neural network, ∇_x log p_t(x) ≈ S_θ(x; σ_t), trained with denoising score matching objectives (Vincent, 2011). Samples are then drawn from diffusion models by solving the ODE or SDE in Eq. 1, e.g., with Euler's method, Euler-Maruyama, or higher-order ODE solvers (Lu et al., 2022; Karras et al., 2022; Zhang & Chen, 2022).
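As a concrete sanity check of Eq. 1 with β_t = 0, the probability flow ODE can be integrated for a toy Gaussian distribution where the score is known in closed form. The following NumPy sketch (our construction, not from the paper) uses σ_t = t and p_0 = N(0, I), so the exact score is −x/(1 + t²); solving from t = T to t = 0 with Euler's method should map N(0, σ_T² I) samples back to approximately unit-variance data:

```python
import numpy as np

def score(x, t):
    # Exact score of p_t = N(0, (1 + t^2) I) when p_0 = N(0, I) and sigma_t = t.
    return -x / (1.0 + t ** 2)

def solve_prob_flow_ode(x_T, T=80.0, n_steps=800):
    # Euler integration of dx/dt = -sigma'_t * sigma_t * score(x, t) from T down to 0.
    x, dt = x_T, -T / n_steps
    for i in range(n_steps):
        t = T + i * dt
        x = x + (-1.0 * t * score(x, t)) * dt   # sigma'_t = 1 for sigma_t = t
    return x

rng = np.random.default_rng(0)
T = 80.0
x_T = rng.standard_normal((20000, 4)) * T       # approximate p_T ~ N(0, sigma_T^2 I)
x_0 = solve_prob_flow_ode(x_T, T=T)
print(x_0.std())  # close to 1: the ODE transports noise back to the data scale
```

With β_t > 0, each step would additionally include the Langevin correction term and fresh Wiener noise.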

3. METHODS

Problem statement Suppose we have measurements y ∈ R^m of some signal x_0 ∈ R^n, such that y = H x_0 + z, where H ∈ R^{m×n} is the known measurement matrix (model), and z ~ N(0, σ_y² I) is an i.i.d. Gaussian noise vector with known dimension-wise standard deviation σ_y (Eq. 2). Our goal is to solve the inverse problem and recover x_0 ∈ R^n from the measurements y. In later parts of the paper, we also consider inverse problems whose measurements are not linear, which we denote as y = h(x_0). Diffusion models can solve such inverse problems via Eq. 1, assuming that the problem-specific scores at all noise levels, i.e., ∇_{x_t} log p_t(x_t | y), are available. While it is possible to train a conditional diffusion model for a specific H, it is computationally expensive to do so for a large family of problems, such as sparse reconstruction in medical imaging (Chung & Ye, 2022). Therefore, we wish to utilize more commonly available problem-agnostic score models S_θ(x; σ_t) that are not trained specifically for the target inverse problem. If ∇_{x_t} log p_t(x_t | y) can be effectively approximated using S_θ(x_t; σ_t), then we can directly plug the approximation into Eq. 1 to solve the inverse problem.

3.1. APPROXIMATING THE PROBLEM-SPECIFIC SCORE FUNCTION

The problem-specific score can be decomposed via Bayes' rule: ∇_{x_t} log p_t(x_t | y) = ∇_{x_t} log p_t(x_t) + ∇_{x_t} log p_t(y | x_t), where the first term can be approximated with the score network S_θ(x_t; σ_t) (Vincent, 2011), and the second term is a guidance term, the score of p_t(y | x_t). Unfortunately, the score ∇_{x_t} log p_t(y | x_t) is intractable to compute, and we have to resort to approximations to estimate it efficiently. To see why this is true, consider the underlying graphical model for x_0, x_t, and y, which is y ← x_0 → x_t; x_t is produced by adding independent Gaussian noise to x_0, so it is independent of the measurement y when conditioned on x_0. Therefore, we can write:

    p_t(y | x_t) = ∫ p(x_0 | x_t) p(y | x_0) dx_0,   (3)
    p_t(x_0 | x_t) ≈ N(x̂_t, r_t² I),   (4)

where the mean is obtained from Tweedie's formula:

    x̂_t = E[x_0 | x_t] = x_t + σ_t² ∇_{x_t} log p_t(x_t) ≈ x_t + σ_t² S_θ(x_t; σ_t).   (5)

Eq. 5 is the minimum mean squared error (MMSE) estimator of x_0 given x_t and the noise standard deviation σ_t (Stein, 1981; Efron, 2011; Saremi & Hyvärinen, 2019), and r_t is a time-dependent standard deviation that should depend on the data (see discussion in App. A.3). Our choice for the mean (MMSE) can be justified using an argument related to variational inference (App. A.6). Our next step is to approximate the score of p_t(y | x_t). Since the measurement model obtains y by applying a linear transform to x_0 and adding independent Gaussian noise (Eq. 2), and p_t(x_0 | x_t) is Gaussian under our approximation (Eq. 4), the distribution of y conditioned on x_t is also approximately Gaussian:

    p_t(y | x_t) ≈ N(H x̂_t, r_t² H H^⊤ + σ_y² I).   (6)

Thus, we have the following approximation to the score:

    ∇_{x_t} log p_t(y | x_t) ≈ ((y − H x̂_t)^⊤ (r_t² H H^⊤ + σ_y² I)^{−1} H ∂x̂_t/∂x_t)^⊤,   (7)

where (y − H x̂_t)^⊤ (r_t² H H^⊤ + σ_y² I)^{−1} H is a vector and ∂x̂_t/∂x_t is a Jacobian. This is a vector-Jacobian product and can be computed with backpropagation.
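When p_0 = N(0, I), every distribution involved is Gaussian and the approximations in Eqs. 4 to 7 become exact, which lets us verify the guidance formula numerically. A minimal NumPy sketch (all shapes, H, and noise levels are illustrative):

```python
import numpy as np

# Toy case where Eq. 7 is exact: p_0 = N(0, I), so the MMSE denoiser is
# hatx_t = x_t / (1 + sigma_t^2) and the posterior variance is
# r_t^2 = sigma_t^2 / (1 + sigma_t^2).
rng = np.random.default_rng(0)
n, m = 5, 3
H = rng.standard_normal((m, n))
sigma_t, sigma_y = 1.5, 0.3
x_t = rng.standard_normal(n)
y = rng.standard_normal(m)

denoise = lambda x: x / (1.0 + sigma_t ** 2)      # Tweedie / MMSE estimate
r2 = sigma_t ** 2 / (1.0 + sigma_t ** 2)          # r_t^2 for this toy prior

# Guidance term of Eq. 7: ((y - H hatx)^T (r^2 H H^T + sigma_y^2 I)^{-1} H J)^T.
Sigma = r2 * H @ H.T + sigma_y ** 2 * np.eye(m)
residual = np.linalg.solve(Sigma, y - H @ denoise(x_t))
J = np.eye(n) / (1.0 + sigma_t ** 2)              # Jacobian of the linear denoiser
guidance = J.T @ H.T @ residual

# Finite-difference check against the exact log p_t(y | x_t):
def logp(x):
    diff = y - H @ denoise(x)
    return -0.5 * diff @ np.linalg.solve(Sigma, diff)   # up to an additive constant

eps = 1e-6
fd = np.array([(logp(x_t + eps * e) - logp(x_t - eps * e)) / (2 * eps)
               for e in np.eye(n)])
print(np.abs(guidance - fd).max())  # essentially zero: the VJP matches the score
```

The vector part uses a linear solve rather than an explicit matrix inverse; in practice, the Jacobian is never materialized and the product is obtained via backpropagation, as in Listing 1 (App. A.1).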

3.2. EXTENDING TO NON-LINEAR OPERATORS

In many cases we have σ_y = 0, and Eq. 7 simplifies to:

    ∇_{x_t} log p_t(y | x_t) ≈ r_t^{−2} ((H† y − H† H x̂_t)^⊤ ∂x̂_t/∂x_t)^⊤,   (8)

where, for a matrix with linearly independent rows, H† = H^⊤ (H H^⊤)^{−1} is the Moore-Penrose pseudoinverse of H. In this paper, we use the term pseudoinverse guidance (ΠG) to denote our guidance method, which uses Eq. 8 for noiseless measurements and Eq. 7 for noisy linear ones. Notably, we only need to perform automatic differentiation through the score model, not through the computational graph of H or H† (see Listing 1 in App. A.1). This allows us to extend ΠG to measurements that are not necessarily linear or even differentiable. We note that the matrix pseudoinverse satisfies H H† H x = H x for all x ∈ R^n. Analogously, for some non-linear measurement function h : R^n → R^m, we may find another function h† : R^m → R^n such that h(h†(h(x))) = h(x) for all x ∈ R^n, similar to Kawar et al. (2022b). Two examples are as follows. Quantization: let h(x) = ⌊x⌋ be the element-wise floor function of x ∈ R^n. Then we can define h†(y) := y (the identity on quantized values), and h(h†(h(x))) is still the floor function. JPEG encoding: let h(x) be the JPEG encoding function, where quantization occurs after a discrete cosine transform. The corresponding JPEG decoding algorithm does not modify the values produced after quantization, so we can simply define h†(y) as the JPEG decoding algorithm. This idea also applies to other measurement models, such as the formation of a low dynamic range image (details in App. A.5). The corresponding ΠG term then becomes:

    ∇_{x_t} log p_t(y | x_t) ≈ r_t^{−2} ((h†(y) − h†(h(x̂_t)))^⊤ ∂x̂_t/∂x_t)^⊤,   (9)

which generalizes the linear case (Eq. 8) when h(x) = H x and h†(y) = H† y.
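Both the exact identity H H† H = H and its generalized counterpart h(h†(h(x))) = h(x) are easy to check numerically; a small NumPy sketch of the quantization example (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Quantization example: h = element-wise floor, h_dagger = identity on quantized values.
h = np.floor
h_dagger = lambda y: y
x = rng.uniform(-5.0, 5.0, size=100)
print(np.array_equal(h(h_dagger(h(x))), h(x)))  # True: generalized pinv property

# Linear sanity check: for H with linearly independent rows,
# H_dagger = H^T (H H^T)^{-1} satisfies H H_dagger H = H.
H = rng.standard_normal((3, 5))
H_dagger = H.T @ np.linalg.inv(H @ H.T)
print(np.allclose(H @ H_dagger @ H, H))  # True
```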

3.3. ADAPTIVE WEIGHTS IN GUIDED DIFFUSION MODELS

Similar to the guidance scalar in the classifier(-free) guidance literature (Dhariwal & Nichol, 2021; Ho & Salimans, 2022), we introduce a scalar weight in front of the guidance term ∇_{x_t} log p_t(y | x_t). However, unlike most existing methods that apply a fixed weight across diffusion times, we introduce a heuristic that implicitly adapts the guidance weight to the timestep. We use f(x_t; s, t, η) to denote the one-step update from time t to time s (assuming s < t) using the problem-agnostic score model and the sampler introduced in the DDIM paper (Song et al., 2021a), with η ∈ [0, 1] a hyperparameter (details in App. A.4). Our one-step sampling update from time t to time s with pseudoinverse guidance is:

    x_s = f(x_t; s, t, η) + r_t² ∇_{x_t} log p_t(y | x_t).   (10)

In the noiseless case (Eq. 9), this becomes:

    x_s = f(x_t; s, t, η) + ((h†(y) − h†(h(x̂_t)))^⊤ ∂x̂_t/∂x_t)^⊤.

We describe the full procedure in Algorithm 1 (App. A.1). As the coefficients for the problem-agnostic score ∇_{x_t} log p_t(x_t) depend on the step t → s, this is equivalent to using the original DDIM sampler while adapting the weight on the pseudoinverse guidance term at different timesteps. To illustrate this, we compare the ratio between the weights of our approach and those with w_r = 1 in Ho et al. (2022) (see Fig. 6). Intuitively, our approach increases the weight during the initial sampling phase and then decreases it to one towards the end. We also compare our weights with those used in Ho et al. (2022) on image restoration problems, both using pseudoinverse guidance with 100 diffusion steps and η = 0.2. In the super-resolution case (Fig. 7), our weights consistently produce sharp images. We further illustrate the advantages of our weights on JPEG restoration in App. A.4, where large, fixed weights that worked well for super-resolution can be unstable on another task.
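A minimal sketch of the update in Eq. 10, assuming a VE-parametrized DDIM step as described in App. A.4; `denoise` and `guidance_fn` are illustrative stand-ins for the score network and the guidance term of Eqs. 7 and 9:

```python
import numpy as np

def ddim_step_ve(x_t, x_hat, sigma_t, sigma_s, eta, rng):
    # c_{t->s} from App. A.4; eta = 0 gives the deterministic DDIM update.
    c = np.sqrt((sigma_t**2 - sigma_s**2) * sigma_s**2 / sigma_t**2)
    eps = rng.standard_normal(x_t.shape)
    return x_hat + eta * c * eps + np.sqrt(sigma_s**2 - (eta * c)**2) * (x_t - x_hat) / sigma_t

def pigdm_step(x_t, sigma_t, sigma_s, eta, denoise, guidance_fn, rng):
    # Eq. 10: DDIM step plus the r_t^2-weighted guidance term.
    x_hat = denoise(x_t, sigma_t)
    r2 = sigma_t**2 / (1.0 + sigma_t**2)        # r_t^2 choice from App. A.3
    return ddim_step_ve(x_t, x_hat, sigma_t, sigma_s, eta, rng) + r2 * guidance_fn(x_t, x_hat)

# Demo with a toy Gaussian denoiser and zero vs. constant guidance:
rng = np.random.default_rng(0)
denoise = lambda x, s: x / (1.0 + s**2)
x_t = rng.standard_normal(4)
no_g = pigdm_step(x_t, 2.0, 1.0, 0.0, denoise, lambda x, xh: np.zeros_like(x), rng)
with_g = pigdm_step(x_t, 2.0, 1.0, 0.0, denoise, lambda x, xh: np.ones_like(x), rng)
# With eta = 0, the two runs differ exactly by the guidance shift r_t^2 * g:
print(np.allclose(with_g - no_g, 2.0**2 / (1.0 + 2.0**2)))  # True
```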
3.4. DIFFERENCES FROM EXISTING GUIDANCE METHODS

Compared with classifier and classifier-free guidance (Dhariwal & Nichol, 2021; Ho & Salimans, 2022), we do not require training on pairs of (x_t, y) (noisy data and measurements). Compared with reconstruction guidance (Ho et al., 2022; Chung et al., 2022b; Ryu & Ye, 2022), ΠG has three advantages:

• Our approximation of p(x_0 | x_t) is consistent, i.e., it does not depend on the measurement model H. The same cannot be said for reconstruction guidance, which makes isotropic Gaussian assumptions on y (see App. A.2).
• In reconstruction guidance, the pseudoinverse H† is replaced with the matrix transpose H^⊤ (see App. A.2), which differs for linear H whose singular values are not all 0 or 1.
• ΠG can be applied to noisy, non-linear, or non-differentiable measurement models, as discussed in Sec. 3.2. In cases like JPEG, it is easier to define a generalized notion of pseudoinverse than a generalized notion of transpose (or adjoint).

4. RELATED WORK

Deep neural networks have been extensively used as priors for solving inverse problems (Venkatakrishnan et al., 2013b). Here, we focus on the setting where we can train models on clean data but not on the problem, which is only known at inference time. This is reasonable in many real-world applications, such as medical imaging (Jalal et al., 2021; Chung & Ye, 2022) and JPEG restoration (Ehrlich et al., 2020). These inverse problem solvers may use different types of neural networks, such as randomly initialized networks (Ulyanov et al., 2018), denoisers (Romano et al., 2016), robust classifiers (Santurkar et al., 2019), and generative models (Bora et al., 2017). Methods based on generative adversarial networks (GANs, Goodfellow et al. (2014)) search for the latent variables and/or generator parameters that would produce images aligning with the measurements (Bora et al., 2017; Pan et al., 2021; Menon et al., 2020); these methods often require hundreds if not thousands of iterations, despite recent methods with improved efficiency (Daras et al., 2022a). As another family of generative models, diffusion models are also used as inverse problem solvers, with two notable advantages over GANs: (i) they are trained with regression objectives over noisy data, so they can naturally handle measurement noise without having to perform inversion as in GANs; (ii) their close connections to SDE/ODE solvers allow the use of more efficient iterative updates. In particular, Denoising Diffusion Restoration Models (DDRM, Kawar et al. (2022a)) leverage both to derive efficient inverse problem solvers for noisy and noiseless measurements.
Similar to DDRM, many works adopt a "replacement" approach, where consistency with the measurements is enforced by replacing parts of the intermediate predictions from the one-step denoiser with the measurements, sometimes in a transformed space (Song et al., 2021c; Choi et al., 2021; Song et al., 2021b; Chung et al., 2021; Kawar et al., 2021). Despite being successful in many tasks, these methods have trouble with sparse measurements, where the replacements have a weaker impact on the sampling process. By computing additional gradients through the diffusion model, ΠG allows the measurements to influence all predicted values during the updates, regardless of sparsity. This is similar to reconstruction guidance, which also differentiates through the diffusion model during its updates (Ho et al., 2022; Ryu & Ye, 2022; Chung et al., 2022b). In fact, ΠG is identical to reconstruction guidance in the noiseless, linear case if the transpose of the measurement matrix equals its pseudoinverse (App. A.2). Nevertheless, ΠG introduces principled ways of dealing with noisy, non-linear, or even non-differentiable measurements, as discussed in Sec. 3.4.

5. EXPERIMENTS

Our approach, named Pseudoinverse-guided Diffusion Models (ΠGDM), combines ΠG (Eqs. 7 to 9) and the adaptive weight schedule (Eq. 10). While we use a sampler based on DDIM here, other samplers can be used as well. We evaluate quantitative results on the ImageNet dataset (Russakovsky et al., 2015) with publicly available diffusion models trained on images of size 256 × 256, as there are extensive prior results with problem-specific diffusion models trained on ImageNet (Dhariwal & Nichol, 2021; Saharia et al., 2021; 2022a).

• First, we compare ΠGDM against problem-specific models on 4× super-resolution, inpainting, and JPEG restoration. Despite the "unfair" advantage held by problem-specific models, ΠGDM is on par with them in terms of performance.
• Next, we perform an ablation study over the two components introduced in this paper.
• Finally, we apply ΠGDM to inverse problems where the measurement process is composed of several steps, such as JPEG + super-resolution + inpainting and denoising + inpainting. The compositional nature of these problems makes it infeasible to train diffusion models for each problem and highlights the strength of ΠGDM.

5.1. QUANTITATIVE RESULTS

We consider two popular metrics: Fréchet Inception Distance (FID, Heusel et al. (2017)) and the Classifier Accuracy (CA) of a pre-trained ResNet50 model (He et al., 2015). Unless specified otherwise, we use the noiseless version of pseudoinverse guidance (Eq. 8). We report super-resolution results on the full ImageNet validation set, and, following the practice established in Saharia et al. (2022a), we report inpainting and JPEG restoration results on a subset of 10k images. The ImageNet models that we use are trained with 1000 discrete timesteps, corresponding to 1000 discrete noise levels (Dhariwal & Nichol, 2021). For simplicity, we always use uniform spacing when iterating over the timesteps. Performance may further improve with better timestep scheduling, such as schedules that iterate more frequently at lower noise levels (Karras et al., 2022). We use 100 iterations and η = 1.0 for ΠGDM, and include additional task-specific details in App. B.

5.1.1. SUPER-RESOLUTION

We apply average pooling (Pool) and bicubic interpolation (Bicubic, which applies a convolution to the image) to produce two sets of 64 × 64 images, and then apply our 4× super-resolution algorithms to each. We consider both class-conditional (denoted cc in Tab. 2) and class-unconditional models as the base generative model. In Tab. 2, we report results from ΠGDM and three baselines: DDRM (Kawar et al., 2022a), SR3 (Saharia et al., 2021), and ADM-U (Dhariwal & Nichol, 2021). DDRM uses task-agnostic models, whereas SR3 and ADM-U use diffusion models trained specifically for the 64 → 256 super-resolution problem. On Pool, ΠGDM significantly outperforms DDRM while being only slightly worse than ADM-U; on Bicubic, ΠGDM outperforms all three baselines. Perhaps surprisingly, the ADM-U model performs much worse on Bicubic than on Pool because it was trained on low-resolution images generated by average pooling (i.e., the Pool problem); when Bicubic images are used, the generated results become blurrier. This suggests that problem-specific diffusion models may fail to generalize beyond the settings they are trained on.

5.1.2. INPAINTING

We use the two types of inpainting masks from Saharia et al. (2022a): the center 128 × 128 pixels (Center), and freeform masks simulating brushstrokes that cover roughly 20%-30% of the pixels in each image (Freeform). In addition, we report ΠGDM results on the noisy inpainting problem with i.i.d. Gaussian noise of σ_y = 0.05 (the pixel intensity range is [0, 1]); this problem is harder, as the model has to perform denoising and inpainting at the same time. For the noisy setting, we use Eq. 7 for ΠG. In Tab. 3, we report quantitative results on the two inpainting tasks, mainly comparing with Palette (Saharia et al., 2022a), which trains a diffusion model specifically for inpainting. While ΠGDM achieves a slightly worse FID than Palette, it has higher classifier accuracy in both cases. Moreover, ΠGDM suffers only a small performance drop when applied to the more challenging denoising + inpainting task, demonstrating its robustness to noisy measurements. Methods based on reconstruction guidance, however, fail to perform denoising effectively, as they have no mechanism to address measurement noise (see Fig. 4).

5.1.3. JPEG RESTORATION

We consider the three JPEG quality factors (QFs) used in Saharia et al. (2022a): 5, 10, and 20. In Tab. 4, we report quantitative results on JPEG restoration, comparing against a regression-based baseline and Palette, both of which are trained specifically on JPEG images with QFs ranging from 5 to 30. Compared with Palette, ΠGDM achieves a slightly worse FID (by less than 0.6) but higher classifier accuracy at QFs 10 and 20. We note that the model used by ΠGDM has never seen JPEG images compressed at these quality factors, demonstrating the strength of task-agnostic diffusion models.

5.2. ABLATION STUDIES

ΠGDM introduces two key components: pseudoinverse guidance (ΠG) for problem-specific score estimation and the adaptive guidance weight schedule for sampling (AW). To illustrate their effectiveness, we compare with alternative approaches. The alternative to ΠG is reconstruction guidance, and the alternative to AW is the standard weight schedule with w_r ∈ {1, 2, 5} in Ho et al. (2022) (the w_r with the best performance is reported). We consider the uniform kernel deblurring (Deblur) and bicubic 4× super-resolution (Bicubic) tasks discussed in Kawar et al. (2022a). The measurement matrices for both tasks have widely varying singular values (see Fig. 5 in App. A.2), so the ΠG and reconstruction guidance updates are quite different, since H^⊤ ≠ H†. From Tab. 5, we see that methods using ΠG achieve a significant improvement over reconstruction guidance, so switching to the pseudoinverse in the guidance term is critical. ΠG achieves further gains with AW, illustrating the importance of pairing the guidance term with a good sampling algorithm. We provide additional experimental details and further ablation studies on the number of iterations per sample and η in App. B.

5.3. INVERSE PROBLEMS WITH COMPOSED MEASUREMENTS

Finally, we discuss cases where the measurement process consists of several simpler measurements, enabling applications such as JPEG restoration combined with super-resolution and inpainting, where the compositional nature of the measurements makes it too expensive to train problem-specific diffusion models individually. Specifically, let h(x) = h_1 ∘ h_2 ∘ ... ∘ h_k(x) be a measurement model composed of k simpler measurements. For certain measurements, such as low-resolution filtering, JPEG, and masking, we can approximate h†(y) with h_k† ∘ ... ∘ h_2† ∘ h_1†(y), and then use ΠGDM with Eq. 9 directly. We illustrate some examples in Fig. 2 and Fig. 13 (App. B.3). To the best of our knowledge, many of these problems (such as super-resolution + JPEG + inpainting) have not previously been solved with problem-agnostic diffusion models.
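A toy numerical check of the composition rule (our example, not from the paper): masking composed with quantization, where the composed pseudoinverse is the reversed composition of the individual ones:

```python
import numpy as np

# h = h1 ∘ h2 with approximate pseudoinverse h_dagger = h2_dagger ∘ h1_dagger.
# Illustrative stand-in for masking + compression: h2 masks coordinates, h1 quantizes.
idx = np.array([0, 2, 3])                  # observed coordinates
n = 6

h2 = lambda x: x[idx]                      # masking
def h2_dagger(y):                          # zero-fill the unobserved coordinates
    out = np.zeros(n)
    out[idx] = y
    return out

h1 = np.floor                              # quantization
h1_dagger = lambda y: y                    # identity on quantized values

h = lambda x: h1(h2(x))                    # composed measurement
h_dagger = lambda y: h2_dagger(h1_dagger(y))   # reversed composition of pinvs

x = np.random.default_rng(0).uniform(0.0, 10.0, size=n)
print(np.array_equal(h(h_dagger(h(x))), h(x)))  # True: generalized pinv property holds
```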

6. DISCUSSIONS, LIMITATIONS, AND FUTURE WORK

In this paper, we introduced ΠGDM, an inverse problem solver using unconditional diffusion models. On various tasks, ΠGDM achieves competitive quality with conditional models while avoiding expensive problem-specific training. As a result, we can use problem-agnostic diffusion models to solve certain problems that would be cost-ineffective to address individually with conditional diffusion models, leading to a much wider set of applications. The ability to handle measurement noise also gives ΠGDM the potential to address certain real-world applications, such as MRI imaging with Gaussian noise (Sijbers & Den Dekker, 2004) . Despite having better restoration results than DDRM (Kawar et al., 2022a) , ΠGDM is slower, as each iteration costs more memory and compute due to the vector-Jacobian product over the score model. Therefore, it would be helpful to explore more efficient sampling techniques. It is also interesting to investigate if similar ideas as ΠG can be used for diffusion models that do not directly operate on the data space (Vahdat et al., 2021; Rombach et al., 2022; Sinha et al., 2021) , or are based on alternative forward diffusion models (Jing et al., 2022; Rissanen et al., 2022; Daras et al., 2022b; Hoogeboom & Salimans, 2022) .

Reproducibility statement

We have made the following efforts to facilitate the reproducibility of our work. (i) Our experiments are conducted on publicly available datasets and model checkpoints (Sec. 5). (ii) We include a detailed description of our algorithm in Algorithm 1. (iii) We discuss all the key hyperparameters and evaluation metrics needed to reproduce our experiments in Sec. 3.3 and App. B. (iv) We provide further explanations of several statements from the main paper in App. A.2 to A.5.

A.1 ALGORITHM DETAILS

We illustrate a PyTorch-like implementation for computing the pseudoinverse guidance in the noiseless case in Listing 1. For a different inverse problem, we only need to change the definitions of the functions H and H_pinv. In practice, many diffusion model architectures adopt the Variance-Preserving (VP) SDE, which scales the signal x_0 down as the noise level increases, specifically:

    x_t = x_0 + σ_t ε (VE)   ⟺   x̃_t = √ᾱ_t x_0 + √(1 − ᾱ_t) ε (VP).

To adjust for this in ΠGDM, we need to scale the guidance term by √ᾱ_t (Dhariwal & Nichol, 2021). We list the full ΠGDM algorithm for the VP-SDE in Algorithm 1.

# "H_pinv": (b, m) -> (b, n) and "H": (b, n) -> (b, m) are functions over batches.
# "y" has shape (b, m); "x_t" has shape (b, n).
# "hatx_t" with shape (b, n) is the solution from the denoiser.
hatx_t = denoise(x_t, sigma_t)
# Compute the fixed coefficient; "mat" has shape (b, n).
mat = H_pinv(y) - H_pinv(H(hatx_t))
# Compute the inner product between "hatx_t" and "mat", then sum over the batch.
mat_x = (mat.detach() * hatx_t).sum()
# Compute the guidance term (without r_t).
guidance = torch.autograd.grad(mat_x, x_t)[0]

Listing 1: Pseudocode for computing the pseudoinverse guidance in the noiseless case.

Algorithm 1: ΠGDM for VP-SDE.
Inputs: y, h(x) (noiseless) or H, σ_y (noisy), η ∈ [0, 1], an ε-prediction diffusion model.
Find a sequence of timesteps {v_i}_{i=0}^N, where v_0 = 0 and v_N = T. Initialize x ~ N(0, I).
for i = N, ..., 1 do
    t ← v_i, s ← v_{i−1}    ▷ Get start and end times for this iteration
    ᾱ_t ← 1 / (1 + σ_t²)    ▷ Get ᾱ in the VP-SDE
    ε_θ ← ε-prediction(x; t)    ▷ Predict the (standardized) noise
    x̂_t ← (x − √(1 − ᾱ_t) ε_θ) / √ᾱ_t    ▷ Predict the one-step denoised result
    c_1 ← η √((1 − ᾱ_t/ᾱ_s)(1 − ᾱ_s)/(1 − ᾱ_t))    ▷ Get coefficients c_1, c_2 in DDIM
    c_2 ← √(1 − ᾱ_s − c_1²)
    if noiseless then
        g ← ((h†(y) − h†(h(x̂_t)))^⊤ ∂x̂_t/∂x)^⊤
    else
        g ← ((y − H x̂_t)^⊤ (H H^⊤ + (σ_y²/r_t²) I)^{−1} H ∂x̂_t/∂x)^⊤
    end if
    ε ~ N(0, I)    ▷ Sample i.i.d. Gaussian noise
    x ← √ᾱ_s x̂_t + c_1 ε + c_2 ε_θ + √ᾱ_t g    ▷ ΠGDM update
end for

A.2 COMPARISON WITH RECONSTRUCTION GUIDANCE

We note that in reconstruction guidance (Ho et al., 2022), the following approximation is made:

    p_t^{(RG)}(y | x_t) ≈ N(H x̂_t, σ_t² I),

where we omit the ᾱ_t term in Ho et al. (2022) as we use the Variance-Exploding (VE) parametrization throughout the paper. While Ho et al. (2022) only listed super-resolution and inpainting as two examples, the general idea can be extended to any linear H.

Transpose, not pseudoinverse. This approximation leads to the following score function:

    ∇_{x_t} log p_t^{(RG)}(y | x_t) ≈ σ_t^{−2} ((H^⊤ y − H^⊤ H x̂_t)^⊤ ∂x̂_t/∂x_t)^⊤,

which essentially replaces the pseudoinverse term H† in Eq. 8 with the transpose term H^⊤ (ignoring the differences in the variance approximation). Taking the singular value decomposition H = U Σ V, we have:

    H^⊤ H = (U Σ V)^⊤ (U Σ V) = V^⊤ Σ² V,
    H† H = (U Σ^{−1} V)^⊤ (U Σ V) = V^⊤ I(Σ²) V,

where Σ^{−1} denotes the pseudoinverse of Σ and I(Σ²) is the diagonal matrix whose entries are 1 where the corresponding entry of Σ² is non-zero, and 0 otherwise. Multiplying the former with a vector (as in reconstruction guidance) scales the singular vectors by Σ², whereas multiplying the latter (as in pseudoinverse guidance) keeps the scale of all singular vectors that correspond to non-zero singular values. When the measurement matrix has very different singular values (see Fig. 5), reconstruction guidance may improperly rescale the singular vectors, leading to reduced performance compared with pseudoinverse guidance (as in Table 5).

Consistent approximation of p_t(x_0 | x_t). In reconstruction guidance, the isotropic Gaussian approximation of p_t(y | x_t) means that the implied approximation of p_t(x_0 | x_t) depends on the measurement model. For example, suppose we have two diagonal measurement matrices D_1 and D_2 with all-positive diagonals and D_1 ≠ D_2. If we use D_1 as the measurement model, then x_0 = D_1^{−1} y and p_t^{(RG)}(x_0 | x_t) ≈ N(x̂_t, σ_t² D_1^{−2}), but if we use D_2, then x_0 = D_2^{−1} y and p_t^{(RG)}(x_0 | x_t) ≈ N(x̂_t, σ_t² D_2^{−2}), which differs from the earlier approximation. However, conditioned on x_t, the distribution of x_0 can be inferred from the diffusion model alone, so it should not depend on the measurement model. Therefore, reconstruction guidance does not make a consistent approximation of p_t(x_0 | x_t); pseudoinverse guidance, on the other hand, approximates p_t(x_0 | x_t) directly from the diffusion model and then approximates p_t(y | x_t) by marginalization of Gaussians.
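The difference between the two guidance terms can be seen numerically: H† H is an orthogonal projector (eigenvalues 0 or 1), while H^⊤ H rescales singular directions by the squared singular values. A small NumPy sketch (matrix sizes and row scalings illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Rows scaled to spread the singular values over several orders of magnitude.
H = rng.standard_normal((3, 6)) * np.array([[10.0], [1.0], [0.1]])
H_dagger = np.linalg.pinv(H)

P = H_dagger @ H
print(np.allclose(P @ P, P))                        # True: idempotent projector
eig = np.sort(np.linalg.eigvalsh((P + P.T) / 2))
print(np.allclose(eig[:3], 0), np.allclose(eig[3:], 1))  # True True: eigenvalues in {0, 1}

s2 = np.linalg.svd(H.T @ H, compute_uv=False)[:3]   # nonzero eigenvalues ~ Sigma^2
print(s2.max() / s2.min() > 100.0)                  # True: very different scalings
```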

A.3 ABOUT THE VARIANCE OF THE APPROXIMATION

Our approximation of p_t(x_0|x_t) depends on the variance term r_t, which should reflect the variance of the data distribution. For example, if the data distribution p_0(x_0) = N(0, I) is standard normal, then from Bayes' rule the posterior has the closed form:

p_t(x_0|x_t) ∝ p_0(x_0) p_t(x_t|x_0) = N(x_t / (σ_t² + 1), (σ_t² / (σ_t² + 1)) I),    (17)

so in this case r_0 = 0 and r_T ≈ 1. In Ho et al. (2022), r_t is set to σ_t; this is reasonable when t → 0 (the noise level is small), but unrealistic when t → T (the noise level is much higher than the data variance). Nevertheless, in the noiseless case we do not have to make the value of r_t explicit, since we can simply rescale the gradient terms using the guidance weights. In the noisy case, the choice of r_t matters more, as it interacts with σ_y in Eq. 7. We simply use r_t² = σ_t² / (σ_t² + 1) from Eq. 17, which gives good empirical results in our noisy inpainting experiments. To see why, take denoising as an example, where H = I. When σ_t ≪ σ_y, we have r_t ≈ σ_t, and r_t² (r_t² + σ_y²)^{−1} ≈ σ_t² σ_y^{−2} becomes small, so the noisy measurement has little impact on the guidance term. When σ_t is large, r_t ≈ 1 and r_t² (r_t² + σ_y²)^{−1} ≈ 1, so the guidance term is driven by the noisy measurements. The sampling procedure thus first guides the unconditional samples towards the noisy measurements, and then denoises without overfitting to them.
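As a small sanity check, the limiting behavior of the adaptive r_t and of the resulting guidance scale can be verified numerically. This sketch (function names are ours) uses the choice r_t² = σ_t²/(σ_t² + 1) above in the denoising case H = I:

```python
import numpy as np

def rt_squared(sigma_t):
    # Adaptive variance of p_t(x0 | x_t) under a standard-normal data prior:
    # r_t^2 = sigma_t^2 / (sigma_t^2 + 1), cf. Eq. 17.
    return sigma_t**2 / (sigma_t**2 + 1.0)

def noisy_guidance_scale(sigma_t, sigma_y):
    # Scale r_t^2 (r_t^2 + sigma_y^2)^{-1} multiplying the measurement
    # residual in the noisy case with H = I.
    r2 = rt_squared(sigma_t)
    return r2 / (r2 + sigma_y**2)

sigma_y = 0.5
# Low noise level (sigma_t << sigma_y): the noisy measurement barely
# influences the guidance term.
assert noisy_guidance_scale(0.01, sigma_y) < 1e-3
# High noise level (sigma_t large, r_t ~ 1): guidance follows the
# noisy measurement.
assert noisy_guidance_scale(100.0, sigma_y) > 0.75
```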

A.4 ABOUT ADAPTIVE GUIDANCE WEIGHTS

The sampling updates in the original DDIM paper are derived from the VP-SDE, so we rewrite them in the VE-SDE form used in this paper: x_s = f(x_t; s, t, η), where ϵ ∼ N(0, I) and

f(x_t; s, t, η) = x̂_t + η c_{t→s} ϵ + σ_t^{−1} √(σ_s² − η² c_{t→s}²) (x_t − x̂_t)    (18)
               = x_t + σ_t² ∇_{x_t} log p_t(x_t) + η c_{t→s} ϵ − σ_t √(σ_s² − η² c_{t→s}²) ∇_{x_t} log p_t(x_t),    (19)

with

c_{t→s} = √((σ_t² − σ_s²) σ_s²) / σ_t,

which corresponds to the c_1 coefficient in the original DDIM sampler (using the σ_t (VE) instead of the α_t (VP) formulation). Note that x̂_t = x_t + σ_t² ∇_{x_t} log p_t(x_t) is the denoised result. In Ho et al. (2022), the guidance is applied to x̂_t with some constant w_r, such that:

x̂_t^(RG) = x_t + σ_t² ∇_{x_t} log p_t(x_t) − (w_r / 2) ∇_{x_t} ∥y − H x̂_t∥_2².

Using the DDIM sampler, the update for the next sample replaces x̂_t in Eq. 18 with x̂_t^(RG), which multiplies the guidance term by an additional factor. In our case, we add the guidance term directly to Eq. 19, which is closer to the approach in Chung et al. (2022b). This is equivalent to the weighting in Ho et al. (2022) with a w_t that varies with t, i.e., applying time-dependent guidance weights during sampling (hence "adaptive"). While it is possible to tune w_r to achieve decent results, we found that different tasks may require different values. For example, w_r = 5 works well for super-resolution, but suffers from numerical overflow in JPEG restoration (see Fig. 8).
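For concreteness, the VE-form DDIM step above, with the guidance term added directly to the update, can be sketched as follows (function names and shapes are our own; this is a schematic of the update equations, not the paper's implementation):

```python
import numpy as np

def c_ts(sigma_t, sigma_s):
    # DDIM coefficient in VE form:
    # c_{t->s} = sqrt((sigma_t^2 - sigma_s^2) * sigma_s^2) / sigma_t
    return np.sqrt((sigma_t**2 - sigma_s**2) * sigma_s**2) / sigma_t

def ddim_ve_step(x_t, x0_hat, g, sigma_t, sigma_s, eta, rng):
    # One VE-DDIM update from noise level sigma_t down to sigma_s (Eq. 18).
    # x0_hat is the denoised estimate; g is the (already scaled) guidance
    # term, added directly to the update as described in the text.
    c = c_ts(sigma_t, sigma_s)
    eps = rng.normal(size=np.shape(x_t))              # fresh noise, scaled by eta
    dir_coeff = np.sqrt(sigma_s**2 - (eta * c)**2) / sigma_t
    return x0_hat + dir_coeff * (x_t - x0_hat) + eta * c * eps + g

# eta = 0 gives the deterministic DDIM update; eta = 1 injects the full
# ancestral-style noise at each step.
```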

A.5 ABOUT THE LOW DYNAMIC RANGE MEASUREMENT FUNCTION

Let h(x) be a function that reduces the dynamic range of an image. Typically, h consists of a dynamic range clipping stage h_1, a non-linear mapping stage h_2, and a quantization stage h_3 (Liu et al., 2020). The non-linear mapping is also known as the camera response function, and it is fair to assume it is invertible (with inverse denoted h_2†). The dynamic range clipping function typically has the form h_1(x) = clip(x, a, b), where a and b are the lower and upper clipping limits; we can define h_1† as:

h_1†(y) = y if y ∈ [a, b];  a if y < a;  b otherwise.

Therefore, we can define h†(x) = h_1†(h_2†(h_3†(x))) for pseudoinverse guidance. For ease of illustration, we assume h_2 to be the identity function in our qualitative results and focus on the clipping and quantization functions; in these cases, we clip images with range [−1, 1] to [−0.6, 0.6], and then quantize 8-bit representations into 4 bits.

A.6 JUSTIFYING OUR APPROXIMATION

As discussed in Sec. 3.1, it is computationally infeasible to sample from more exact representations of p(x_0|x_t) (i.e., the diffusion model itself), so we need to approximate it. A straightforward way is via variational inference: instead of the multi-step diffusion process, we use a simple Gaussian q(x_0|x_t) as the approximation. We can minimize the KL divergence between p(x_0|x_t) and q(x_0|x_t), which gives the objective:

min_q E_{p(x_0) p(x_t|x_0)} [KL(p(x_0|x_t) ∥ q(x_0|x_t))] = min_q E_{p(x_0, x_t)} [log p(x_0, x_t) − log q(x_0|x_t)].

If we define q(x_0|x_t) as a Gaussian with fixed standard deviation and mean given by a function µ of x_t, the objective is equivalent to:

min_µ E_{p(x_0, x_t)} [∥µ(x_t) − x_0∥_2²],

which is exactly the denoising score matching / denoising autoencoder objective (Vincent, 2011). Therefore, we can use the single-step denoising result as the mean of q(x_0|x_t). Nevertheless, one might ask how "tight" this approximation is. While it is intractable to compare with the ground truth p(y|x_t) or even p(x_0|x_t), it is not hard to compare the score functions of p(x_0|x_t) and our approximation q(x_0|x_t). In fact, denoting the denoiser as D, we have:

∇_{x_t} log p(x_0|x_t) = ∇_{x_t} log p(x_t|x_0) − ∇_{x_t} log p(x_t)    (24)
= (x_0 − x_t)/σ_t² − (D(x_t) − x_t)/σ_t² = (x_0 − D(x_t))/σ_t²,

and (overloading the "gradient" notation for derivatives)

∇_{x_t} log q(x_0|x_t) ∝ [∇_{x_t} D(x_t)] (x_0 − D(x_t)).

Therefore, the ground-truth score is proportional to (x_0 − D(x_t)), whereas our score is proportional to [∇_{x_t} D(x_t)] (x_0 − D(x_t)). The two differ by a left multiplication with the gradient ∇_{x_t} D(x_t). In the literature on plug-and-play methods, a reasonable assumption is that the denoiser can be represented as a pseudo-linear filter over the input (see Romano et al. (2016) for detailed explanations), so the gradient behaves roughly like a matrix. This suggests that our approximation is reasonably close, at least as far as the above score functions are concerned.
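The relationship between the two scores can be checked exactly in the standard-normal toy case, where the optimal denoiser and the posterior p(x_0|x_t) are available in closed form. The one-dimensional sketch below (values are arbitrary) verifies the identity ∇_{x_t} log p(x_0|x_t) = (x_0 − D(x_t))/σ_t² and shows that the approximate score differs from it only by the (here scalar) denoiser Jacobian:

```python
import numpy as np

sigma_t, x0, xt = 1.5, 0.3, 2.0
s2 = sigma_t**2

# For p0(x0) = N(0, 1), the MMSE denoiser is linear: D(x_t) = x_t / (1 + sigma_t^2),
# with constant Jacobian dD = 1 / (1 + sigma_t^2).
D = xt / (1.0 + s2)
dD = 1.0 / (1.0 + s2)

# Closed-form posterior p(x0 | x_t) = N(x_t/(s2+1), s2/(s2+1)); differentiate
# its log-density w.r.t. x_t to get the ground-truth conditional score.
post_mean, post_var = xt / (s2 + 1.0), s2 / (s2 + 1.0)
true_score = (1.0 / (s2 + 1.0)) * (x0 - post_mean) / post_var

# Identity from the text: the ground-truth score equals (x0 - D(x_t)) / sigma_t^2 ...
assert np.isclose(true_score, (x0 - D) / s2)

# ... while the approximate score dD * (x0 - D(x_t)) is the same quantity
# rescaled by the denoiser Jacobian (times sigma_t^2).
approx_score = dD * (x0 - D)
assert np.isclose(approx_score / true_score, dD * s2)
```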

B EXPERIMENTAL DETAILS

B.1 ADDITIONAL EXPERIMENTAL SETUPS FOR QUANTITATIVE RESULTS

4× super-resolution. Following CCDF (Chung et al., 2021) and SDEdit (Meng et al., 2021), we initialize our sampler at a noise level smaller than the maximum by adding Gaussian noise to the linearly upsampled image (of size 256 × 256); we choose the noise level at the 500-th discrete timestep (the model is trained with a total of 1000 discrete timesteps). Here, we use 100 iterations and η = 1.0 for ΠGDM. FID is evaluated over the restoration results on the entire ImageNet validation set, compared against the statistics of the ImageNet training set. The baselines are run as follows. For DDRM (Kawar et al., 2022a), we used the default hyperparameter settings. For SR3 (Saharia et al., 2021), we report the official results from the paper. For ADM-U (Dhariwal & Nichol, 2021), we used the publicly available 64 → 256 ImageNet checkpoint and ran 100 iterations per image with the default command.

Inpainting. For ΠGDM, we use a class-conditional model, initialize our sampler from pure Gaussian noise at the maximum noise level σ_T, apply 100 iterations to each image, and set η = 1.0. Following Saharia et al. (2022a), we evaluate FID over a 10k subset of the ImageNet validation set and compare against the statistics of the ImageNet validation set.
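The CCDF / SDEdit-style initialization used for super-resolution can be sketched as follows (the function name and the VE noise schedule are illustrative assumptions of ours, not the paper's code):

```python
import numpy as np

def init_from_intermediate(y_upsampled, sigmas, t_init=500, rng=None):
    # CCDF / SDEdit-style initialization: instead of starting from pure
    # noise at the maximum level, add Gaussian noise at step t_init to the
    # (linearly upsampled) measurement and start sampling from there.
    if rng is None:
        rng = np.random.default_rng()
    return y_upsampled + sigmas[t_init] * rng.normal(size=y_upsampled.shape)

# Illustrative VE schedule over 1000 discrete timesteps (not the paper's).
sigmas = np.linspace(0.01, 80.0, 1000)
x_init = init_from_intermediate(np.zeros((64, 64)), sigmas, t_init=500,
                                rng=np.random.default_rng(0))
```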

B.2 ADDITIONAL ABLATION STUDIES AND DETAILS

Uniform deblurring. We use the 9 × 9 uniform deblurring kernel from Kawar et al. (2022a). The problem itself is relatively simple, as the operator has few non-zero singular values, and simply taking the pseudoinverse of the observations already gives good results. For all methods, we use a class-unconditional model and initialize our sampler from the 100-th discrete timestep using the CCDF / SDEdit approach. We use a total of 20 iterations per image and η = 0.5 for the 1000 images in the evaluation set of DGP / DDRM (Pan et al., 2021; Kawar et al., 2022a). We compare PSNR metrics with images scaled to [0, 1], and Kernel Inception Distance (KID, Bińkowski et al., 2018) against the 1000 reference images, following the practice in Kawar et al. (2022a).

4× super-resolution. The experimental setup is identical to the quantitative evaluation case, except that we evaluate the metrics over the 10k subset from Saharia et al. (2022a). We use a total of 100 iterations per image and η = 1.0 for the 10000 images.

Ablation study over η and the number of iterations. We report additional results for the hyperparameters of the DDIM sampler, namely η (the amount of noise injected at each step) and the number of iterations (steps) per image, on super-resolution tasks. We consider the Pool and Bicubic 4× super-resolution tasks over the entire ImageNet validation set. From the results in Tabs. 6 and 7, we can draw conclusions similar to those of the DDIM paper (Song et al., 2021a): more iterations generally lead to improved performance, whereas the effect of η varies. When the number of iterations is small, smaller η is better (as it injects less noise into the process); when the number of iterations is large, larger η is better (the sampling process becomes more robust to errors in the score function).

B.3 ADDITIONAL FIGURES

We list additional qualitative results in Figs. 9 to 15. All results with ΠGDM and reconstruction guidance are generated with 100 steps and η = 1.0. We use w_r = 1 for reconstruction guidance.

In the loss curves of Fig. 17, which compare DPS (Chung et al., 2022a) and ΠGDM on a 4× super-resolution (Pool) example, 1000 is the highest noise level and 0 the lowest. As expected, both methods start with the same loss. However, the loss of DPS increases significantly around timesteps 1000 to 900, reaching 1000× the initial loss; this means the DPS learning rate schedule is too large at this initial stage. In contrast, ΠGDM has a smooth loss curve over the entire process; in fact, the loss curves are quite consistent under 20, 50, or 100 steps. Moreover, the final loss of ΠGDM is also smaller than that of DPS, further illustrating its superiority.



Footnotes:
- To save space, we use the subscript index σ_t to represent the parenthesized index σ(t) for functions of t.
- We use the numerator layout, so the gradient is the transpose of the derivative.
- https://bit.ly/eval-pix2pix
- https://github.com/openai/guided-diffusion/blob/main/scripts/super_res_train.py
- https://github.com/openai/guided-diffusion
- Nevertheless, VE is equivalent to the variance preserving (VP) parametrization up to a time-dependent scaling factor (Song et al., 2021a), so any sampling algorithm for VE can be adapted to VP.
- Again, the α_t term is missing because we use VE instead of VP.
- We note that a tractable score function does not imply a tractable likelihood, as the latter requires the partition function, which itself requires a marginalization step.



Figure1: High-level illustration of ΠGDM. (Top) Problem-agnostic diffusion models perform an iterative denoising operation to produce random samples. (Bottom) ΠGDM utilizes problem-agnostic diffusion models to solve inverse problems, a key component of which is pseudoinverse guidance (ΠG). ΠG converts the problem-agnostic score function into a problem-specific one, using information about the measurements y and measurement model, denoted as h here (h is JPEG compression + masking in this figure, best viewed zoomed in). The additional guidance term is a vector-Jacobian product (VJP) that encourages consistency between the denoising result and the measurements, after a pseudoinverse transformation h † . ΠGDM applies the denoising process from ΠG in an iterative fashion to generate valid solutions to the inverse problem.

Figure 2: ΠGDM applies a single problem-agnostic diffusion model for various inverse problems, avoiding the cost of training multiple problem-specific ones. Best viewed zoomed in.

Figure 3: Results on JPEG restoration. From left to right: the compressed JPEG image, restoration results from Palette (task-specific) and ΠGDM (task-agnostic), and the reference image.

Figure 4: Results on noisy inpainting problems. Reconstruction guidance (third column) does not handle measurement noise and will keep the noisy measurements, so only the masked regions are denoised.

Figure 5: Singular values of the bicubic downsampling and uniform blurring measurement matrix.

Figure 6: Left: σ_t as a function of t. Right: the ratio between our guidance weights and the Video Diffusion Models (VDM) weight w_r = 1 (Ho et al., 2022) under different η values. We take 100 uniformly spaced timesteps (out of a possible 1000 timesteps).

Figure 7: Two examples that compare our adaptive weight schedule with the different weights w r in Video Diffusion Models (VDM, (Ho et al., 2022)) on 4× super-resolution (Bicubic). For fair comparison, ΠG is used for all cases.

Figure 8: Two examples that compare our adaptive weight schedule with the different weights w r in Video Diffusion Models (VDM, (Ho et al., 2022)) on JPEG (QF=10) restoration.

Figure 9: Comparing methods for the Pool 4× super-resolution problem, including reconstruction guidance (Ho et al., 2022), ADM-U (Dhariwal & Nichol, 2021), and ΠGDM. Best viewed zoomed in.

Figure 11: Inpainting results using ΠGDM for the Freeform problem with multiple random samples.

Figure 13: Restoration results with ΠGDM over various composed measurements. The same random seed is used for different problems.

Figure 14: Inpainting results with ΠGDM for the Center problem on the LSUN Bedroom dataset(Yu et al., 2015).

Figure 15: Super-resolution results with ΠGDM for the Pool case on the LSUN Bedroom dataset(Yu et al., 2015).

Figure 16: Super-resolution results with DPS (Chung et al., 2022a) and ΠGDM for 4× super-resolution (Pool) under different numbers of diffusion steps.

Our solution to this issue is to use reasonable approximations to the true p t (x 0 |x t ), such that the resulting approximation to the score ∇ xt log p t (y|x t ) is easy to compute. Intuitively, instead of representing p(x 0 |x t ) with the entire diffusion model from time t to 0, we use a one-step denoising process. Specifically, we first approximate p t (x 0 |x t ) with the following Gaussian:

Comparison of different guidance methods.

4× super-resolution results. Dark-colored rows indicate methods using problem-specific models.

Inpainting results. Dark-colored rows indicate methods using problem-specific models.

JPEG restoration results. Dark-colored rows indicate methods using problem-specific models.

Ablation studies on pseudoinverse guidance (ΠG) and our adaptive weight schedule (AW).

In the ΠGDM update, the first three terms are simply DDIM.

4× super-resolution results (Pool) from ΠGDM using the class-unconditional model.

4× super-resolution results (Bicubic) from ΠGDM using the class-conditional model.

JPEG restoration. For each quality factor, we use the quantization matrix in Ehrlich et al. (2020) to compress the original 256 × 256 image, with 2 × 2 chroma subsampling. The quantization matrix is embedded in every JPEG file, so having it available to the algorithm is a natural and realistic setting. For ΠGDM, we use a class-unconditional model, initialize our sampler from pure Gaussian noise at the maximum timestep, apply 100 iterations to each image, and set η = 1.0. FID is evaluated as in the inpainting case, following Saharia et al. (2022a).

Performance of DPS under different learning rates and 1000 diffusion steps.


Figure 10: Comparing methods for the Bicubic 4× super-resolution problem, including reconstruction guidance (Ho et al., 2022), ADM-U (Dhariwal & Nichol, 2021), and ΠGDM. Best viewed zoomed in.

Table 8: NFEs of various algorithms. We list common baselines, such as Palette (Saharia et al., 2022a), ADM-U (Dhariwal & Nichol, 2021), SR3 (Saharia et al., 2021), DGP (Pan et al., 2021), SNIPS (Kawar et al., 2021), RED (Romano et al., 2016), DDRM (Kawar et al., 2022a), and DPS (Chung et al., 2022a).

Since each algorithm requires iterations with the diffusion model, its actual runtime depends heavily on the number of Neural Function Evaluations (NFEs). Kawar et al. (2022a) found that the diffusion model dominates the total runtime, as other computations (such as matrix operations on images) are negligible. Therefore, we use NFEs as the unit for estimating the runtime of different algorithms. In Tab. 8, we report the NFEs used by each algorithm. We further note that ΠGDM and DPS take additional backpropagation steps through the neural network, so each of their NFEs is roughly 3× as expensive. Thus, ΠGDM is only beaten in terms of actual wall-clock time by regression and DDRM, and we have shown that its restoration results are superior to both in Tabs. 2 to 4.

B.5 COMPARISON WITH VARIOUS BASELINES

In Tab. 9, we compare ΠGDM against various baselines, including ones based on Plug-and-Play (PnP) priors. We find that ΠGDM achieves the best performance when compared with the other PnP baselines, including DDRM (Kawar et al., 2022a). Comparing with DPS (Chung et al., 2022a), we find that while DPS performs strongly with 1000 diffusion steps, its performance degrades considerably with fewer diffusion steps. Moreover, even with the full 1000 steps, one still needs to tune the learning rate hyperparameter to get good performance. In contrast, our method is 10× faster than DPS in the slowest case, and achieves decent performance even with fewer diffusion steps (such as 20 or 50); in these settings, DPS does not produce reasonable results at all.

Image restoration quality.

In our experiments, we evaluate the performance of the models averaged over 5 validation images on ImageNet. For DPS, we consider different numbers of diffusion steps (from 20 to 1000), with the default learning rate being the one chosen in the DPS paper for 1000 steps. For ΠGDM, we use the same settings as discussed in the paper; we report results for up to 100 diffusion steps, except for deblurring (where the task is simple enough to get good results in 20 steps).

We report LPIPS and SSIM metrics for Pool, Bicubic, and 9 × 9 uniform deblurring in Tabs. 10 to 12. From the tables, it is clear that DPS performance drops rapidly once the number of diffusion steps falls below 1000, whereas ΠGDM remains competitive. We illustrate this trend visually in Fig. 16 for 4× super-resolution (Pool). In Tab. 13, we report the results for DPS under different learning rate hyperparameters; we find that the optimal hyperparameter can be problem-dependent: the optimal value for super-resolution is between 1 and 2, whereas using that value for deblurring produces NaNs; the optimal value for deblurring is 0.2, with which super-resolution results become less optimal.

Loss curves. We can treat the guidance terms in both DPS and ΠGDM as a gradient step that optimizes the least-squares loss ∥y − H x_0∥_2², so it is natural to visualize the loss at each diffusion noise level. In Fig. 17, we visualize the loss curves of DPS and ΠGDM on a 4× super-resolution (Pool) example as a function of the diffusion timestep.

