REMOVING STRUCTURED NOISE WITH DIFFUSION MODELS

Abstract

Solving ill-posed inverse problems requires careful formulation of prior beliefs over the signals of interest and an accurate description of their manifestation into noisy measurements. Handcrafted signal priors based on e.g. sparsity are increasingly replaced by data-driven deep generative models, and several groups have recently shown that state-of-the-art score-based diffusion models yield particularly strong performance and flexibility. In this paper, we show that the powerful paradigm of posterior sampling with diffusion models can be extended to include rich, structured noise models. To that end, we propose a joint conditional reverse diffusion process with learned scores for the noise and signal-generating distribution. We demonstrate strong performance gains across various inverse problems with structured noise, outperforming competitive baselines that use normalizing flows and adversarial networks. This opens up new opportunities and relevant practical applications of diffusion modeling for inverse problems in the context of non-Gaussian measurement noise.

1. INTRODUCTION

Many signal and image processing problems, such as denoising, compressed sensing, or phase retrieval, can be formulated as inverse problems that aim to recover unknown signals from (noisy) observations. These ill-posed problems are, by definition, subject to many solutions under the given measurement model. Therefore, prior knowledge is required for a meaningful and physically plausible recovery of the original signal. Bayesian inference and maximum a posteriori (MAP) solutions incorporate both signal priors and observation likelihood models. Choosing an appropriate statistical prior is not trivial and often depends on both the application and the recovery task. Before deep learning, sparsity in some transform domain was the go-to prior in compressed sensing (CS) methods (Eldar & Kutyniok, 2012), such as iterative thresholding (Beck & Teboulle, 2009) or wavelet decomposition (Mallat, 1999). At present, deep generative modeling has established itself as a strong mechanism for learning such priors for inverse problem-solving. Both generative adversarial networks (GANs) (Bora et al., 2017) and normalizing flows (NFs) (Asim et al., 2020; Wei et al., 2022) have been applied as natural signal priors for inverse problems in image recovery. These data-driven methods are more powerful than classical methods, as they can accurately learn the natural signal manifold and do not rely on assumptions such as signal sparsity or hand-crafted basis functions. Recently, diffusion models have shown impressive results for both conditional and unconditional image generation and can be easily fitted to a target data distribution using score matching (Vincent, 2011; Song et al., 2020). These deep generative models learn the score of the data manifold and produce samples by reverting a diffusion process, guiding noise samples towards the target distribution.
Diffusion models have achieved state-of-the-art performance in many downstream tasks and applications, ranging from text-to-image models such as DALL-E 2 (Ramesh et al., 2022) to medical imaging (Song et al., 2021b; Jalal et al., 2021a; Chung & Ye, 2022). Furthermore, understanding of diffusion models is rapidly improving, and progress in the field is extremely fast-paced (Chung et al., 2022a; Bansal et al., 2022; Daras et al., 2022a; Karras et al., 2022; Luo, 2022). The iterative nature of the sampling procedure used by diffusion models renders inference slow compared to GANs and VAEs. However, many recent efforts have shown ways to significantly improve the sampling speed by accelerating the diffusion process. Inspired by momentum methods in sampling, Daras et al. (2022b) introduce a momentum sampler for diffusion models, which leads to increased sample quality with fewer function evaluations. Chung et al. (2022b) offer a new sampling strategy, namely Come-Closer-Diffuse-Faster (CCDF), which leverages the conditional nature of inverse problems: the reverse diffusion can be initialized from the observation instead of a sample from the base distribution, which leads to faster convergence for conditional sampling. Salimans & Ho (2021) propose a progressive distillation method that augments the training of diffusion models with a student-teacher setup, drastically reducing the number of sampling steps. Lastly, many methods aim to accelerate the diffusion process by executing it in a reduced space: while Jing et al. (2022) restrict diffusion through projections onto subspaces, Vahdat et al. (2021) and Rombach et al. (2022) run the diffusion in the latent space. Despite this promise, current score-based diffusion methods for inverse problems are limited to measurement models with unstructured noise. In many image processing tasks, however, corruptions are highly structured and spatially correlated. Relevant examples include interference, speckle, or haze.
Nevertheless, current conditional diffusion models naively assume that the noise follows some basic tractable distribution (e.g. Gaussian or Poisson). Beyond the realm of diffusion models, Whang et al. (2021) extended normalizing flow (NF)-based inference to structured noise applications. However, compared to diffusion models, NFs require specialized network architectures, which are computationally expensive and memory-intensive. Given the promising outlook of diffusion models, we propose to learn score models for both the noise and the desired signal and to perform joint inference of both quantities, coupled via the observation model. The resulting sampling scheme enables solving a wide variety of inverse problems with structured noise. The main contributions of this work are as follows:
• We propose a novel joint conditional posterior sampling method to efficiently remove structured noise using diffusion models. Our formulation is compatible with many existing iterative sampling methods for score-based generative models.
• We show strong performance gains across various challenging inverse problems involving structured noise compared to competitive state-of-the-art methods based on NFs and GANs.
• We demonstrate improved robustness on out-of-distribution signals compared to baselines.

2. PROBLEM STATEMENT

Many image reconstruction tasks can be formulated as an inverse problem of the basic form y = Ax + n, where y ∈ R^m is the noisy observation, x ∈ R^d the desired signal or image, and n ∈ R^m the additive noise. The linear forward operator A ∈ R^{m×d} captures the deterministic transformation of x. Maximum a posteriori (MAP) inference is typically used to find an optimal solution x̂_MAP that maximizes the posterior density p_{X|Y}(x|y):

x̂_MAP = arg max_x log p_{X|Y}(x|y) = arg max_x log p_{Y|X}(y|x) + log p_X(x),

where p_{Y|X}(y|x) is the likelihood according to the measurement model and log p_X(x) the signal prior. Assumptions on the stochastic corruption process n are of key importance too, in particular for applications in which this process is highly structured. However, most methods assume i.i.d. Gaussian distributed noise, such that the forward model becomes p_{Y|X}(y|x) ∼ N(Ax, σ²_N I). This naturally leads to the following simplified problem:

x̂_MAP = arg min_x (1 / 2σ²_N) ||y − Ax||²₂ − log p_X(x).

However, as mentioned, this naive assumption can be very restrictive, as many noise processes are much more structured and complex. A myriad of problems can be addressed under the formulation of equation 1, given the freedom of choice for the noise source n. Therefore, in this work, our aim is to solve a broader class of inverse problems defined by an arbitrary noise distribution n ∼ p_N(n) ≠ N and signal prior x ∼ p_X(x), resulting in the following, more general, MAP estimator proposed by Whang et al. (2021):

x̂_MAP = arg max_x log p_N(y − Ax) + log p_X(x).

In this paper, we propose to solve this class of problems using flexible diffusion models. Furthermore, diffusion models naturally enable posterior sampling, allowing us to take advantage of the benefits thereof (Jalal et al., 2021a).

2.1.1. NORMALIZING FLOWS

Normalizing flows (NFs) are generative models that use an invertible generator G(z): R^d → R^d to transform samples from a base distribution z ∼ p_Z(z) into a more complex multimodal distribution x = G(z) ∼ p_X(x).
The invertible nature of the mapping G allows for exact density evaluation through the change of variables formula:

log p_X(x) = log p_Z(z) + log |det J_{G⁻¹}(x)|,

where J_{G⁻¹} is the Jacobian that accounts for the change in volume between densities. Since exact likelihood computation is possible through the flow direction G⁻¹, the parameters of the generator network can be optimized to maximize the likelihood of the training data. Subsequently, the inverse task is solved using the MAP estimation in equation 5:

x̂ = arg max_x {log p_{G_N}(y − Ax) + log p_{G_X}(x)},

where G_N and G_X are generative flow models for the noise and data, respectively. Analogously, the problem can be solved in the latent space rather than the image space:

ẑ = arg max_z {log p_{G_N}(y − A G_X(z)) + λ log p_{G_X}(G_X(z))}.

Note that in equation 8 a smoothing parameter λ is added to weigh the prior and likelihood terms, as was also done in Whang et al. (2021).
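As a sanity check of the change-of-variables formula, the sketch below (our illustration, not code from the paper) uses an invertible affine map G(z) = Lz + b as a stand-in for a trained flow. For an affine map of a Gaussian base distribution, the pushforward density has a closed form, so both sides of the formula can be compared directly:

```python
import numpy as np

def gauss_logpdf(v, mean, cov):
    """Log-density of N(mean, cov) evaluated at v."""
    d = len(v)
    diff = v - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(1)
d = 3
# Invertible affine "flow" G(z) = L z + b, standing in for a trained NF.
L = np.tril(rng.standard_normal((d, d))) + 5.0 * np.eye(d)
b = rng.standard_normal(d)

x = rng.standard_normal(d)
z = np.linalg.solve(L, x - b)                      # z = G^{-1}(x)

# log p_X(x) = log p_Z(z) + log |det J_{G^{-1}}(x)|, with J_{G^{-1}} = L^{-1}.
log_px_flow = gauss_logpdf(z, np.zeros(d), np.eye(d)) - np.log(abs(np.linalg.det(L)))

# Ground truth: x = L z + b with z ~ N(0, I) implies x ~ N(b, L L^T).
log_px_exact = gauss_logpdf(x, b, L @ L.T)
print(np.isclose(log_px_flow, log_px_exact))       # the two agree
```

The agreement is exact (up to floating point) for any invertible L, which is precisely what the formula asserts.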

2.1.2. GENERATIVE ADVERSARIAL NETWORKS

Generative adversarial networks (GANs) are implicit generative models that can learn the data manifold in an adversarial manner (Goodfellow et al., 2020). The generative model is trained with an auxiliary discriminator network that evaluates the generator's performance in a minimax game. The generator G(z): R^l → R^d maps latent vectors z ∈ R^l ∼ N(0, I) to the data distribution of interest. The structure of the generative model can also be used in inverse problem solving (Bora et al., 2017). The objective can be derived from equation 3 and is given by:

ẑ = arg min_z ||y − A G_X(z)||²₂ + λ||z||²₂,

where λ weights the importance of the prior against the measurement error. The ℓ2 regularization term on the latent variable is proportional to the negative log-likelihood under the prior defined by G_X, where the subscript denotes the density that the generator is approximating. While this method does not explicitly model the noise, it remains an interesting comparison, as the generator cannot reproduce the noise found in the measurement and can only recover signals that are in the range of the generator. Therefore, due to the limited support of the learned distribution, GANs can inherently remove structured noise. However, the representation error (i.e. the observation lies far from the range of the generator (Bora et al., 2017)) imposed by the structured noise comes at the cost of recovery quality.
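The latent-space objective above can be illustrated with a toy linear "generator"; the sketch below (our illustration; W, A, and all values are made up) runs plain gradient descent on ||y − A G_X(z)||²₂ + λ||z||²₂:

```python
import numpy as np

rng = np.random.default_rng(2)
l_dim, d, m = 4, 16, 8                          # latent, signal, measurement dims
W = rng.standard_normal((d, l_dim)) / np.sqrt(l_dim)   # toy linear "generator" G_X(z) = W z
A = rng.standard_normal((m, d)) / np.sqrt(d)           # forward operator
z_true = rng.standard_normal(l_dim)
y = A @ (W @ z_true) + 0.01 * rng.standard_normal(m)   # observation

lam = 0.1                                       # prior weight lambda

def loss(z):
    r = y - A @ (W @ z)
    return r @ r + lam * (z @ z)

z = np.zeros(l_dim)
step = 0.01                                     # small enough for monotone descent here
losses = [loss(z)]
for _ in range(2000):
    r = y - A @ (W @ z)
    grad = -2.0 * (W.T @ (A.T @ r)) + 2.0 * lam * z   # gradient of the objective
    z -= step * grad
    losses.append(loss(z))

print(losses[-1] < losses[0])                   # the objective has decreased
```

With a real GAN, G_X is a deep network and the gradient comes from automatic differentiation, but the structure of the optimization is the same.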

2.2. BACKGROUND ON SCORE-BASED DIFFUSION MODELS

One class of deep generative models is known as diffusion models. These generative models have been introduced independently as score-based models (Song & Ermon, 2019; 2020) and denoising diffusion probabilistic modeling (DDPM) (Ho et al., 2020). In this work, we consider the formulation introduced in Song et al. (2020), which unifies both perspectives on diffusion models by expressing diffusion as a continuous process through stochastic differential equations (SDEs). Diffusion models produce samples by reversing a corruption process. In essence, these models are networks trained to denoise their input. Through iteration of this process, samples can be drawn from a learned data distribution, starting from random noise. The diffusion process of the data {x_t ∈ R^d}_{t∈[0,1]} is characterized by a continuous sequence of Gaussian perturbations of increasing magnitude, indexed by time t ∈ [0, 1]. Starting from the data distribution at t = 0, clean images are defined by x_0 ∼ p(x_0) ≡ p(x). Forward diffusion can be described using an SDE as follows:

dx_t = f(t) x_t dt + g(t) dw,

where w ∈ R^d is a standard Wiener process, and f(t): [0, 1] → R and g(t): [0, 1] → R are the drift and diffusion coefficients, respectively. These coefficients are chosen so that the resulting distribution p_1(x) at the end of the perturbation process approximates a predefined base distribution, p_1(x) ≈ π(x). Furthermore, the transition kernel of the diffusion process is given by q(x_t|x) ∼ N(x_t | α(t)x, β²(t)I), where α(t) and β(t) can be analytically derived from the SDE. Naturally, we are interested in reversing the diffusion process, so that we can sample from x_0 ∼ p_0(x_0). The reverse diffusion process is also a diffusion process, given by the reverse-time SDE (Anderson, 1982; Song et al., 2020):

dx_t = [f(t) x_t − g(t)² ∇_{x_t} log p(x_t)] dt + g(t) dw̄_t,

where w̄_t is a standard Wiener process running in the reverse direction, and ∇_{x_t} log p(x_t) is the score.
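For the variance-preserving SDE used later in this paper, α(t) and the kernel standard deviation have closed forms. The sketch below (our illustration; the kernel std is named `kernel_std` to avoid clashing with the SDE coefficient β(t)) draws samples from the transition kernel and checks that the marginal variance is indeed preserved for unit-variance data:

```python
import numpy as np

# VP-SDE coefficients as in Section 3.2: beta(t) = beta0 + t * (beta1 - beta0).
beta0, beta1 = 0.1, 7.0

def alpha(t):
    # alpha(t) = exp(-0.5 * \int_0^t beta(s) ds)
    return np.exp(-0.5 * (beta0 * t + 0.5 * (beta1 - beta0) * t ** 2))

def kernel_std(t):
    # std of the transition kernel q(x_t | x) = N(alpha(t) x, (1 - alpha(t)^2) I)
    return np.sqrt(1.0 - alpha(t) ** 2)

rng = np.random.default_rng(3)
x0 = rng.standard_normal(100_000)          # toy unit-variance "dataset"
t = 0.5
xt = alpha(t) * x0 + kernel_std(t) * rng.standard_normal(x0.shape)

# Variance preserving: for unit-variance data, Var[x_t] = alpha^2 + (1 - alpha^2) = 1.
print(abs(xt.std() - 1.0) < 0.02)
```

The same two functions are what a `marginal_prob` routine of an SDE object would return (mean coefficient and standard deviation) at any time t.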
The gradient of the log-likelihood of the data with respect to itself, a.k.a. the score function, arises from the reverse-time SDE. The score function is a gradient field pointing back to the data manifold and can intuitively be used to guide a random sample from the base distribution π(x) to the desired data distribution. Given a dataset X = {x^(1), x^(2), ..., x^(|X|)} ∼ p(x), scores can be estimated by training a neural network s_θ(x_t, t), parameterized by weights θ, with score-matching techniques such as the denoising score matching (DSM) objective (Vincent, 2011):

θ* = arg min_θ E_{t∼U[0,1]} E_{(x, x_t)∼p(x)q(x_t|x)} ||s_θ(x_t, t) − ∇_{x_t} log q(x_t|x)||²₂.    (12)

After training the time-dependent score model s_θ, it can be used to calculate the reverse-time diffusion process and solve the trajectory using numerical samplers such as the Euler-Maruyama algorithm. Alternatively, more sophisticated samplers, such as ALD (Song & Ermon, 2019), the probability flow ODE (Song et al., 2020), and the Predictor-Corrector sampler (Song et al., 2020), can be used to further improve sample quality. These iterative sampling algorithms discretize the continuous-time SDE into a sequence of time steps {0 = t_0, t_1, ..., t_T = 1}, where a noisy sample x_{t_i} is denoised to produce a sample for the next time step x_{t_{i−1}}. The resulting samples {x_{t_i}}^T_{i=0} constitute an approximation of the actual diffusion process {x_t}_{t∈[0,1]}.
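The DSM objective can be made concrete with a one-dimensional toy problem. The sketch below (our illustration) fits the simplest possible linear "score network" in closed form and recovers the true marginal score of Gaussian data:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
alpha_t, sigma_t = 0.8, 0.6           # alpha^2 + sigma^2 = 1 (VP-style perturbation)

x0 = rng.standard_normal(N)           # data ~ N(0, 1)
eps = rng.standard_normal(N)
xt = alpha_t * x0 + sigma_t * eps     # sample from q(x_t | x0)

# DSM regression target: grad_{x_t} log q(x_t | x0) = -(x_t - alpha * x0) / sigma^2.
target = -(xt - alpha_t * x0) / sigma_t ** 2

# Fit the simplest conceivable "score network" s_w(x_t) = w * x_t in closed form
# (least squares, i.e. the exact minimizer of the empirical DSM objective).
w = (xt * target).mean() / (xt ** 2).mean()

# For x0 ~ N(0,1) the marginal of x_t is N(0,1) as well, with true score -x_t,
# so DSM should recover w close to -1.
print(abs(w + 1.0) < 0.05)
```

This is the key property of DSM: regressing on the tractable conditional score ∇ log q(x_t|x) recovers the intractable marginal score ∇ log p(x_t) in expectation.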

3.1. CONDITIONAL POSTERIOR SAMPLING UNDER STRUCTURED NOISE

We are interested in posterior sampling under structured noise. We recast this as a joint inference problem with respect to the signal x and noise n, given by:

(x, n) ∼ p_{X,N|Y}(x, n|y) ∝ p_{Y|X,N}(y|x, n) · p_X(x) · p_N(n).

Solving inverse problems using diffusion models requires conditioning of the diffusion process on the observation y, such that we can sample from the posterior p_{X,N|Y}(x, n|y). Therefore, we construct a joint conditional diffusion process {x_t, n_t | y}_{t∈[0,1]}, in turn producing a joint conditional reverse-time SDE:

d(x_t, n_t) = [f(t)(x_t, n_t) − g(t)² ∇_{x_t,n_t} log p(x_t, n_t | y)] dt + g(t) dw̄_t,    (14)

which allows us to perform posterior sampling for both the signal, such that x ≡ x_0 ∼ p_{X|Y}(x_0|y), and the structured noise, such that n ≡ n_0 ∼ p_{N|Y}(n_0|y). We would like to factorize the posterior into our learned unconditional score models and a tractable measurement model. Consequently, we construct two separate diffusion processes, defined by separate score models but entangled through the measurement model p_{Y|X,N}(y|x, n). In addition to the original score model s_θ(x_t, t), we introduce a second score model s_ϕ(n_t, t) ≃ ∇_{n_t} log p_N(n_t), parameterized by weights ϕ, to model the expressive noise component n. These two score networks can be trained independently on datasets for x and n, respectively, using the objective in equation 12. To solve the approximated joint conditional reverse-time SDE, we resort to the aforementioned iterative scheme in Section 2.2, now incorporating the observation via a data-consistency step. This is done by taking gradient steps that minimize the ℓ2 norm between the true observation and its model prediction given the current estimates of x and n. Ultimately, this results in solutions that are consistent with the observation y and have high likelihood under both prior models. The data-consistency update steps for both x and n are derived as follows:

x̂_{t−∆t} = x̂_t − λ ∇_{x̂_t} ||ŷ_t − (Ax̂_t + n̂_t)||²₂ = x̂_t − λ Aᵀ(Ax̂_t − ŷ_t + n̂_t),    (18)
n̂_{t−∆t} = n̂_t − µ ∇_{n̂_t} ||ŷ_t − (Ax̂_t + n̂_t)||²₂ = n̂_t − µ (Ax̂_t − ŷ_t + n̂_t),    (19)

where the time difference between two steps is ∆t = 1/T, and λ and µ are weighting coefficients (absorbing constant factors) for the signal and noise gradient steps, respectively.
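A minimal numerical illustration of the data-consistency steps in equations 18 and 19 (our sketch; all dimensions and step sizes are arbitrary), showing that one joint gradient step reduces the measurement residual:

```python
import numpy as np

rng = np.random.default_rng(5)
m, d = 12, 16
A = rng.standard_normal((m, d)) / np.sqrt(d)   # forward operator
y_hat = rng.standard_normal(m)        # perturbed observation y_t at the current step
x = rng.standard_normal(d)            # current signal iterate x_t
n = rng.standard_normal(m)            # current noise iterate n_t
lam, mu = 0.1, 0.1                    # step sizes lambda and mu (illustrative values)

def residual(x, n):
    return np.linalg.norm(A @ x + n - y_hat)

r0 = residual(x, n)
# Equations 18 and 19: gradient steps on ||y_hat - (A x + n)||_2^2 w.r.t. x and n.
r = A @ x - y_hat + n
x = x - lam * (A.T @ r)
n = n - mu * r
print(residual(x, n) < r0)            # one joint step reduces the residual
```

In the full sampler these two updates are interleaved with the reverse-diffusion steps, so that the iterates stay both measurement-consistent and on their respective learned manifolds.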
An example of the complete sampling algorithm is shown in Algorithm 1, which adapts the Euler-Maruyama sampler (Song et al., 2020) to jointly find the optimal data sample and the optimal noise sample, while taking into account the measurement model in lines 7 and 8 using the outcome of equation 18 and equation 19, respectively. Although we show the Euler-Maruyama method, our addition is applicable to a large family of iterative sampling methods for score-based generative models.

Algorithm 1: Joint conditional posterior sampling with the Euler-Maruyama method
Require: T, s_θ, s_ϕ, λ, µ, y
1: x̂_1 ∼ π(x), n̂_1 ∼ π(n), ∆t ← 1/T
2: for i = T − 1 to 0 do
3:   t ← (i + 1)/T
4:   ŷ_t ∼ p_0t(ŷ_t | y)                            ▷ perturb observation to noise level t
5:   x̂′ ← reverse-diffusion step on x̂_t using s_θ(x̂_t, t)
6:   n̂′ ← reverse-diffusion step on n̂_t using s_ϕ(n̂_t, t)
7:   x̂_{t−∆t} ← x̂′ − λAᵀ(Ax̂′ − ŷ_t + n̂′)           ▷ data consistency, equation 18
8:   n̂_{t−∆t} ← n̂′ − µ(Ax̂′ − ŷ_t + n̂′)             ▷ data consistency, equation 19
9: end for
10: return x̂_0, n̂_0

3.2. TRAINING AND INFERENCE SETUP

For training the score models, we use the NCSNv2 architecture as introduced in Song & Ermon (2020), in combination with the Adam optimizer and a learning rate of 5e-4, trained until convergence. For simplicity, no exponential moving average (EMA) filter on the network weights is applied. Given two separate datasets, one for the data and one for the structured noise, two separate score models can be trained independently. This allows for easy adaptation of our method, since many existing trained score models can be reused. Only at inference time are the two priors combined, through the proposed sampling procedure described in Algorithm 1, using the adapted Euler-Maruyama sampler. We use the variance preserving (VP) SDE (β_0 = 0.1, β_1 = 7.0) (Song et al., 2020) to define the diffusion trajectory. During each experiment, we run the sampler for T = 600 iterations.

4. EXPERIMENTS

All models are trained on the CelebA dataset (Liu et al., 2015) and the MNIST dataset, with 10000 and 27000 training samples, respectively. We downsize the images to 64 × 64 pixels. Due to computational constraints, we test on a randomly selected subset of 100 images. We use both the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) to evaluate our results. Automatic hyperparameter tuning for optimal inference was performed for all baseline methods on a small validation set of only 5 images. For both the GAN and flow-based methods, we anneal the step size during inference based on stagnation of the objective.
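For reference, PSNR can be computed in a few lines of NumPy (our sketch; libraries such as scikit-image provide equivalent `peak_signal_noise_ratio` and SSIM implementations):

```python
import numpy as np

def psnr(ref, est, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, data_range]."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(6)
ref = rng.random((64, 64))                                  # stand-in 64x64 image
noisy = np.clip(ref + 0.1 * rng.standard_normal(ref.shape), 0.0, 1.0)
print(round(psnr(ref, noisy), 1))   # roughly 20 dB for noise of std 0.1
```

SSIM additionally compares local luminance, contrast, and structure, which is why both metrics are reported.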

4.2.1. REMOVING MNIST DIGITS

For comparison with Whang et al. (2021), we recreate an experiment introduced in their work, where MNIST digits are added to CelebA faces. Moreover, the experiment is easily reproducible, as both the CelebA and MNIST datasets are publicly available. The corruption process is defined by y = 0.5 · x_CelebA + 0.5 · n_MNIST. In this experiment, the score network s_ϕ is trained on the MNIST dataset. Fig. 1a shows a quantitative comparison of our method with all baselines. Furthermore, a random selection of test samples is shown in Fig. 2 for qualitative analysis. Both our method and the flow-based method are able to recover the data and remove most of the structured noise. However, more details are preserved using the diffusion method. In contrast, the flow-based method cannot completely remove the digits in some cases and is unable to reconstruct some subtle features present in the original images. Furthermore, we observe that for the flow-based method, initialization from the measurement is necessary to reproduce the results in Whang et al. (2021), since random initialization does not converge. The GAN method is also able to remove the digits, but cannot accurately reconstruct the faces, as it is unable to project the observation onto the range of the generator. Similarly, the BM3D denoiser fails to recover the underlying signal, confirming the importance of prior knowledge of the noise in this experiment. The metrics in Fig. 1a support these observations. See Table 1 for the extended results. Additionally, we expose the methods to out-of-distribution (OoD) data in a similar experiment. The images in this dataset do not appear in the CelebA dataset used for training the models; in fact, the out-of-distribution data is generated using the stable-diffusion text-to-image model (Rombach et al., 2022). We use the exact same hyperparameters as in the experiment on the CelebA dataset. Quantitative and qualitative results are shown in Fig. 1b and the accompanying figure, respectively. Similarly to the findings of Whang et al. (2021) and Asim et al. (2020), the flow-based method is robust to OoD data, due to its inherent invertibility. We empirically show that the diffusion method is also resistant to OoD data in inverse tasks with complex noise structures, and even outperforms the flow-based method. Unsurprisingly, the GAN method performs even more poorly when subjected to OoD data. More experiments, covering different inverse problem settings, can be found in Appendix A.
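The corruption model of this experiment is straightforward to reproduce; the sketch below (our illustration, with random arrays standing in for CelebA faces and MNIST digits) forms the equal blend y = 0.5 · x + 0.5 · n:

```python
import numpy as np

rng = np.random.default_rng(7)
# Stand-ins for a CelebA face and an MNIST digit, both 64x64 in [0, 1].
x_celeba = rng.random((64, 64))
n_mnist = (rng.random((64, 64)) > 0.9).astype(float)   # sparse bright strokes

# Corruption model of the experiment: y = 0.5 * x_CelebA + 0.5 * n_MNIST.
y = 0.5 * x_celeba + 0.5 * n_mnist
print(y.shape, float(y.min()) >= 0.0 and float(y.max()) <= 1.0)
```

Note that this corruption is a special case of y = Ax + n with A = 0.5 · I and n = 0.5 · n_MNIST, so it fits directly into the general measurement model.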

A ADDITIONAL EXPERIMENTS

The following section explores additional inverse problems with compressed sensing and structured noise. The goal is to show the performance of the proposed method in a variety of settings.

A.1 STRUCTURED NOISE WITH COMPRESSED SENSING

The corruption process is defined by y = Ax + n_sine, with a random Gaussian measurement matrix A ∈ R^{m×d} and noise whose standard deviation σ_k ∝ exp(sin(2πk/16)) varies sinusoidally with the element index k. The subsampling factor is defined by the size of the measurement matrix, d/m. In this experiment, the score network s_ϕ is trained on a dataset generated with sinusoidal noise samples n_sine. In Fig. 5, the results of the compressed sensing experiment and the comparison with the baselines are shown for an average standard deviation of σ_N = 0.2 and a subsampling factor of d/m = 2. Given the same hyperparameter settings, we repeat the experiment on the out-of-distribution (OoD) dataset, shown in Fig. 6. Similar to the results found in Section 4.2.1, the diffusion method is more robust to the shift in distribution and is able to deliver high-quality recovery under the structured noise setting. In contrast, the flow-based method under-performs when subjected to the OoD data. Quantitative results on both CelebA and OoD data are found in Fig. 4 as well as in Table 1 and Table 2, respectively.
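A sketch of this corruption process (our illustration at a reduced scale; the normalization of A and the rescaling of the sinusoidal profile to an average of σ_N = 0.2 are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
d, m = 256, 128                        # subsampling factor d/m = 2 (small-scale illustration)
A = rng.standard_normal((m, d)) / np.sqrt(m)   # random Gaussian measurement matrix

# Per-element std sigma_k ∝ exp(sin(2*pi*k/16)), rescaled to an average of sigma_N = 0.2.
k = np.arange(m)
sigma = np.exp(np.sin(2.0 * np.pi * k / 16.0))
sigma *= 0.2 / sigma.mean()

x = rng.random(d)                      # stand-in (flattened) image
n_sine = sigma * rng.standard_normal(m)
y = A @ x + n_sine                     # compressed, structured-noise measurement
print(round(float(sigma.mean()), 3))   # 0.2 by construction
```

The noise is Gaussian per element but far from i.i.d.: its variance oscillates with period 16, which is exactly the kind of structure the learned noise score s_ϕ captures.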

A.2 REMOVING SINUSOIDAL NOISE

The corruption process is defined by y = x + n_sine, where the noise standard deviation σ_k ∝ exp(sin(2πk/16)) follows a sinusoidal pattern along the rows k of the image. In this experiment, the score network s_ϕ is trained on a dataset generated with 1D sinusoidal noise samples n_sine. See Fig. 8 for a comparison of our method to the flow-based method for varying noise variances. Both methods perform quite well, with the diffusion method having a slight edge. A visual comparison in Fig. 7, however, reveals that the diffusion method preserves more detail in general.




Figure 1: Quantitative results using PSNR (green) and SSIM (blue) for the removing MNIST digits experiment on 64 × 64 images of the (a) CelebA and (b) out-of-distribution datasets.




Figure 2: Results for our diffusion-based method compared to the baselines; FLOW (Whang et al., 2021), GAN (Bora et al., 2017), and BM3D (Dabov et al., 2006) on the removing MNIST digits experiment on 64 × 64 images of the CelebA dataset.

Figure 5: Comparison of results from our diffusion method compared to the baselines on the compressed sensing with sinusoidal noise experiment with d/m = 2, σ N = 0.2 on 64 × 64 images of the CelebA dataset.





Table 1: Results for the experiments and different methods on the CelebA dataset. ⋆ Ours, † Whang et al. (2021), ‡ Bora et al. (2017), § Dabov et al. (2006), ¶ Tibshirani (1996).

Table 2: Results (PSNR, SSIM per experiment) for the experiments and different methods on the out-of-distribution (OoD) dataset. ⋆ Ours, † Whang et al. (2021), ‡ Bora et al. (2017), § Dabov et al. (2006), ¶ Tibshirani (1996).

              PSNR            SSIM            PSNR            SSIM
⋆ DIFFUSION   22.87 ± 4.581   0.842 ± 0.110   22.90 ± 1.568   0.823 ± 0.082
† FLOW        19.98 ± 1.946   0.824 ± 0.081   19.85 ± 4.840   0.608 ± 0.176
‡ GAN         13.06 ± 1.788   0.218 ± 0.088   12.39 ± 1.693   0.159 ± 0.070

Table 3: Inference performance benchmark for all methods: number of trainable parameters and inference time per image [ms]. ⋆ Ours, † Whang et al. (2021), ‡ Bora et al. (2017), § Dabov et al. (2006), ¶ Tibshirani (1996).

4.2.2. PERFORMANCE

To highlight the difference in inference time between our method and the baselines, benchmarks are performed on a single NVIDIA GeForce RTX 3080 Ti (12 GB); see Table 3 in Appendix B.2. Although this is not an extensive benchmark, a quick comparison of inference times reveals a 50× difference in speed between our method and the flow-based method. All the deep generative models need approximately an equal number of iterations (T ≈ 600) to converge. However, for the same modeling capacity, the flow model requires a substantially higher number of trainable parameters than the diffusion method. This is mainly due to the restrictive requirements imposed on the architecture to ensure tractable likelihood computation. It should be noted that no improvements to speed up the diffusion process, such as CCDF (Chung et al., 2022b), were applied to the diffusion method, leaving room for even more improvement in future work.

5. DISCUSSION AND CONCLUSIONS

In this work, we present a framework for removing structured noise using diffusion models. Our work provides an efficient addition to existing score-based conditional sampling methods, incorporating knowledge of the noise distribution. We demonstrate our method on natural and out-of-distribution data, and achieve increased performance over the state-of-the-art and established conventional methods for complex inverse tasks. Additionally, the diffusion-based method is substantially easier to train using the score matching objective compared to other deep generative methods, and furthermore allows for posterior sampling. While our method is considerably faster and better at removing structured noise than the flow-based method (Whang et al., 2021), it is not (yet) ready for real-time inference and is still slow compared to GANs (Bora et al., 2017) and classical methods. Fortunately, research into accelerating the diffusion process is well under way. In addition, although a simple sampling algorithm was adapted in this work, many more sampling algorithms for score-based diffusion models exist, each of which introduces a new set of hyperparameters. For example, the predictor-corrector (PC) sampler has been shown to improve sample quality (Song et al., 2020). Future work should explore this wide design space to understand the limitations and possibilities of more sophisticated sampling schemes in combination with the proposed joint diffusion method. Furthermore, the range of problems to which the proposed method can be applied may be expanded to non-linear likelihood models and beyond additive noise models. Lastly, the connection between diffusion models and continuous normalizing flows through the neural ODE formulation (Song et al., 2021a) is not investigated here, but is of great interest given the comparison with the flow-based method in this work.

6. REPRODUCIBILITY STATEMENT

All code used to train and evaluate the models as presented in this paper can be found at https://anonymous.4open.science/r/iclr2023-joint-diffusion. Essentially, the codebase at https://github.com/yang-song/score_sde_pytorch of Song et al. (2020) is used to train the score-based diffusion networks, for both data and structured noise, independently. To implement the proposed inference scheme, the lines in Algorithm 1 should be adapted to create a sampler that includes both trained diffusion models. Details regarding the training and inference settings used to reproduce the results in this work can be found in Section 3.2.

C PSEUDO-CODE

In this section, we provide pseudo-code for the proposed joint conditional diffusion sampler, based on the Euler-Maruyama sampling algorithm. Furthermore, we use the SDE formulation for the diffusion process, which is denoted as an sde object with drift, diffusion and marginal_prob methods. The latter computes the mean and standard deviation of the diffusion transition kernel at a given time t. Lastly, there are two trained score networks (NCSNv2), score_data and score_noise, for the data and structured noise, respectively.
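The following is a minimal runnable sketch of the sampler in the interface described above (an `sde` object with `drift`, `diffusion` and `marginal_prob`, plus `score_data` / `score_noise`). To keep it self-contained, the trained NCSNv2 networks are replaced by analytic scores of toy Gaussian priors, the forward operator is A = I, the perturbed observation is approximated by its mean, and all hyperparameter values are illustrative rather than the paper's:

```python
import numpy as np

class VPSDE:
    """Minimal variance-preserving SDE exposing the methods named in the text."""
    def __init__(self, beta0=0.1, beta1=7.0):
        self.beta0, self.beta1 = beta0, beta1
    def _beta(self, t):
        return self.beta0 + t * (self.beta1 - self.beta0)
    def drift(self, x, t):
        return -0.5 * self._beta(t) * x
    def diffusion(self, t):
        return np.sqrt(self._beta(t))
    def marginal_prob(self, x, t):
        # mean and std of the transition kernel q(x_t | x)
        log_coef = -0.5 * (self.beta0 * t + 0.5 * (self.beta1 - self.beta0) * t ** 2)
        a = np.exp(log_coef)
        return a * x, np.sqrt(1.0 - a ** 2)

# Analytic stand-ins for the trained score networks (toy Gaussian priors):
MU_X = 2.0                                   # signal prior x0 ~ N(MU_X, 1)
def score_data(x, t, sde):                   # VP keeps the marginal variance at 1
    mean, _ = sde.marginal_prob(MU_X, t)
    return -(x - mean)
def score_noise(n, t, sde):                  # noise prior n0 ~ N(0, 1)
    return -n

def joint_em_sampler(y, sde, T=500, lam=0.3, mu=0.3, seed=0):
    """Algorithm 1 (joint conditional Euler-Maruyama) for the case A = I."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(y.shape)         # x_1 ~ pi(x)
    n = rng.standard_normal(y.shape)         # n_1 ~ pi(n)
    dt = 1.0 / T
    for i in range(T - 1, -1, -1):
        t = (i + 1) / T
        y_t, _ = sde.marginal_prob(y, t)     # mean of the perturbed observation y_t
        g = sde.diffusion(t)
        # reverse-time Euler-Maruyama steps for signal and noise
        x = (x - (sde.drift(x, t) - g ** 2 * score_data(x, t, sde)) * dt
               + g * np.sqrt(dt) * rng.standard_normal(y.shape))
        n = (n - (sde.drift(n, t) - g ** 2 * score_noise(n, t, sde)) * dt
               + g * np.sqrt(dt) * rng.standard_normal(y.shape))
        # data-consistency steps (equations 18 and 19 with A = I)
        r = x + n - y_t
        x, n = x - lam * r, n - mu * r
    return x, n

sde = VPSDE()
y = np.full(1000, 3.0)                       # batch of identical toy observations
x0, n0 = joint_em_sampler(y, sde)
print(np.abs(x0 + n0 - y).mean() < 0.5)      # samples are consistent with y
```

Swapping the two analytic scores for trained networks and A = I for a general forward operator (with the Aᵀ factor in the x update, as in equation 18) recovers the sampler used in the experiments.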

