DDM²: SELF-SUPERVISED DIFFUSION MRI DENOISING WITH GENERATIVE DIFFUSION MODELS

Abstract

Magnetic resonance imaging (MRI) is a common and life-saving medical imaging technique. However, acquiring high signal-to-noise ratio (SNR) MRI scans requires long scan times, resulting in increased costs, patient discomfort, and decreased throughput. Thus, there is great interest in denoising MRI scans, especially for the subtype of diffusion MRI scans that are severely SNR-limited. While most prior MRI denoising methods are supervised in nature, acquiring supervised training datasets for the multitude of anatomies, MRI scanners, and scan parameters proves impractical. Here, we propose Denoising Diffusion Models for Denoising Diffusion MRI (DDM²), a self-supervised denoising method for MRI using diffusion denoising generative models. Our three-stage framework integrates statistic-based denoising theory into diffusion models and performs denoising through conditional generation. During inference, we represent input noisy measurements as a sample from an intermediate posterior distribution within the diffusion Markov chain. We conduct experiments on 4 real-world in-vivo diffusion MRI datasets and show that DDM² demonstrates superior denoising performance, as assessed with clinically relevant qualitative visual and quantitative metrics. Our source code is available at https://github.com/StanfordMIMI.

1. INTRODUCTION

Magnetic resonance imaging (MRI) is a non-invasive clinical imaging modality that can provide life-saving diagnostic information. Diffusion MRI is a subtype of MRI commonly used in oncologic and neurologic disorders (Bihan, 2003; Bihan et al., 2006), which can quantitatively assess microstructural anatomical details. However, diffusion MRI scans suffer from severe signal-to-noise ratio (SNR) deficits, hindering diagnostic and quantitative accuracy. Image SNR can be improved either by lowering image resolution, which further reduces diagnostic utility, or by increasing the total scan time, which can already require 10+ minutes in the MRI scanner. Thus, there is great interest in decreasing diffusion MRI scan times to improve patient throughput in hospitals and the patient experience. While image SNR is governed by the underlying MRI physics, applying post-processing denoising techniques to fast, low-SNR MRI acquisitions can improve overall image SNR. Developing such methods to improve the SNR of diffusion MRI scans is an unsolved problem whose solution may improve the efficacy of the millions of such scans performed annually in routine clinical practice.
Supervised machine learning techniques have previously been proposed for MRI denoising; however, such methods are limited in their clinical feasibility. It is clinically impractical to acquire paired high- and low-SNR diffusion MRI scans across various anatomies (e.g., brain, abdomen, etc.), diffusion weighting factors (and resultant SNR variations), MRI field strengths and vendors, and clinical use-cases. Such large distributional shifts across heterogeneous use-cases lead to a fundamental drop in model performance (Darestani et al., 2021). The diversity of data and the need for effective denoising methods motivate the use of unsupervised denoising techniques for diffusion MRI.
To address these challenges, the major contributions of our work are three-fold: (i) We propose DDM² for unsupervised denoising of diffusion MRI scans using diffusion denoising models. Our three-stage self-supervised approach integrates statistical self-denoising techniques into diffusion models (shown in Figure 1, left); (ii) DDM² represents noisy inputs as samples from an intermediate state in the diffusion Markov chain, which allows it to generate fine-grained denoised images without requiring ground-truth references; (iii) We evaluate our method on four real-world diffusion MRI datasets that encompass both the latest and longer-established MRI acquisition methods. DDM² demonstrates state-of-the-art denoising performance, outperforming the second-best method by an average of 3.2 SNR and 3.1 contrast-to-noise ratio (CNR) points across this large range of MRI distributions.

2. RELATED WORK AND BACKGROUND

2.1. DIFFUSION MRI

Diffusion MRI sensitizes the MR signal to the movement of protons within the object being imaged using tailored imaging gradients (small steerable magnetic fields). Given that both the magnitude (the overall duration for which the gradients are turned on, termed the b-value) and the direction (a linear combination of the three independent gradient directions) can be steered, the diffusion weighting is considered a vector. A typical diffusion MRI scan for resolving tissue microstructure is 4D in nature, with 3 dimensions of spatial coordinates and 1 dimension of diffusion vectors. Depending on the clinical use-case, the number of diffusion vectors can range from 2 to upwards of 100. Prior approaches for improving diffusion MRI sequences exist; however, these require highly task-specific models (Kaye et al., 2020; Gibbons et al., 2018). These methods scan the same volume multiple times, average the resulting low-SNR images to generate a high-SNR image, and subsequently train a supervised model to transform a single low-SNR image into the averaged high-SNR image. Such methods cannot generalize to the diverse anatomies, MRI scanners, and imaging parameters of diffusion MRI scans.
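The averaging step used by these supervised approaches can be illustrated with a small simulation. The sketch below is not taken from the cited works: it assumes a flat 1D "image", zero-mean Gaussian noise (real magnitude MRI noise is not exactly Gaussian), and hypothetical values for the signal, the noise level, and the number of repeats. It only shows why averaging N repeated acquisitions lowers the noise standard deviation by roughly a factor of √N, at the cost of N times the scan time.

```python
import random
import statistics

def simulate_repeats(clean, sigma, n_repeats, rng):
    """Return n_repeats noisy copies of a 1D 'image' (list of floats)."""
    return [[v + rng.gauss(0.0, sigma) for v in clean] for _ in range(n_repeats)]

def average_repeats(repeats):
    """Voxel-wise mean over repeated acquisitions of the same volume."""
    n = len(repeats)
    return [sum(vals) / n for vals in zip(*repeats)]

def residual_std(image, clean):
    """Standard deviation of the residual noise relative to the clean signal."""
    return statistics.pstdev([a - b for a, b in zip(image, clean)])

rng = random.Random(0)
clean = [100.0] * 4096   # hypothetical flat signal (assumption)
sigma = 20.0             # hypothetical noise level (assumption)
n_repeats = 16           # hypothetical number of repeated scans (assumption)

repeats = simulate_repeats(clean, sigma, n_repeats, rng)
single_std = residual_std(repeats[0], clean)
averaged_std = residual_std(average_repeats(repeats), clean)

# Averaging 16 repeats should reduce the noise std by about sqrt(16) = 4.
print(f"single scan: {single_std:.2f}, averaged: {averaged_std:.2f}")
```

The averaged image is what such methods use as the supervised training target, which is why they depend on repeated acquisitions of every anatomy and protocol of interest.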

2.2. STATISTIC-BASED IMAGE DENOISING

Given a noisy measurement $x$, the inverse problem is to recover the clean image $y$ defined by

$$x = \lambda_1 y + \epsilon, \tag{1}$$

where $\lambda_1$ is a linear coefficient and $\epsilon$ denotes the additive noise. Without loss of generality, one usually represents $\epsilon = \lambda_2 z$, with $z \sim \mathcal{N}(0, I)$, as a sample from the Gaussian distribution $\mathcal{N}(0, \lambda_2^2 I)$. When paired ground-truth images $y$ are available, a neural network can be trained to approximate $y$ through direct supervision. However, supervised training proves infeasible when $y$ is missing from the dataset. Attempts have been made to relax the reliance on supervision from clean images by using noisy images alone. Based on the assumption that the additive noise $\epsilon$ is pixel-wise independent, Noise2Noise (Lehtinen et al., 2018) claimed that training a network $\Phi$ to denoise a noisy measurement $x'$ towards another measurement $x$ is statistically equivalent to supervised training on $y$, up to a constant:

$$\mathcal{L}(\Phi(x'), x) = \|\Phi(x') - x\|^2 \approx \|\Phi(x') - y\|^2 + \text{const}. \tag{2}$$

Based on the above idea, Noise2Self (Batson & Royer, 2019) further developed the $\mathcal{J}$-invariance theory, which achieves self-supervised denoising by using the input $x$ itself. Exploiting the statistical independence of noise, Noise2Void (Krull et al., 2019) and Laine et al. (2019) extended the self-supervised strategy to a so-called 'blind-spot' technique that masks out image patches and uses a specially designed neural network to predict the masked pixels. Neighbor2Neighbor (Huang et al., 2021) creates noisy image pairs by sub-sampling the input and uses the sub-samples to supervise each other. To avoid discarding pixel information, Recorrupted2Recorrupted (Pang et al., 2021) keeps all pixels intact and achieves self-supervised denoising by deriving two additional corrupted signals. To better utilize 4D MRI characteristics, Patch2Self (Fadnavis et al., 2020) takes multiple diffusion vector volumes as input $\{x'\}$ and learns volume-wise denoising.
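The Noise2Noise relation in Equation 2 can be checked numerically on toy data. The following is a minimal sketch, not the training procedure of any cited method: it assumes $\lambda_1 = 1$, Gaussian noise with a hypothetical $\lambda_2 = 0.5$, scalar "pixels", and a hypothetical one-parameter denoiser `phi` that shrinks its input towards 0.5. For every setting of that parameter, the loss against the noisy target exceeds the loss against the clean signal by approximately the same constant $\lambda_2^2$, so both losses select the same denoiser.

```python
import random

rng = random.Random(0)
n = 20000
sigma = 0.5                                              # plays the role of lambda_2 (assumption)
y = [rng.uniform(0.0, 1.0) for _ in range(n)]            # clean signal, unknown in practice
x = [v + sigma * rng.gauss(0.0, 1.0) for v in y]         # noisy target measurement
x_prime = [v + sigma * rng.gauss(0.0, 1.0) for v in y]   # independent noisy input measurement

def phi(values, weight):
    """Hypothetical toy denoiser: shrink each value towards 0.5 by `weight`."""
    return [0.5 + weight * (v - 0.5) for v in values]

def mse(pred, target):
    """Mean squared error between two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

weights = (0.0, 0.25, 0.5, 1.0)
noisy_loss = {w: mse(phi(x_prime, w), x) for w in weights}   # loss with noisy target x
clean_loss = {w: mse(phi(x_prime, w), y) for w in weights}   # loss with clean target y

# The gap between the two losses is close to sigma**2 for every denoiser setting.
gaps = [noisy_loss[w] - clean_loss[w] for w in weights]
best_noisy = min(weights, key=noisy_loss.get)
best_clean = min(weights, key=clean_loss.get)

for w, g in zip(weights, gaps):
    print(f"weight={w:.2f}  loss gap={g:.3f}")
print("selected by noisy target:", best_noisy, "| selected by clean target:", best_clean)
```

The constant gap appears because the noise in the target is independent of both the input noise and the clean signal, so the cross term between the prediction error and the target noise averages out.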
Building on the statistical theory reviewed above, in this work we achieve unsupervised MRI denoising through a generative approach. Compared to Patch2Self, which requires a large number of volumes (e.g., > 60) to denoise a single volume, DDM² can efficiently denoise MRI scans acquired with very few diffusion directions and a limited number of volumes (e.g., < 5). This is clinically relevant since common clinical diffusion MRI scans use fewer than 10 directions. Moreover, unlike Patch2Self, our trained denoiser is applicable to the entire 4D sequence, and no repeated training is needed to denoise volumes in different directions.

2.3. DIFFUSION GENERATIVE MODEL

A diffusion model (Sohl-Dickstein et al., 2015) denotes a parameterized Markov chain with $T$ discretized states $S_{1,\cdots,T}$ that can be trained to generate samples that fit a given data distribution. Transitions of this chain are bi-directional and are controlled by a pre-defined noise schedule $\beta_{1,\cdots,T}$.
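To make the role of the noise schedule concrete, the sketch below simulates the noising direction of such a chain. It is an illustration under stated assumptions rather than the parameterization used in this paper: it assumes the common Gaussian formulation in which state $t$ can be sampled directly from the clean data as $\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, z$ with $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$, a hypothetical linear schedule, and hypothetical values for $T$ and the schedule endpoints. Note that each state then has the same form as Equation 1, with $\lambda_1 = \sqrt{\bar{\alpha}_t}$ and $\lambda_2 = \sqrt{1 - \bar{\alpha}_t}$.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=2e-2):
    """Hypothetical linear noise schedule beta_1..beta_T (endpoints are assumptions)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s), the fraction of signal variance kept at state t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def forward_sample(x0, t, alpha_bars, rng):
    """Draw state t (1-indexed) directly from the clean signal x0."""
    a = alpha_bars[t - 1]
    lam1, lam2 = math.sqrt(a), math.sqrt(1.0 - a)   # coefficients in the form of Eq. (1)
    return [lam1 * v + lam2 * rng.gauss(0.0, 1.0) for v in x0]

T = 1000                                  # hypothetical number of states (assumption)
betas = linear_beta_schedule(T)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [1.0] * 2000                         # hypothetical clean signal
x_mid = forward_sample(x0, T // 2, alpha_bars, rng)   # intermediate state: partly noisy
x_T = forward_sample(x0, T, alpha_bars, rng)          # final state: almost pure noise

print(f"alpha_bar at t=1: {alpha_bars[0]:.4f}, at t=T: {alpha_bars[-1]:.2e}")
```

Under these assumptions, early states stay close to the data while the final state is nearly standard Gaussian noise; generation runs the chain in the opposite direction with a learned model.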

availability

https://github.com/StanfordMIMI

