DIFFUSION ADVERSARIAL REPRESENTATION LEARNING FOR SELF-SUPERVISED VESSEL SEGMENTATION

Abstract

Vessel segmentation in medical images is one of the important tasks in the diagnosis of vascular diseases and therapy planning. Although learning-based segmentation approaches have been extensively studied, supervised methods require a large amount of ground-truth labels, and confusing background structures make it hard for neural networks to segment vessels in an unsupervised manner. To address this, here we introduce a novel diffusion adversarial representation learning (DARL) model that leverages a denoising diffusion probabilistic model with adversarial learning, and apply it to vessel segmentation. In particular, for self-supervised vessel segmentation, DARL learns the background signal using a diffusion module, which lets a generation module effectively provide vessel representations. Also, through adversarial learning based on the proposed switchable spatially-adaptive denormalization, our model estimates synthetic fake vessel images as well as vessel segmentation masks, which further makes the model capture vessel-relevant semantic information. Once the proposed model is trained, it generates segmentation masks in a single step and can be applied to general vascular structure segmentation of coronary angiography and retinal images. Experimental results on various datasets show that our method significantly outperforms existing unsupervised and self-supervised vessel segmentation methods.

1. INTRODUCTION

In the clinical diagnosis of vascular diseases, vessel segmentation is necessary to analyze vessel structures and to plan therapy. In particular, when diagnosing coronary artery disease, X-ray angiography is performed with a contrast agent injected into the blood vessels to enhance their visibility (Cong et al., 2015). However, it is challenging to extract vessels accurately due to low contrast, motion artifacts, many tiny branches, structural interference in the background, etc. (Xia et al., 2019; Chen et al., 2014). Various segmentation methods have been explored for vascular structures. Traditional optimization models (Law & Chung, 2009; Taghizadeh Dehkordi et al., 2014) typically require complicated preprocessing steps and manual tuning, and they are computationally expensive when processing many images. On the other hand, learning-based methods (Nasr-Esfahani et al., 2016; Fan et al., 2018; Chen et al., 2019) generate segmentation maps in real time once the models are trained. However, supervised methods require a large amount of labeled data for training, which complicates their use in practical applications. Also, existing unsupervised methods designed for natural images are difficult to apply to medical vessel images because of low-contrast, subtle branches and confusing background structures. Although a recent self-supervised method (Ma et al., 2021) learns vessel representations, it requires two different adversarial networks to segment vessels, which increases training complexity. Recently, diffusion models such as the denoising diffusion probabilistic model (DDPM) (Ho et al., 2020) have become one of the main research topics in modeling data distributions and sampling diverse images. By learning the Markov transformation of the reverse diffusion process from Gaussian … 2020).
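As background for the diffusion process mentioned above, the following is a minimal NumPy sketch of the standard DDPM forward noising step from Ho et al. (2020), x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, together with its one-step inversion given a noise estimate. The linear schedule values (T = 1000, β from 1e-4 to 0.02) and all function names are assumptions for illustration; this is not the DARL implementation.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (assumed values, as commonly used with DDPM)."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, noise):
    """Draw x_t ~ q(x_t | x_0) in closed form: sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def predict_x0(xt, t, alpha_bar, eps_hat):
    """Invert the closed form in a single step, given an estimate eps_hat of the noise."""
    a = alpha_bar[t]
    return (xt - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s) up to each step

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # stand-in for a normalized image
eps = rng.standard_normal(x0.shape)
t = 200
xt = q_sample(x0, t, alpha_bar, eps)

# With a perfect noise estimate, the one-step inversion recovers x_0 up to rounding.
x0_rec = predict_x0(xt, t, alpha_bar, eps)
print(np.allclose(x0_rec, x0))  # True
```

In a trained diffusion model, eps_hat comes from a network rather than the true noise, so the single-step estimate is only approximate; iterative sampling refines it over many steps, which is why sampling is slow.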
For high-level vision tasks, although a recent study (Baranchuk et al., 2021) shows that DDPM can capture semantic information and be used for image representations, methods that apply DDPM to learn semantic segmentation without labeled data have not yet been developed. Also, the sampling process of diffusion models often takes a relatively long time. In this paper, we introduce a novel concept of diffusion adversarial representation learning (DARL), a non-iterative version of the diffusion-based generative model that can be successfully applied to self-supervised vessel segmentation without ground-truth labels. As illustrated in Figure 1, our model is composed of a diffusion module and a generation module, and it learns semantic information of vessels via adversarial learning. Specifically, based on the observation that the diffusion model estimates the noise added to the perturbed input data and that the adversarial learning model generates images from given noisy vectors, we can naturally connect the diffusion model with the adversarial model. This allows our model not only to generate images in real time but also to segment vessels with robustness to noise and to various modalities. Here, inspired by the spatially-adaptive denormalization (SPADE) layer (Park et al., 2019), which is effective for image synthesis from semantic masks, we present a switchable version of SPADE in the generation module to estimate vessel segmentation maps and mask-based fake angiograms simultaneously. This can yield a synergistic effect in learning vessel representations by extracting features suited to angiogram synthesis.
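To make the switchable SPADE idea above concrete, here is a minimal NumPy sketch under stated assumptions. SPADE normalizes activations with a parameter-free normalization and then modulates them with a per-pixel scale and shift predicted from a semantic mask. The "switch" here simply skips the modulation when no mask is supplied (path A) and applies it when a mask is given (path B). The function `switchable_spade`, the choice of instance normalization, and the per-channel affine map standing in for SPADE's small convolutional network are all hypothetical simplifications, not the paper's layer.

```python
import numpy as np

def instance_norm(h, eps=1e-5):
    """Normalize each channel of a (C, H, W) feature map to zero mean and unit variance."""
    mean = h.mean(axis=(1, 2), keepdims=True)
    var = h.var(axis=(1, 2), keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def switchable_spade(h, mask=None, params=None):
    """Hypothetical switchable SPADE.

    mask is None -> plain parameter-free normalization (segmentation path A).
    mask given   -> mask-conditioned spatial modulation (synthesis path B).
    """
    h_norm = instance_norm(h)
    if mask is None:
        return h_norm
    # Per-pixel scale (gamma) and shift (beta) derived from the (H, W) mask.
    # Real SPADE predicts these with a small conv net; a per-channel affine map
    # of the mask is used here purely for illustration.
    gamma = params["w_gamma"][:, None, None] * mask[None] + params["b_gamma"][:, None, None]
    beta = params["w_beta"][:, None, None] * mask[None] + params["b_beta"][:, None, None]
    return h_norm * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
C, H, W = 4, 16, 16
h = rng.standard_normal((C, H, W))
mask = (rng.uniform(size=(H, W)) > 0.8).astype(float)  # toy binary "vessel" mask
params = {k: rng.standard_normal(C) for k in ("w_gamma", "b_gamma", "w_beta", "b_beta")}

out_a = switchable_spade(h)                # path (A): no mask, normalization only
out_b = switchable_spade(h, mask, params)  # path (B): modulated by the mask
```

Sharing one generator between both paths in this way lets the same normalized features serve segmentation and mask-conditioned synthesis, which is the kind of coupling the text credits for the synergistic effect.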
More specifically, as shown in Figure 1, given unpaired background images and angiography images taken before and after injection of the contrast agent, there are two paths for feeding the inputs into our proposed model: (A) when real angiography images are given, our model without SPADE estimates vessel segmentation maps; (B) when background images are given, our model with SPADE generates synthetic angiograms that composite vessel-like semantic masks with the input backgrounds. Also, as each vessel-like semantic mask in path (B) can be regarded as the pseudo-label for the generated angiography image, we feed the synthetic angiograms into path (A) again and apply cycle consistency between the resulting segmentation maps and the fractal-mask labels to capture semantic information of vessels. In addition, by designing the diffusion module to focus on learning the background signal, we let the module treat vessel structures of angiography images as outliers when estimating the latent feature. Thereby, the vessel structures represented in the output of the diffusion module can guide the generation module to effectively segment the vessels. We build our model on X-ray coronary angiography using the XCAD dataset (Ma et al., 2021) and apply it to several different blood vessel datasets, including retinal images. Experimental results show that our method outperforms several baseline methods by large margins for vessel segmentation tasks in the absence of labeled data. The main contributions are summarized as follows:
1. We propose a diffusion adversarial representation model, a non-iterative version of the diffusion model for image generation, and apply it to self-supervised vessel segmentation.
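The cycle-consistency idea described above (a mask serves as the pseudo-label for the synthetic angiogram made from it, and segmenting that angiogram should return the mask) can be illustrated with a toy NumPy sketch. Everything here is a hypothetical stand-in: `composite` darkens the background where the mask is set (contrast-filled vessels appear dark in X-ray angiograms) instead of using a learned generator, `segment` thresholds against an oracle background instead of a learned background model, and the L1 penalty is an assumed choice of cycle loss.

```python
import numpy as np

def composite(background, mask, contrast=0.5):
    """Hypothetical stand-in for path (B): darken the background where the mask marks vessels."""
    return background * (1.0 - contrast * mask)

def segment(image, background_estimate, threshold=0.1):
    """Hypothetical stand-in for path (A): flag pixels much darker than the estimated background."""
    return ((background_estimate - image) > threshold).astype(float)

def cycle_loss(seg, mask):
    """Assumed L1 cycle-consistency penalty between the predicted map and the pseudo-label mask."""
    return np.abs(seg - mask).mean()

rng = np.random.default_rng(0)
background = rng.uniform(0.6, 1.0, size=(32, 32))
fractal_mask = np.zeros((32, 32))
fractal_mask[16, :] = 1.0   # toy "vessel" trunk
fractal_mask[:16, 8] = 1.0  # toy branch

fake_angio = composite(background, fractal_mask)
seg = segment(fake_angio, background)  # oracle background: the real model must learn it
print(cycle_loss(seg, fractal_mask))   # 0.0
```

Because this toy segmenter is given the exact background, the loss is zero; in the actual model the background must be learned, which is the role the text assigns to the diffusion module, and the cycle loss supplies a training signal wherever the segmentation disagrees with the mask.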



Figure 1: Our proposed diffusion adversarial representation model for self-supervised vessel segmentation. In path (A), given a real noisy angiography image $x^a_{t_a}$, our model estimates vessel segmentation masks $\hat{s}^v$. In path (B), given a real noisy background image $x^b_{t_b}$ and a vessel-like fractal mask $s^f$, our model generates a synthetic angiography image $\hat{x}^a$.

