ADVERSARIAL PRIVACY PRESERVATION IN MRI SCANS OF THE BRAIN

Abstract

De-identification of magnetic resonance imaging (MRI) data is intrinsically difficult since, even with all metadata removed, a person's face can easily be rendered from a scan and matched against a database. Existing de-identification methods tackle this task by obfuscating or removing parts of the face, but they either fail to reliably hide the patient's identity or remove so much information that they adversely affect further analyses. In this work, we describe a new class of MRI de-identification techniques that remodel privacy-sensitive facial features rather than removing them. To accomplish this, we propose a conditional, multi-scale, 3D GAN architecture that takes a patient's MRI scan as input and generates a 3D volume in which the brain is unmodified but the face has been de-identified. Compared to classical removal-based techniques, our deep learning framework preserves privacy more reliably without adversely affecting downstream medical analyses on the brain, including segmentation and age prediction.

1. INTRODUCTION

Magnetic resonance imaging (MRI) is an essential tool in both diagnostic and research settings, but MRI scans pose a privacy risk. Detailed renderings of the head can be crafted from MRI scans using techniques such as volumetric raycasting. When matched against facial images, those renderings can be used to infer patient identity, a type of attack already demonstrated for CT scans (Mazura et al., 2012). Commonly, MRI scans are de-identified before sharing using crude removal-based techniques, which seek to remove privacy-sensitive parts of the head without disturbing the brain (Figure 1). However, as we demonstrate, these techniques often fail to reliably mask the patient's identity, or they are so aggressive that they adversely affect downstream medical analyses on the brain, e.g. segmentation and age prediction.

In this work, instead of removing potentially essential parts of MRI scans of the head and brain, we propose to de-identify them by reshaping the privacy-sensitive regions without altering the medically relevant content. Our approach is to remodel privacy-sensitive facial structures rather than remove them, while leaving the brain untouched. Unlike removal-based approaches, under our method the head and face retain a realistic appearance and structure. To accomplish this, we propose a novel multi-scale volumetric Generative Adversarial Network (GAN), called C-DeID-GAN, that conditions on a convex hull of the skull extracted from the scan to be de-identified. The generator learns to synthesize MRI volumes that preserve medically relevant regions such as the brain, while non-invertibly remodeling privacy-sensitive characteristics such as the face of the original scan.

It is worthwhile to point out why such an approach is necessary when methods that extract the brain - so-called skull-stripping methods - already exist. In short, automated measurements behave unpredictably when data is removed. As recently shown by De Sitter et al. (2020), software designed to perform measurements (e.g. brain segmentation or age estimation) is developed to work robustly on original data (Smith et al., 2004; Schmidt et al., 2012). Measurements made on data de-identified by removal can therefore be inaccurate or even fail entirely. Remodeling rather than deleting the privacy-sensitive regions is thus desirable because it protects privacy while ensuring the robustness of downstream medical analyses.

The main contributions of this work are as follows:

1. We define a novel methodology to ensure privacy in medical imagery that enables the sharing of data in which medically relevant regions are preserved and privacy-sensitive regions are de-identified realistically.

2. We propose C-DeID-GAN, a conditional multi-scale volumetric GAN that realizes a solution to the aforementioned methodology.

3. We show that C-DeID-GAN preserves privacy in MRI scans more reliably than removal-based techniques without adversely affecting downstream analyses.

In addition, we make technical contributions towards the generation of the convex hull and surface representations necessary for the privacy conditioning of the GAN.
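To make the identification risk concrete, the simplest form of the rendering attack only requires finding, along each viewing ray, the first voxel whose intensity exceeds a tissue threshold. The following is a minimal depth-map ray-casting sketch in NumPy; the toy spherical "head" volume and the threshold value are illustrative assumptions, not part of any actual attack pipeline:

```python
import numpy as np

def raycast_depth(volume, threshold=0.5):
    """Render a simple depth map by casting parallel rays along axis 0.

    For each (y, z) ray, record the index of the first voxel whose
    intensity exceeds `threshold` -- i.e. the visible surface of the head.
    Rays that hit nothing are marked with -1.
    """
    hits = volume > threshold                  # binary occupancy per voxel
    any_hit = hits.any(axis=0)                 # did the ray hit anything?
    depth = hits.argmax(axis=0).astype(float)  # index of first True per ray
    depth[~any_hit] = -1.0                     # background rays
    return depth

# Toy "head": a solid sphere of radius 10 centered in a 32^3 volume.
n = 32
x, y, z = np.ogrid[:n, :n, :n]
volume = ((x - 16) ** 2 + (y - 16) ** 2 + (z - 16) ** 2 < 10 ** 2).astype(float)

depth = raycast_depth(volume)
print(depth.shape)    # (32, 32)
print(depth[16, 16])  # 7.0 -- the sphere's surface along the central ray
```

Shading such a depth map (or full volumetric raycasting with transfer functions) yields the recognizable facial renderings that motivate de-identification in the first place.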

2. RELATED WORK

A handful of de-identification techniques exist for MRI data and are conventionally used for sharing and distribution. The most common are the removal-based approaches shown in Figure 1: FACE MASK (Milchenko & Marcus, 2013), DEFACE (Bischoff-Grethe et al., 2007), QUICKSHEAR (Schimke et al., 2011), and MRI WATERSHED (Ségonne et al., 2004), all of which we describe in the Appendix. De Sitter et al. (2020) were the first to report that these methods should be used with caution, as they remove regions of the face expected by algorithms for brain segmentation and other tasks. As already pointed out in the introduction, failure modes include inaccurate estimates or, in the worst case, the inability to perform the measurement at all. These de-identification approaches are relatively primitive, and a more modern approach is currently lacking in the literature. However, Shin et al. (2018) recently proposed a pix2pix-inspired (Isola et al., 2016) model to generate synthetic abnormal MRI images with brain tumors. The authors argue that, in principle, their approach can be used to generate a completely artificial corpus in which none of the scans can be attributed to actual patients. The downside is that the brain data is also hallucinated, which adversely affects medical analyses. In contrast, our approach de-identifies privacy-sensitive information of every patient but fully preserves the medically relevant information. More broadly, the literature on the removal of privacy-sensitive information from image data largely focuses on de-identification of photographs of faces (Jourabloo et al., 2015; Newton et al., 2005). Among these, Deep Privacy (Hukkelås et al., 2019) is the closest to our approach, as it was the first to use GANs to de-identify faces. It conditions on an a priori binary segmentation, guiding the generator to inpaint privacy-sensitive regions while preserving insensitive regions.
Whereas Deep Privacy de-identifies conventional images of size 128×128, our goal is to generate much higher-dimensional 3D MRI volumes at 128³ voxels - the equivalent of a 1448×1448 image. To identify privacy-sensitive face regions for conditional inpainting, Deep Privacy relies on a standard detector (Liu et al., 2015). Because a 3D analog does not exist, we develop an approach to extract a convex hull enclosing the head and a mask of the brain for conditioning.
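The core geometric step, computing which voxels lie inside the convex hull of a binary head mask, can be approximated with an off-the-shelf triangulation. The following sketch uses SciPy's Delaunay triangulation of the foreground voxels and its point-in-hull test; the toy two-blob mask and the function name are illustrative assumptions, not the paper's actual hull-extraction pipeline:

```python
import numpy as np
from scipy.spatial import Delaunay

def convex_hull_mask(binary_volume):
    """Return a boolean volume marking voxels inside the convex hull
    of the foreground voxels of `binary_volume`."""
    points = np.argwhere(binary_volume)       # (N, 3) foreground coordinates
    tri = Delaunay(points)                    # triangulates the hull interior
    # Test every voxel coordinate; find_simplex returns -1 outside the hull.
    grid = np.argwhere(np.ones(binary_volume.shape, dtype=bool))
    inside = tri.find_simplex(grid) >= 0
    return inside.reshape(binary_volume.shape)

# Toy mask: two separated blobs; the hull fills the gap between them.
vol = np.zeros((16, 16, 16), dtype=bool)
vol[2:5, 2:5, 2:5] = True
vol[11:14, 11:14, 11:14] = True

hull = convex_hull_mask(vol)
print(vol.sum(), hull.sum())  # the hull covers strictly more voxels
```

Testing all voxels against the triangulation is O(voxels × log simplices) and is tractable at 128³; a production pipeline would likely restrict the test to a bounding box around the foreground.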



Figure 1: Privacy concerns in MRI scans and methods to prevent identification. (left) Detailed 3D renderings of human heads with identifiable features can be crafted from MRI scans and used to identify patients (Mazura et al., 2012). (center) Existing de-identification approaches attempt to remove privacy-sensitive parts of the head, but alter the structure and appearance and often fail to reliably mask the patient's identity. (right) We remodel privacy-sensitive facial structures while leaving the brain untouched using a conditional multi-scale volumetric GAN.

