DISCRIMINATIVE CROSS-MODAL DATA AUGMENTATION FOR MEDICAL IMAGING APPLICATIONS

Abstract

While deep learning methods have shown great success in medical image analysis, they require a large number of medical images for training. Due to data privacy concerns and the unavailability of medical annotators, it is oftentimes very difficult to obtain enough labeled medical images for model training. In this paper, we study cross-modality data augmentation to mitigate the data deficiency issue in the medical imaging domain. We propose a discriminative unpaired image-to-image translation model that translates images in a source modality into images in a target modality, where the translation task is conducted jointly with the downstream prediction task and the translation is guided by the prediction. Experiments on two applications demonstrate the effectiveness of our method.

1. INTRODUCTION

Developing deep learning methods to analyze medical images for decision-making has aroused much research interest in the past few years. Promising results have been achieved in using medical images for skin cancer diagnosis (Esteva et al., 2017; Tschandl et al., 2019), chest disease identification (Jaiswal et al., 2019), and diabetic eye disease detection (Cheung et al., 2019), to name a few. It is well known that deep learning methods are data-hungry: deep learning models typically contain tens of millions of weight parameters, and effectively training such large models requires a large number of labeled training images. However, in the medical domain, it is very difficult to collect labeled training images for many reasons, including privacy barriers and the unavailability of doctors to annotate disease labels. To address the deficiency of medical images, many approaches (Krizhevsky et al., 2012; Cubuk et al., 2018; Takahashi et al., 2019; Zhong et al., 2017; Perez & Wang, 2017) have been proposed for data augmentation. These approaches create synthetic images based on the original images and use the synthetic images as additional training data. The most commonly used data augmentation operations include cropping, flipping, rotation, translation, and scaling. Augmented images created by these methods are oftentimes very similar to the original images; for example, a cropped image is simply part of the original image. In clinical practice, due to the large disparity among patients, the medical image of a new patient (at test time) is oftentimes very different from the images of the patients used for model training. If the augmented images are very close to the original images, they are not very useful in improving the model's ability to generalize to unseen patients. It is therefore important to create diverse augmented images that are non-redundant with the original images.
To create non-redundant augmented images for one modality, such as CT, one possible solution is to leverage images from other modalities, such as X-ray, MRI, and PET. In clinical practice, many different imaging techniques are applied to diagnose and treat the same disease. For example, to diagnose lung cancer, doctors can use chest X-rays, CT scans, and MRI scans, among others. As a result, different modalities of medical images are accumulated for the same disease. When training a deep learning model on a modality of interest (denoted by X), if the number of original images in this modality is small, we may convert images from other modalities into the target modality X and use the converted images as additional training data. For example, when a hospital would like to train a deep learning model for CT-based lung cancer diagnosis, it can collect MRI, X-ray, and PET images of lung cancer and use them to augment the CT training dataset. Because images of different modalities typically come from different patients, the converted images add substantial clinical diversity.
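The idea of prediction-guided cross-modal augmentation can be illustrated with a minimal sketch. The toy example below uses a linear "translator" that maps source-modality feature vectors into the target-modality space and a linear classifier trained on both real and translated samples; the joint objective combines a distribution-matching term (a simple moment-matching proxy standing in for the adversarial loss of a full unpaired translation model) with the downstream classification loss. All names, dimensions, and the moment-matching proxy are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative, not real medical features):
# source-modality samples (e.g. MRI features) with disease labels,
# plus a scarce target-modality training set (e.g. CT features).
Xs = rng.normal(size=(32, 8))          # source-modality samples
ys = rng.integers(0, 2, size=32)       # their disease labels
Xt = rng.normal(size=(16, 8))          # scarce target-modality samples
yt = rng.integers(0, 2, size=16)

W_trans = rng.normal(scale=0.1, size=(8, 8))   # linear "translator" (stand-in)
w_cls = rng.normal(scale=0.1, size=8)          # linear classifier weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_losses(W_trans, w_cls, lam=1.0):
    """Joint objective: translation loss + lam * downstream prediction loss."""
    Xs2t = Xs @ W_trans  # source samples translated into target-modality space
    # Distribution-matching proxy: match first and second moments of the
    # target data (stands in for the adversarial loss of unpaired translation).
    l_trans = (np.linalg.norm(Xs2t.mean(0) - Xt.mean(0)) ** 2
               + np.linalg.norm(Xs2t.std(0) - Xt.std(0)) ** 2)
    # Classification loss on real + translated samples: the prediction task
    # guides the translation, and translated samples augment the target set.
    X_aug = np.vstack([Xt, Xs2t])
    y_aug = np.concatenate([yt, ys])
    p = sigmoid(X_aug @ w_cls)
    l_cls = -np.mean(y_aug * np.log(p + 1e-9)
                     + (1 - y_aug) * np.log(1 - p + 1e-9))
    return l_trans + lam * l_cls, l_trans, l_cls

total, l_trans, l_cls = joint_losses(W_trans, w_cls)
```

In a full model, both loss terms would be minimized jointly by gradient descent over the translator and classifier parameters, so the translated images are shaped not only to look like target-modality images but also to be useful for the downstream prediction task.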

