NETWORK-AGNOSTIC KNOWLEDGE TRANSFER FOR MEDICAL IMAGE SEGMENTATION

Abstract

Conventional transfer learning leverages the weights of pre-trained networks but requires similar neural architectures. Knowledge distillation, in contrast, can transfer knowledge between heterogeneous networks, but it often requires access to the original training data or additional generative networks. Knowledge transfer between networks can be improved by making it agnostic to the choice of network architecture and by reducing the dependence on the original training data. We propose a knowledge transfer approach from a teacher to a student network in which the student is trained on an independent transferal dataset whose annotations are generated by the teacher. Experiments were conducted on five state-of-the-art networks for semantic segmentation and seven datasets across three imaging modalities. We studied knowledge transfer from a single teacher, the combination of knowledge transfer and fine-tuning, and knowledge transfer from multiple teachers. The student model with a single teacher achieved performance similar to that of the teacher, and the student model with multiple teachers achieved better performance than the teachers. The salient features of our algorithm are: 1) no need for the original training data or generative networks, 2) knowledge transfer between different architectures, 3) easy implementation for downstream tasks by using the downstream-task dataset as the transferal dataset, and 4) knowledge transfer of an ensemble of independently trained models into one student model. Extensive experiments demonstrate that the proposed algorithm is effective for knowledge transfer and easily tunable.
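The transfer scheme described above — annotate an unlabeled transferal dataset with a frozen teacher, then train the student on those pseudo-labels — can be sketched as follows. This is a minimal, framework-free illustration, not the paper's implementation: the thresholding "teacher" and the grid-search "student" are hypothetical stand-ins for real segmentation networks and gradient-based training.

```python
import numpy as np

def teacher_segment(image, threshold=0.6):
    """Frozen teacher: a stand-in binary segmenter (intensity threshold)."""
    return (image > threshold).astype(np.uint8)

def generate_pseudo_labels(images, teacher):
    """Annotate the unlabeled transferal dataset with the teacher's masks."""
    return [teacher(img) for img in images]

def train_student(images, pseudo_labels, candidate_thresholds):
    """Toy 'student': pick the threshold that best reproduces the teacher's
    pseudo-labels (a stand-in for minimizing a segmentation loss)."""
    best_t, best_score = None, -1.0
    for t in candidate_thresholds:
        # agreement between the student's masks and the teacher's annotations
        score = float(np.mean([
            np.mean((img > t).astype(np.uint8) == lbl)
            for img, lbl in zip(images, pseudo_labels)
        ]))
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# independent transferal dataset: unlabeled images only
rng = np.random.default_rng(0)
transferal = [rng.random((8, 8)) for _ in range(16)]
pseudo = generate_pseudo_labels(transferal, teacher_segment)
student_threshold = train_student(transferal, pseudo,
                                  np.linspace(0.1, 0.9, 17))
```

The key property the sketch mirrors is that the student never sees the teacher's original training data, only the teacher's predictions on an independent dataset; knowledge transfer from multiple teachers would replace `teacher_segment` with, e.g., a pixelwise vote over several teachers' masks.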

1. INTRODUCTION

Deep learning often requires a sufficiently large training dataset, which is expensive to build and not easy to share between users. For example, a major challenge in semantic segmentation of medical images is the limited availability of annotated data (Litjens et al., 2017). Due to ethical concerns and confidentiality constraints, medical datasets are often not released with the trained networks. This highlights the need for knowledge transfer between neural networks in which the original training dataset does not need to be accessed. On the other hand, because deep networks behave as black boxes, transferring knowledge between heterogeneous neural networks is difficult. To address these limitations, algorithms have been proposed to reuse or share the knowledge of neural networks, such as network weight transfer (Tan et al., 2018), knowledge distillation (Hinton et al., 2015), federated learning (Yang et al., 2019), and self-training (Xie et al., 2020b).

Some conventional algorithms directly transfer the weights of standard large models trained on natural image datasets to different tasks (Kang & Gwak, 2019; Motamed et al., 2019; Jodeiri et al., 2019; Raghu et al., 2019). For example, Iglovikov & Shvets (2018) adopted VGG11 pre-trained on ImageNet as the encoder of U-Net for 2D image segmentation. Similarly, the convolutional 3D network (Tran et al., 2015), pre-trained on natural video datasets, was used as the encoder of 3D U-Net for 3D MR (magnetic resonance) medical image segmentation (Zeng et al., 2017). Transferring network weights generally requires adjustments to the architecture of the receiver model, which in turn limits the flexibility of the receiver network. Another technique involving knowledge transfer is federated learning (Yang et al., 2019); it has received attention for its ability to train a large-scale model in a decentralized manner without requiring users' data. In general, federated learning approaches adopt the central model to capture the

