TRANSFER LEARNING WITH PRE-TRAINED CONDITIONAL GENERATIVE MODELS

Abstract

Transfer learning is crucial for training deep neural networks on new target tasks. Current transfer learning methods always assume at least one of the following: (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, holding these assumptions is difficult in practical settings because the target task rarely has the same labels as the source task, access to the source dataset is restricted due to storage costs and privacy, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with an artificial dataset synthesized by conditional source generative models. P-SSL applies semi-supervised learning (SSL) algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models so as to condition them on target samples. Our experimental results indicate that our method can outperform the baselines of scratch training and knowledge distillation.
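The two stages above can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the conditional generator and source classifier below are hypothetical stand-ins (Gaussian class prototypes instead of trained networks), chosen only to make the PP data-synthesis step and the P-SSL classifier-generator cascade concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the pre-trained source models: a conditional
# generator G(y) -> x and a source classifier C(x) -> probabilities over
# N_SRC source classes. Real models would be trained neural networks.
N_SRC, DIM = 5, 8
class_means = rng.normal(size=(N_SRC, DIM))

def source_generator(y):
    """Conditional source generative model: sample x given source class y."""
    return class_means[y] + 0.1 * rng.normal(size=DIM)

def source_classifier(x):
    """Source classifier: softmax over negative distances to class means."""
    logits = -((class_means - x) ** 2).sum(axis=1)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Stage 1: pseudo pre-training (PP) -- synthesize an artificial labeled
# dataset from the conditional generator to pre-train the target architecture.
pp_labels = rng.integers(0, N_SRC, size=100)
pp_data = np.stack([source_generator(y) for y in pp_labels])

# Stage 2: pseudo semi-supervised learning (P-SSL) -- cascade C and G:
# classify a target sample, sample a source class from the predicted
# distribution, then generate a pseudo sample conditioned on that class.
def pseudo_sample(x_target):
    probs = source_classifier(x_target)
    y = rng.choice(N_SRC, p=probs)
    return source_generator(y)

x_target = rng.normal(size=DIM)
x_pseudo = pseudo_sample(x_target)  # used as unlabeled data by an SSL algorithm
```

In the actual method, `pp_data` with `pp_labels` would pre-train the target architecture, and pseudo samples such as `x_pseudo` would enter an off-the-shelf SSL algorithm alongside the labeled target data.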

1. INTRODUCTION

For training deep neural networks on new tasks, transfer learning is essential: it leverages the knowledge of related (source) tasks for the new (target) tasks via joint- or pre-training of source models. There are many transfer learning methods for deep models under various conditions (Pan & Yang, 2010; Wang & Deng, 2018). For instance, domain adaptation leverages source knowledge for the target task by minimizing the domain gap (Ganin et al., 2016), and fine-tuning uses weights pre-trained on source tasks as the initial weights of the target models (Yosinski et al., 2014). These existing powerful transfer learning methods always assume at least one of the following: (i) source and target label spaces overlap, e.g., a target task is composed of the same class categories as a source task; (ii) source datasets are available; and (iii) neural network architectures are consistent, i.e., the architecture in the target task must be the same as that in the source task. However, these assumptions are seldom satisfied in real-world settings (Chang et al., 2019; Kenthapadi et al., 2019; Tan et al., 2019).

For instance, suppose an automobile company is developing an image classifier on a totally new task for an embedded device. The developers have found an optimal neural architecture for the target dataset and the device by neural architecture search, but they cannot directly access the source dataset because customer information must be protected. In such a situation, the existing transfer learning methods requiring the above assumptions are unavailable, and the developers cannot obtain the best model.

To promote the practical application of deep models, we argue that we should reconsider the three assumptions on which the existing transfer learning methods depend. For assumption (i), new target tasks do not necessarily have label spaces overlapping with source ones because target labels are often designed on the basis of their requisites.
In the above example, if we train models on StanfordCars (Krause et al., 2013), a fine-grained car dataset, there is no overlap with ImageNet (Russakovsky et al., 2015) even though ImageNet has 1000 classes. For (ii), the accessibility of source datasets is often limited due to storage costs and privacy (Liang et al., 2020; Kundu et al., 2020; Wang et al., 2021a); e.g., ImageNet consumes over 100 GB and contains human faces co-occurring with objects, which potentially raises privacy concerns (Yang et al., 2022). For (iii), the consistency of the source and target architectures is broken if the new architecture is specialized for

