NEURAL ARCHITECTURE SEARCH FOR DOMAIN ADAPTATION

Abstract

Deep networks have been used to learn transferable representations for domain adaptation. Existing deep domain adaptation methods systematically employ popular hand-crafted networks designed for image-classification tasks, leading to sub-optimal domain adaptation performance. In this paper, we present Neural Architecture Search for Domain Adaptation (NASDA), a principled framework that leverages differentiable neural architecture search to derive the optimal network architecture for the domain adaptation task. NASDA is designed with two novel training strategies: neural architecture search with multi-kernel Maximum Mean Discrepancy to derive the optimal architecture, and adversarial training between a feature generator and a batch of classifiers to consolidate the feature generator. We demonstrate experimentally that NASDA leads to state-of-the-art performance on several domain adaptation benchmarks.

1. INTRODUCTION

Supervised machine learning models (Φ) aim to minimize the empirical test error ℓ(Φ(x), y) by optimizing Φ on training data (x) and ground-truth labels (y), assuming that the training and testing data are sampled i.i.d. from the same distribution. In practice, however, the training and testing data are typically collected from related domains under different distributions, a phenomenon known as domain shift (or domain discrepancy) (Quionero-Candela et al., 2009). To avoid the cost of annotating each new test set, Unsupervised Domain Adaptation (UDA) tackles domain shift by transferring the knowledge learned from a richly labeled source domain (P(x_s, y_s)) to an unlabeled target domain (Q(x_t)). Recently, unsupervised domain adaptation research has achieved significant progress with techniques such as discrepancy alignment (Long et al., 2017; Tzeng et al., 2014; Ghifary et al., 2014; Peng & Saenko, 2018; Long et al., 2015; Sun & Saenko, 2016), adversarial alignment (Xu et al., 2019a; Liu & Tuzel, 2016; Tzeng et al., 2017; Liu et al., 2018a; Ganin & Lempitsky, 2015; Saito et al., 2018; Long et al., 2018), and reconstruction-based alignment (Yi et al., 2017; Zhu et al., 2017; Hoffman et al., 2018; Kim et al., 2017). While such models typically learn a feature mapping from one domain (Φ(x_s)) to another (Φ(x_t)) or derive a joint representation across domains (Φ(x_s) ⊗ Φ(x_t)), they have limited capacity to derive an optimal neural architecture specific to domain transfer. To advance network design, neural architecture search (NAS) automates the architecture engineering process by reinforcement supervision (Zoph & Le, 2017) or through neuro-evolution (Real et al., 2019a). Conventional NAS models aim to derive the neural architecture α along with the network parameters w by solving a bilevel optimization problem (Anandalingam & Friesz, 1992):

min_α L_val(w*(α), α)   s.t.   w*(α) = argmin_w L_train(w, α),

where L_train and L_val denote the training and validation loss, respectively.
While recent works demonstrate competitive performance on tasks such as image classification (Zoph et al., 2018; Liu et al., 2018c;b; Real et al., 2019b) and object detection (Zoph & Le, 2017), existing NAS algorithms typically assume that the training and testing domains are sampled from the same distribution, neglecting scenarios where two data domains or multiple feature distributions are of interest. To efficiently devise a neural architecture across different data domains, we propose a novel learning task called Neural Architecture Search for Domain Adaptation (NASDA). The ultimate goal of NASDA is to minimize the validation loss on the target domain (L^t_val). We postulate that a solution to NASDA should not only minimize the validation loss on the source domain (L^s_val), but should also reduce the domain gap between the source and target. To this end, we propose a new NAS learning schema:

Φ_{α,w} = argmin_α L^s_val(w*(α), α) + disc(Φ*(x_s), Φ*(x_t))   (1)
s.t. w*(α) = argmin_w L^s_train(w, α)   (2)

where Φ* = Φ_{α,w*(α)}, and disc(Φ*(x_s), Φ*(x_t)) denotes the domain discrepancy between the source and target. Note that in unsupervised domain adaptation, L^t_train and L^t_val cannot be computed directly due to the lack of labels in the target domain. Inspired by past work in NAS and unsupervised domain adaptation, we propose in this paper an instantiated NASDA model, which comprises two training phases, as shown in Figure 1. The first is the neural architecture search phase, which derives an optimal neural architecture (α*) following the learning schema of Equations 1-2. Inspired by Differentiable ARchiTecture Search (DARTS) (Liu et al., 2019a), we relax the search space to be continuous so that α can be optimized with respect to L^s_val and disc(Φ(x_s), Φ(x_t)) by gradient descent.
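As a toy illustration of this alternating first-order scheme (the approximation popularized by DARTS), the sketch below interleaves a gradient step on w for a stand-in training loss with a gradient step on α for a stand-in validation loss plus a discrepancy penalty. All loss functions here are illustrative scalar stand-ins, not the paper's actual objectives.

```python
import numpy as np

def l_train(w, alpha):   # stand-in for the source training loss L^s_train(w, alpha)
    return (w - alpha) ** 2

def l_val(w, alpha):     # stand-in for the source validation loss L^s_val(w*(alpha), alpha)
    return (w - 1.0) ** 2 + 0.1 * alpha ** 2

def disc(alpha):         # stand-in for the discrepancy term disc(Phi(x_s), Phi(x_t))
    return 0.5 * (alpha - 0.5) ** 2

def grad(f, x, eps=1e-5):  # central-difference gradient, to keep the sketch short
    return (f(x + eps) - f(x - eps)) / (2 * eps)

w, alpha, lr = 0.0, 0.0, 0.1
for _ in range(500):
    # inner step: approximate w*(alpha) by one gradient step on the training loss
    w -= lr * grad(lambda v: l_train(v, alpha), w)
    # outer step: update alpha on validation loss + discrepancy, holding w fixed
    alpha -= lr * grad(lambda a: l_val(w, a) + disc(a), alpha)
```

With these toy quadratics the alternating updates converge to a fixed point where w tracks α; the point of the sketch is only the structure of the bilevel loop, in which α never sees the training loss directly.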
Specifically, we enhance feature transferability by embedding the hidden representations of the task-specific layers into a reproducing kernel Hilbert space, where the mean embeddings can be explicitly matched by minimizing disc(Φ(x_s), Φ(x_t)). We use multi-kernel Maximum Mean Discrepancy (MK-MMD) (Gretton et al., 2007) to evaluate the domain discrepancy. The second training phase aims to learn a good feature generator with a task-specific loss, based on the architecture α* derived in the first phase. To establish this goal, we use the derived deep neural network (Φ_{α*}) as the feature generator (G) and devise an adversarial training process between G and a batch of classifiers C. The high-level intuition is to first diversify C during training, and then train G to generate features on which the diversified classifiers produce similar outputs. The training process is similar to the Maximum Classifier Discrepancy (MCD) framework (Saito et al., 2018), except that we extend the dual classifiers in MCD to an ensemble of multiple classifiers. Experiments on standard UDA benchmarks demonstrate the effectiveness of our derived NASDA model in achieving significant improvements over state-of-the-art methods. The contributions of this paper are highlighted as follows:

• We formulate a novel dual-objective task, Neural Architecture Search for Domain Adaptation (NASDA), which optimizes the neural architecture for unsupervised domain adaptation with respect to both a source-performance objective and a transfer-learning objective.

• We propose an instantiated NASDA model that comprises two training stages, which derive the optimal architecture parameters α* and the feature extractor G, respectively. We are the first to show the effectiveness of MK-MMD in a NAS process tailored for domain adaptation.
• Extensive experiments on multiple cross-domain recognition tasks demonstrate that NASDA achieves significant improvements over traditional unsupervised domain adaptation models as well as state-of-the-art NAS-based methods.
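As a minimal sketch of how an MK-MMD-style penalty such as disc(Φ(x_s), Φ(x_t)) can be estimated from feature batches, the snippet below sums biased MMD² estimates over a small bank of RBF bandwidths. The kernel family, bandwidths, and feature shapes are illustrative assumptions, not the paper's exact instantiation (which may weight kernels differently).

```python
import numpy as np

def mk_mmd(xs, xt, sigmas=(1.0, 2.0, 4.0)):
    """Biased multi-kernel MMD^2 estimate: sum of per-kernel RBF MMDs."""
    def rbf(a, b, sigma):
        # pairwise squared Euclidean distances, then a Gaussian kernel
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return sum(
        rbf(xs, xs, s).mean() + rbf(xt, xt, s).mean() - 2.0 * rbf(xs, xt, s).mean()
        for s in sigmas
    )

rng = np.random.default_rng(0)
feat_src = rng.normal(0.0, 1.0, size=(64, 8))   # stand-in for Phi(x_s)
feat_tgt = rng.normal(1.0, 1.0, size=(64, 8))   # mean-shifted stand-in for Phi(x_t)
feat_same = rng.normal(0.0, 1.0, size=(64, 8))  # a fresh draw from the source distribution

shifted, matched = mk_mmd(feat_src, feat_tgt), mk_mmd(feat_src, feat_same)
```

On this synthetic data the estimate is markedly larger across the shifted pair than across two draws from the same distribution, which is exactly the property that makes it usable as a differentiable alignment penalty during search.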



Figure 1: An overview of NASDA: (a) Continuous relaxation of the search space by placing a mixture of the candidate operations on each edge. (b) Inducing the final architecture by joint optimization of the neural architecture parameters α and the network weights w, supervised by minimizing the validation loss on the source domain and reducing the domain discrepancy. (c)(d) Adversarial training of the derived feature generator G and classifiers C.
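The discrepancy that drives the adversarial steps in panels (c) and (d) can be sketched as the mean pairwise L1 distance between the classifiers' softmax outputs on target features, generalizing MCD's two-classifier discrepancy to an ensemble. The linear heads, feature shapes, and ensemble size below are illustrative stand-ins, not the paper's configuration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def classifier_discrepancy(logits_list):
    """Mean pairwise L1 distance between the classifiers' predictions."""
    probs = [softmax(z) for z in logits_list]
    total, pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            total += np.abs(probs[i] - probs[j]).mean()
            pairs += 1
    return total / pairs

rng = np.random.default_rng(1)
g_feat = rng.normal(size=(32, 16))                     # target features from G
heads = [rng.normal(size=(16, 10)) for _ in range(4)]  # K = 4 linear classifier heads
d = classifier_discrepancy([g_feat @ W for W in heads])
# One phase maximizes d over the classifiers to diversify them;
# the other trains G to minimize d so the classifiers agree on target features.
```

Identical classifiers give zero discrepancy, so the objective only pressures G where the diversified ensemble actually disagrees on target samples.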

