NETWORK ARCHITECTURE SEARCH FOR DOMAIN ADAPTATION

Abstract

Deep networks have been used to learn transferable representations for domain adaptation. Existing deep domain adaptation methods typically employ popular hand-crafted networks designed for image-classification tasks, leading to sub-optimal domain adaptation performance. In this paper, we present Neural Architecture Search for Domain Adaptation (NASDA), a principled framework that leverages differentiable neural architecture search to derive the optimal network architecture for the domain adaptation task. NASDA is designed with two novel training strategies: neural architecture search with multi-kernel Maximum Mean Discrepancy to derive the optimal architecture, and adversarial training between a feature generator and a batch of classifiers to consolidate the feature generator. We demonstrate experimentally that NASDA achieves state-of-the-art performance on several domain adaptation benchmarks.
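The multi-kernel Maximum Mean Discrepancy mentioned above measures the distance between two feature distributions by averaging RBF-kernel MMD estimates over several bandwidths. A minimal sketch in NumPy (the bandwidth set `sigmas` and the biased estimator are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """RBF kernel matrix between the rows of x and the rows of y."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mk_mmd(xs, xt, sigmas=(1.0, 2.0, 4.0)):
    """Biased estimate of multi-kernel MMD^2 between source features xs
    and target features xt, averaged over a set of RBF bandwidths.
    The bandwidths here are placeholder values for illustration."""
    mmd2 = 0.0
    for s in sigmas:
        k_ss = gaussian_kernel(xs, xs, s).mean()  # source-source term
        k_tt = gaussian_kernel(xt, xt, s).mean()  # target-target term
        k_st = gaussian_kernel(xs, xt, s).mean()  # cross term
        mmd2 += k_ss + k_tt - 2.0 * k_st
    return mmd2 / len(sigmas)
```

Minimizing such a term over the feature generator encourages source and target features to become indistinguishable: samples drawn from the same distribution yield a near-zero estimate, while a distribution shift inflates it.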

1. INTRODUCTION

Supervised machine learning models (Φ) aim to minimize the empirical risk (ℓ(Φ(x), y)) by optimizing Φ on training data (x) and ground-truth labels (y), assuming that the training and testing data are sampled i.i.d. from the same distribution. In practice, however, training and testing data are typically collected from related domains under different distributions, a phenomenon known as domain shift (or domain discrepancy) (Quionero-Candela et al., 2009). To avoid the cost of annotating each new test domain, Unsupervised Domain Adaptation (UDA) tackles domain shift by transferring the knowledge learned from a richly labeled source domain (P(x_s, y_s)) to an unlabeled target domain (Q(x_t)). Recently, unsupervised domain adaptation research has achieved significant progress with techniques such as discrepancy alignment (Long et al., 2017; Tzeng et al., 2014; Ghifary et al., 2014; Peng & Saenko, 2018; Long et al., 2015; Sun & Saenko, 2016), adversarial alignment (Xu et al., 2019a; Liu & Tuzel, 2016; Tzeng et al., 2017; Liu et al., 2018a; Ganin & Lempitsky, 2015; Saito et al., 2018; Long et al., 2018), and reconstruction-based alignment (Yi et al., 2017; Zhu et al., 2017; Hoffman et al., 2018; Kim et al., 2017). While such models typically learn a feature mapping from one domain (Φ(x_s)) to another (Φ(x_t)) or derive a joint representation across domains (Φ(x_s) ⊗ Φ(x_t)), they have limited capacity to derive an optimal neural architecture tailored to domain transfer.

To advance network design, neural architecture search (NAS) automates the architecture engineering process via reinforcement learning (Zoph & Le, 2017) or neuro-evolution (Real et al., 2019a). Conventional NAS models aim to derive a neural architecture α along with the network parameters w by solving a bilevel optimization problem (Anandalingam & Friesz, 1992):

Φ_{α,w} = argmin_α L_val(w*(α), α)   s.t.   w*(α) = argmin_w L_train(w, α),

where L_train and L_val denote the training and validation loss, respectively. While recent works demonstrate competitive performance on tasks such as image classification (Zoph et al., 2018; Liu et al., 2018c;b; Real et al., 2019b) and object detection (Zoph & Le, 2017), existing NAS algorithms typically assume that the training and testing domains are sampled from the same distribution, neglecting the scenario where two data domains or multiple feature distributions are of interest.

To efficiently devise a neural architecture across different data domains, we propose a novel learning task called Neural Architecture Search for Domain Adaptation (NASDA). The ultimate goal of NASDA is to minimize the validation loss of the target domain (L_val^t). We postulate that a solution to NASDA should not only minimize the validation loss of the source domain (L_val^s), but should also

