INTERPRETATIONS OF DOMAIN ADAPTATIONS VIA LAYER VARIATIONAL ANALYSIS

Abstract

Transfer learning is known empirically to perform efficiently in many applications, yet limited literature reports the mechanism behind its success. This study establishes both formal derivations and heuristic analysis to formulate a theory of transfer learning in deep learning. Our framework, built on layer variational analysis, proves that the success of transfer learning can be guaranteed under corresponding data conditions. Moreover, our theoretical calculation yields intuitive interpretations of the knowledge transfer process. Subsequently, an alternative method for network-based transfer learning is derived. The method shows an increase in efficiency and accuracy for domain adaptation, and it is particularly advantageous when new-domain data are sparse during adaptation. Numerical experiments over diverse tasks validated our theory and verified that our analytic expression achieved better performance in domain adaptation than the gradient descent method.

1. INTRODUCTION

Transfer learning is a technique applied to neural networks that admits rapid learning from one (source) domain to another (target) domain, mimicking human cognition in how understanding is carried across tasks. The concept has proven considerably advantageous, and different frameworks have been formulated for applications in various fields. For instance, it has been widely applied in image classification (Quattoni et al., 2008; Zhu et al., 2011; Hussain et al., 2018), object detection (Shin et al., 2016), and natural language processing (NLP) (Houlsby et al., 2019; Raffel et al., 2019). Beyond computer vision and NLP, transferability is fundamentally and directly related to domain adaptation and adversarial learning (Luo et al., 2017; Cao et al., 2018; Ganin et al., 2016). Another major field adopting transfer learning is domain adaptation, which investigates transition problems between two close domains (Kouw & Loog, 2018). A typical understanding is that transfer learning deals with the general problem in which two domains can be rather distinct, allowing both the sample space and the label space to differ, while domain adaptation is a subfield of transfer learning in which the sample and label spaces are fixed and only the probability distributions are allowed to vary. Several studies have experimentally investigated the transferability of network features or representations and discussed its relation to network structures (Yosinski et al., 2014), features, and parameter spaces (Neyshabur et al., 2020; Gonthier et al., 2020). In general, all methods that improve the predictive performance on a target domain using knowledge of a source domain fall under the transfer learning category (Weiss et al., 2016; Tan et al., 2018). This work focuses in particular on network-based transfer learning, a specific framework that reuses a pretrained network.
This approach is often referred to as finetuning, which has been shown to be powerful and is widely applied with deep-learning models (Ge & Yu, 2017; Guo et al., 2019). Even with abundant successes in applications, the understanding of the network-based transfer learning mechanism from a theoretical standpoint remains limited. This paper presents a theoretical framework, set out from functional variational analysis (Gelfand et al., 2000), to rigorously discuss the mechanism of transfer learning. Under this framework, error estimates can be computed to support the foundation of transfer learning, and an interpretation is provided to connect the theoretical derivations with the transfer learning mechanism. Our contributions can be summarized as follows: we formalize transfer learning in a rigorous setting and use variational analysis to build a theoretical foundation for the empirical technique. A theorem is
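To make the network-based setting concrete, the following is a minimal sketch of adapting a pretrained network to a new domain by freezing its feature layers and refitting only the final linear head in closed form (least squares) rather than by gradient descent. The feature extractor, data, and names (`features`, `w_head`) are toy illustrations, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" feature extractor, kept frozen: a fixed
# random projection followed by a ReLU, standing in for the reused
# earlier layers of a source-domain network.
W_feat = rng.normal(size=(5, 8))

def features(x):
    return np.maximum(x @ W_feat, 0.0)  # frozen representation

# Sparse target-domain data: only 20 samples with toy labels.
x_target = rng.normal(size=(20, 5))
y_target = x_target.sum(axis=1, keepdims=True)

# Domain adaptation step: refit ONLY the final head on the frozen
# features, using a closed-form least-squares solution instead of
# iterative gradient descent.
Phi = features(x_target)
w_head, *_ = np.linalg.lstsq(Phi, y_target, rcond=None)

pred = Phi @ w_head
mse = float(np.mean((pred - y_target) ** 2))
```

Because the head is fit analytically on the frozen features, adaptation requires no learning-rate tuning and is well suited to the sparse-data regime the abstract highlights; the least-squares fit is, by construction, no worse on the adaptation set than leaving the head at zero.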

