CONTINUOUS TRANSFER LEARNING

Abstract

Transfer learning has been successfully applied across many high-impact applications. However, most existing work focuses on the static transfer learning setting, and very little is devoted to modeling a time evolving target domain, such as online reviews for movies. To bridge this gap, in this paper, we focus on the continuous transfer learning setting with a time evolving target domain. One major challenge associated with continuous transfer learning is the time evolving relatedness between the source domain and the current target domain. To address this challenge, we first derive a generic generalization error bound on the current target domain with flexible domain discrepancy measures. Furthermore, a novel label-informed C-divergence is proposed to measure the shift of joint data distributions (over input features and output labels) across domains. It can be utilized to instantiate a tighter error upper bound in the continuous transfer learning setting, thus motivating us to develop an adversarial Variational Auto-encoder algorithm named CONTE by minimizing the C-divergence based error upper bound. Extensive experiments on various data sets demonstrate the effectiveness of our CONTE algorithm.

1. INTRODUCTION

Transfer learning has achieved significant success across multiple high-impact application domains (Pan & Yang, 2009). Compared to conventional machine learning methods, which assume that training and test data follow the same distribution, transfer learning allows us to learn a target domain with limited label information by leveraging a related source domain with abundant label information (Ying et al., 2018). However, in many real applications, the target domain is constantly evolving over time. For example, online movie reviews change over the years: some famous movies were not well received by the mainstream audience when first released, but became celebrated only years later (e.g., Citizen Kane, Fight Club, and The Shawshank Redemption); whereas online book reviews typically do not exhibit this type of dynamics. It is challenging to transfer knowledge from the static source domain (e.g., the book reviews) to the time evolving target domain (e.g., the movie reviews). Therefore, in this paper, we study the transfer learning setting with a static source domain and a continuously time evolving target domain (see Figure 1), which has not attracted much attention from the research community and yet is commonly seen across many real applications. The unique challenge of continuous transfer learning lies in the time evolving nature of the task relatedness between the static source domain and the time evolving target domain. Although the change in the target data distribution between consecutive time stamps might be small, over time, the cumulative change in the target domain might even lead to negative transfer (Rosenstein et al., 2005). Existing theoretical analyses of transfer learning (Ben-David et al., 2010; Mansour et al., 2009) showed that the target error is typically bounded by the source error, the domain discrepancy of the marginal data distributions, and the difference of the labeling functions.
However, it has been observed (Zhao et al., 2019; Wu et al., 2019) that marginal feature distribution alignment does not guarantee the minimization of the target error in real-world scenarios. This indicates that in the context of continuous transfer learning, marginal feature distribution alignment can lead to a sub-optimal solution (or even negative transfer) with undesirable predictive performance when directly transferring from $D_S$ to the target domain $D_{T_t}$ at the $t$-th time stamp. This paper aims to bridge the gap in terms of both the theoretical analysis and the empirical solutions for a target domain with a time evolving distribution, which leads to a novel continuous transfer learning algorithm as well as a characterization of negative transfer. The main contributions of this paper are summarized as follows: (1) we derive a generic error bound for the continuous transfer learning setting with flexible domain divergence measures; (2) we propose a label-informed domain discrepancy measure (C-divergence) with its empirical estimate, which instantiates a tighter error bound for the continuous transfer learning setting; (3) based on the proposed C-divergence, we design a novel adversarial Variational Auto-encoder algorithm (CONTE) for continuous transfer learning; (4) extensive experimental results on various data sets verify the effectiveness of the proposed CONTE algorithm. The rest of the paper is organized as follows. Section 2 introduces the notation and our problem definition. We derive a generic error bound for the continuous transfer learning setting in Section 3. Then we propose the novel C-divergence in Section 4, followed by an instantiated error bound and a novel continuous transfer learning algorithm in Section 5. The experimental results are provided in Section 6. We summarize the related work in Section 7, and conclude the paper in Section 8.

2. PRELIMINARIES

In this section, we introduce the notation and problem definition of continuous transfer learning.

2.1. NOTATION

We use $\mathcal{X}$ and $\mathcal{Y}$ to denote the input space and label space, respectively. Let $D_S$ and $D_T$ denote the source and target domains with joint data distributions $p_S(x, y)$ and $p_T(x, y)$ over $\mathcal{X} \times \mathcal{Y}$, respectively. Let $\mathcal{H}$ be a hypothesis class on $\mathcal{X}$, where a hypothesis is a function $h: \mathcal{X} \to \mathcal{Y}$. The notation is summarized in Table 3 in the appendices.

2.2. PROBLEM DEFINITION

Transfer learning (Pan & Yang, 2009) refers to knowledge transfer from a source domain to a target domain such that the prediction performance on the target domain is significantly improved as compared to learning from the target domain alone. However, in some applications, the target domain changes over time, and hence so does the relatedness between the source and target domains. This motivates us to consider the transfer learning setting with a time evolving target domain, which is much less studied than the static transfer learning setting. We formally define the continuous transfer learning problem in Definition 2.1. Notice that the source domain $D_S$ can be considered a special initial domain for the time evolving target domain. Therefore, for notational simplicity, we use $D_{T_0}$ to represent the source domain in this paper. We assume that there are $m_{T_0}$ labeled source examples drawn independently from the source domain $D_{T_0}$, and $m_{T_j}$ labeled target examples drawn independently from the target domain $D_{T_j}$ at time stamp $j$.
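To make the setting concrete, the following sketch builds a static source domain $D_{T_0}$ and a sequence of target domains $\{D_{T_j}\}$ whose class-conditional distributions drift further from the source at each time stamp. This is a hypothetical synthetic construction for illustration only, not a data set used in the paper; the function name and drift schedule are our own choices.

```python
import numpy as np

def make_domain(mean, n, rng):
    """Draw n labeled examples (x, y): binary labels with 2-D Gaussian
    features whose class-conditional means sit at +/- mean, so increasing
    `mean` shifts the joint distribution p(x, y)."""
    y = rng.integers(0, 2, size=n)                        # binary labels
    x = rng.normal(loc=mean * (2.0 * y[:, None] - 1.0),   # class-dependent mean
                   scale=1.0, size=(n, 2))
    return x, y

rng = np.random.default_rng(0)
# Static source domain D_T0 with abundant labels (m_T0 = 1000).
x_src, y_src = make_domain(mean=1.0, n=1000, rng=rng)
# Time evolving target domains D_T1, ..., D_T5 with scarce labels:
# the distribution drifts away from the source at every time stamp.
targets = [make_domain(mean=1.0 + 0.4 * j, n=100, rng=rng) for j in range(1, 6)]
```

Under this drift, the discrepancy between $D_{T_0}$ and $D_{T_j}$ grows with $j$, which is exactly the regime where direct source-to-target transfer degrades.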

3. A GENERIC ERROR BOUND

Given a static source domain and a time evolving target domain, continuous transfer learning aims to improve the target predictive function over $D_{T_{t+1}}$ using the source domain and the historical target domains. We begin by considering the binary classification setting, i.e., $\mathcal{Y} = \{0, 1\}$. The source error of a hypothesis $h$ can be defined as follows:
$$\epsilon_{T_0}(h) = \mathbb{E}_{(x,y) \sim p_{T_0}(x,y)}\big[L(h(x), y)\big]$$
where $L(\cdot, \cdot)$ is the loss function. Its empirical estimate using the labeled source examples is denoted $\hat{\epsilon}_{T_0}(h)$. Similarly, we define the target error $\epsilon_{T_j}(h)$ and its empirical estimate $\hat{\epsilon}_{T_j}(h)$ over the target distribution $p_{T_j}(x,y)$ at time stamp $j$. A natural domain discrepancy measure over the joint distributions of input features and class labels on $\mathcal{X} \times \mathcal{Y}$ can be defined as follows:
$$d_1(D_{T_0}, D_T) = \sup_{Q \in \mathcal{Q}} \big| \Pr_{D_{T_0}}[Q] - \Pr_{D_T}[Q] \big|$$
where $\mathcal{Q}$ is the set of measurable subsets under $p_{T_0}(x,y)$ and $p_T(x,y)$ (see footnote). Then the error bound of continuous transfer learning is given by the following theorem.

Theorem 3.1. Assume the loss function $L$ is bounded with $0 \le L \le M$. Given a source domain $D_{T_0}$ and historical target domains $\{D_{T_i}\}_{i=1}^{t}$, for any $h \in \mathcal{H}$, the target error $\epsilon_{T_{t+1}}(h)$ on $D_{T_{t+1}}$ is bounded as follows:



Footnote: note that $d_1$ is slightly different from the L1 or variation divergence in (Ben-David et al., 2010), which involves only the marginal distribution of the features.
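To make the quantities above concrete, the empirical error and a plug-in estimate of $d_1$ can be sketched as follows. This is a simplified illustration under our own assumptions (one-dimensional features discretized into histogram bins, 0-1 loss); the function names are illustrative, not from the paper.

```python
import numpy as np

def empirical_error(h, x, y):
    """Empirical 0-1 error: the fraction of examples that h misclassifies."""
    return float(np.mean(h(x) != y))

def empirical_d1(x_a, y_a, x_b, y_b, bins=10):
    """Plug-in estimate of d_1: total variation distance between two
    empirical *joint* distributions over (x, y), with 1-D features
    discretized into `bins` histogram cells."""
    lo = min(x_a.min(), x_b.min())
    hi = max(x_a.max(), x_b.max())
    edges = np.linspace(lo, hi, bins + 1)

    def joint_hist(x, y):
        p = np.zeros((bins, 2))
        bx = np.clip(np.digitize(x, edges) - 1, 0, bins - 1)
        for b, lab in zip(bx, y):
            p[b, lab] += 1.0
        return p / len(x)

    p, q = joint_hist(x_a, y_a), joint_hist(x_b, y_b)
    # For discrete distributions, the sup over subsets Q is attained by
    # Q = {(x, y) : p(x, y) > q(x, y)}, giving 0.5 * ||p - q||_1.
    return 0.5 * float(np.abs(p - q).sum())
```

Two samples from the same joint distribution yield a small estimate, while a shifted target yields an estimate approaching 1, mirroring how $d_1$ grows as the target domain drifts away from the source.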



Figure 1: Illustration of continuous transfer learning. It learns a predictive function on $D_{T_t}$ using knowledge from both the source domain $D_S$ and the historical target domains $D_{T_i}$ ($i = 1, \dots, t-1$). Directly transferring from the source domain $D_S$ to the target domain $D_{T_t}$ might lead to negative transfer with undesirable predictive performance.

Definition 2.1. (Continuous Transfer Learning) Given a source domain $D_S$ (available at time stamp $j = 1$) and a time evolving target domain $\{D_{T_j}\}_{j=1}^{n}$ with time stamp $j$, continuous transfer learning aims to improve the prediction function for the target domain $D_{T_{t+1}}$ using the knowledge from the source domain $D_S$ and the historical target domains $D_{T_j}$ ($j = 1, \dots, t$).
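As a simple point of reference for this definition, one naive baseline (not the CONTE algorithm proposed in this paper) is weighted empirical risk minimization over the pooled source and historical target examples, upweighting recent time stamps under the assumption that $D_{T_{t+1}}$ is closest to them. The geometric-decay parameter `alpha` and the function below are our own illustrative choices.

```python
import numpy as np

def pooled_weighted_examples(source, targets, alpha=0.5):
    """Pool the source domain (time stamp 0) and historical target domains
    (time stamps 1..t) into one weighted training set.

    The domain at time stamp j receives total weight proportional to
    alpha**(t - j), i.e., geometric decay: recent target domains dominate."""
    domains = [source] + list(targets)
    t = len(targets)
    domain_w = np.array([alpha ** (t - j) for j in range(t + 1)])
    domain_w /= domain_w.sum()
    xs, ys, ws = [], [], []
    for w, (x, y) in zip(domain_w, domains):
        xs.append(np.asarray(x))
        ys.append(np.asarray(y))
        # Spread the domain's total weight uniformly over its examples.
        ws.append(np.full(len(y), w / len(y)))
    return np.concatenate(xs), np.concatenate(ys), np.concatenate(ws)
```

The per-example weights can then be passed to any learner that supports a weighted loss; this corresponds to minimizing a convex combination of the empirical errors $\hat{\epsilon}_{T_j}(h)$, the kind of mixture-of-empirical-errors objective that generalization bounds of this form typically control.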

