CONTINUOUS TRANSFER LEARNING

Abstract

Transfer learning has been successfully applied across many high-impact applications. However, most existing work focuses on the static transfer learning setting, and very little is devoted to modeling a time evolving target domain, such as online movie reviews. To bridge this gap, in this paper we focus on the continuous transfer learning setting with a time evolving target domain. One major challenge in continuous transfer learning is that the relatedness between the source domain and the current target domain changes as the target domain evolves over time. To address this challenge, we first derive a generic generalization error bound on the current target domain with flexible domain discrepancy measures. Furthermore, a novel label-informed C-divergence is proposed to measure the shift of the joint data distribution (over input features and output labels) across domains. It can be utilized to instantiate a tighter error upper bound in the continuous transfer learning setting, motivating us to develop an adversarial Variational Auto-encoder algorithm named CONTE that minimizes the C-divergence based error upper bound. Extensive experiments on various data sets demonstrate the effectiveness of the proposed CONTE algorithm.

1. INTRODUCTION

Transfer learning has achieved significant success across multiple high-impact application domains (Pan & Yang, 2009). In contrast to conventional machine learning methods, which assume that training and test data follow the same distribution, transfer learning allows us to learn a target domain with limited label information by leveraging a related source domain with abundant label information (Ying et al., 2018). However, in many real applications, the target domain is constantly evolving over time. For example, online movie reviews change over the years: some famous movies were not well received by the mainstream audience when they were first released and became celebrated only years later (e.g., Citizen Kane, Fight Club, and The Shawshank Redemption), whereas online book reviews typically do not exhibit this type of dynamics. It is challenging to transfer knowledge from a static source domain (e.g., book reviews) to a time evolving target domain (e.g., movie reviews). Therefore, in this paper, we study the transfer learning setting with a static source domain and a continuously evolving target domain (see Figure 1), which has not attracted much attention from the research community and yet is commonly seen across many real applications.

The unique challenge of continuous transfer learning lies in the time evolving nature of the task relatedness between the static source domain and the evolving target domain. Although the change in the target data distribution between consecutive time stamps might be small, the cumulative change in the target domain over time might even lead to negative transfer (Rosenstein et al., 2005). Existing theoretical analyses of transfer learning (Ben-David et al., 2010; Mansour et al., 2009) show that the target error is typically bounded by the source error, the discrepancy between the marginal data distributions, and the difference of labeling functions.
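For concreteness, the bound of Ben-David et al. (2010) referenced above can be written as follows (notation adapted here for illustration; epsilon_S and epsilon_T denote the source and target risks of a hypothesis h drawn from a hypothesis class H):

```latex
\epsilon_T(h) \;\le\; \epsilon_S(h)
  \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
  \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h' \in \mathcal{H}} \big[ \epsilon_S(h') + \epsilon_T(h') \big]
```

Here the second term measures the discrepancy between the marginal data distributions, and the third term lambda captures the difference of labeling functions via the best joint hypothesis; the continuous setting studied in this paper must additionally account for how these terms drift as the target domain evolves.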
However, it has been observed (Zhao et al., 2019; Wu et al., 2019) that aligning the marginal feature distributions might not guarantee the minimization of the target error in real-world scenarios. This indicates that, in the context of continuous transfer learning, marginal feature distribution alignment can lead to a sub-optimal solution (or even negative transfer) with undesirable predictive performance when directly transferring from D_S to the target domain D_{T_t} at the t-th time stamp. This paper aims to bridge the gap in terms of both the theoretical analysis and the empirical solution for a target domain with a time evolving distribution, which leads to a novel continuous transfer learning algorithm.
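The failure mode of pure marginal alignment can be seen in a toy construction (ours, not from the paper): two domains share an identical feature marginal N(0, 1), so their marginal distributions are already perfectly aligned, yet the labeling functions are flipped, and a classifier transferred from the source misclassifies every target example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Source domain: x ~ N(0, 1), label y = 1[x > 0].
Xs = rng.normal(size=(1000, 1))
ys = (Xs[:, 0] > 0).astype(int)

# Target domain: identical feature marginal x ~ N(0, 1),
# but the labeling function is flipped: y = 1[x < 0].
Xt = rng.normal(size=(1000, 1))
yt = (Xt[:, 0] < 0).astype(int)

# Fit the simplest linear classifier on the source: predict 1 if w*x > 0,
# with w set to the sign of the correlation between x and the labels.
w = np.sign(np.mean(Xs[:, 0] * (2 * ys - 1)))

src_acc = np.mean((w * Xs[:, 0] > 0).astype(int) == ys)  # perfect on source
tgt_acc = np.mean((w * Xt[:, 0] > 0).astype(int) == yt)  # wrong on every target point
```

No feature transformation can help here: the marginals are already identical, which is why discrepancy measures over joint distributions of features and labels (such as the C-divergence proposed in this paper) are needed.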



Figure 1: Illustration of continuous transfer learning. It learns a predictive function in D_{T_t} using knowledge from both the source domain D_S and the historical target domains D_{T_i} (i = 1, ..., t-1). Directly transferring from the source domain D_S to the target domain D_{T_t} might lead to negative transfer with undesirable predictive performance.

