ROBUST TRANSFER LEARNING BASED ON MINIMAX PRINCIPLE

Abstract

The similarity between the target and source tasks is a crucial quantity for theoretical analyses and algorithm design in transfer learning. However, this quantity is often difficult to capture precisely. To address this issue, we make a boundedness assumption on the task similarity and propose a mathematical framework based on the minimax principle, which minimizes the worst-case expected population risk under this assumption. Furthermore, the proposed minimax problem can be solved analytically, which provides a guideline for designing robust transfer learning models. Based on the analytical expression, we interpret the influences of sample sizes, task distances, and model dimensionality on knowledge transfer. We then develop practical algorithms based on the theoretical results. Finally, experiments on image classification tasks show that our approaches achieve robust and competitive accuracies under random selections of training sets.

1. INTRODUCTION

The goal of transfer learning is to solve target tasks using the learning results from source tasks. In order to study the fundamental aspects of transfer learning problems, it is important to define and quantify the similarity between the source and target tasks (Pan & Yang, 2009). While it is assumed that the source and target tasks are somewhat similar in transfer learning problems (Weiss et al., 2016), the joint structure and similarity between the tasks can only be learned from the training data, which is challenging to compute in practice due to the limited availability of labeled target samples. Therefore, in order to conduct meaningful theoretical analyses, it is often necessary to make extra assumptions, such as a linear combination of learning results (Ben-David et al., 2010) or linear regression transferring (Kuzborskij & Orabona, 2013), which could be limited in many applications. As such, in this paper, we attempt to theoretically study transfer learning by only assuming that the similarity between the source and target tasks is bounded, which is a weaker assumption and is often valid in transfer learning problems. Under such an assumption, the minimax principle can be applied (Verdu & Poor, 1984) for estimating the target distribution. Based on this principle, the estimator minimizes the worst-case expected population risk (EPR) (Jin et al., 2018) under the bounded task distance constraint, which maintains robustness against the weak assumption. Many empirical works have also followed the minimax setting and verified its validity (Zhang et al., 2019), while the theoretical analyses lag behind. The main challenge of analyzing general minimax problems in transfer learning is the difficulty of computing the expectations of the population risk under popular distance measures, such as the Kullback-Leibler (K-L) divergence (Thomas & Joy, 2006).
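The minimax estimation described above can be sketched as follows; the symbols here (D for the task distance, ε for the similarity bound, and R for the population risk) are illustrative notation, not taken from a specific equation in this paper:

\[
\hat{P}^{\star} \;=\; \arg\min_{\hat{P}} \;\; \max_{P_T :\; D(P_T, P_S) \le \varepsilon} \;\; \mathbb{E}\big[\, R(\hat{P};\, P_T) \,\big],
\]

i.e., the estimator \(\hat{P}^{\star}\) minimizes the worst-case EPR over all target distributions \(P_T\) within distance \(\varepsilon\) of the source distribution \(P_S\), which is exactly the bounded-task-similarity constraint.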
To deal with this difficulty, we adopt the widely used χ²-distance and Hellinger distance (Csiszár & Shields, 2004) as the distance measures between the data distributions of the tasks, and present a minimax formulation of transfer learning. Under such measures, the proposed minimax problems can be analytically solved. In particular, we show that the optimal estimation linearly combines the learning results of the two tasks, where the combining coefficient can be computed from the training data. This provides a theoretical justification for many existing analysis frameworks and algorithms (Ben-David et al., 2010; Garcke & Vanck, 2014). Note that the recent work (Tong et al., 2021) also analytically evaluates the combining coefficients, which, however, rely on the underlying task distributions that are not available in real applications. Our work essentially provides the combining coefficient that
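As a minimal illustration of the quantities above, the following sketch computes the χ²-distance and Hellinger distance between two discrete distributions and forms a convex combination of two estimates. The example distributions and the coefficient `alpha` are hypothetical; the paper derives the optimal coefficient analytically, which is not reproduced here.

```python
import numpy as np

def chi2_distance(p, q):
    """Chi-squared distance sum_i (p_i - q_i)^2 / q_i (requires q > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / q))

def hellinger_distance(p, q):
    """Hellinger distance sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), in [0, 1]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def combine(target_est, source_est, alpha):
    """Linear (convex) combination of target and source estimates, 0 <= alpha <= 1."""
    return alpha * np.asarray(target_est, dtype=float) \
        + (1.0 - alpha) * np.asarray(source_est, dtype=float)

# Hypothetical target and source distributions over 3 outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

print(chi2_distance(p, q))       # 0.05
print(hellinger_distance(p, q))  # small, since p and q are close
print(combine(p, q, 0.7))        # combined estimate, still a distribution
```

Both distances are bounded when the tasks are similar, which is precisely the constraint set over which the minimax problem is solved.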

