TRANSFER LEARNING VIA CONTEXT-AWARE FEATURE COMPENSATION

Abstract

Transfer learning aims to reuse learnt representations or subnetworks in a new domain with minimal adaptation effort. The challenge lies in the mismatch between the source and target domains, which is the major gap to be bridged by transfer learning. Hence, identifying the mismatch between the source and target domains becomes a critical problem. We propose an end-to-end framework that learns feature compensation for transfer learning, with soft gating to decide whether and how much feature compensation is needed to account for the mismatch between the source and target domains. To identify the position of an input relative to the overall data distribution of the source domain, we first perform clustering to capture the data distribution in a compact form represented by cluster centers, and then use the similarities between the input and the cluster centers to describe the input's relative position. These similarities act as the context indicating whether and how much feature compensation the input needs to compensate for the domain mismatch. To this end, we add only two subnetworks in the form of multilayer perceptrons, one computing the feature compensation and the other soft-gating it, both driven by the context. Experiments show that such a minor change to the backbone network yields significant performance improvements over the baselines on several widely used benchmarks.
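The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes Euclidean distance to the cluster centers with a softmax over negated distances as the similarity context, sigmoid activation for the soft gate, and single-hidden-layer perceptrons with arbitrary sizes; all names (`compensate`, `comp_net`, `gate_net`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MLP:
    """Single-hidden-layer perceptron with ReLU (illustrative sizes)."""
    def __init__(self, d_in, d_hidden, d_out):
        self.W1 = rng.standard_normal((d_in, d_hidden)) * 0.1
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.standard_normal((d_hidden, d_out)) * 0.1
        self.b2 = np.zeros(d_out)
    def __call__(self, x):
        h = np.maximum(0.0, x @ self.W1 + self.b1)
        return h @ self.W2 + self.b2

d_feat, k_clusters = 8, 4
# Cluster centers summarizing the source-domain feature distribution
# (in practice obtained by clustering source features, e.g. with k-means).
centers = rng.standard_normal((k_clusters, d_feat))

comp_net = MLP(k_clusters, 16, d_feat)   # predicts the feature compensation
gate_net = MLP(k_clusters, 16, d_feat)   # predicts the soft gate

def compensate(f):
    # Context: similarity of the input feature to each source cluster center.
    dists = np.linalg.norm(f[None, :] - centers, axis=1)
    context = softmax(-dists)
    delta = comp_net(context)                        # compensation vector
    gate = 1.0 / (1.0 + np.exp(-gate_net(context)))  # sigmoid gate in (0, 1)
    # Soft gating decides how much compensation to apply, element-wise.
    return f + gate * delta

f = rng.standard_normal(d_feat)   # a backbone feature from the target domain
f_comp = compensate(f)
print(f_comp.shape)  # (8,)
```

When the gate saturates near zero the backbone feature passes through unchanged, which is the desired behavior for inputs that already fall within the source distribution.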

1. INTRODUCTION

Transfer learning aims to reuse knowledge obtained from one domain to solve a problem in a new domain with little effort and minor changes. A classical problem requiring intensive effort is pattern recognition, where training a model consumes much time and requires a large number of annotated data examples. If a pre-trained model can be reused for another task, a new solution can be developed efficiently and rapidly. For example, a speech recognition model trained on one language can more easily be transferred to recognize another language through transfer learning (Huang et al., 2013). Likewise, the knowledge learnt for visual navigation can be transferred to a new environment with ease (Al-Halah et al., 2022). Many transfer learning methods have been proposed, and the topic continues to attract much attention. In general, a pattern recognition system is composed of feature extraction and a classifier. In deep neural networks with end-to-end learning, some portions of the network act as filters performing feature extraction, while the last layer is typically a fully-connected layer conducting classification. One solution for transfer learning is to share the feature extraction portion of the network and apply different classifiers for different tasks (Oquab et al., 2014). When there is a large gap between the source and target domains in terms of feature distribution, however, simply modifying the classifier fails to resolve the mismatch between the two domains. Hence, another solution aims to identify reusable features or subnetworks for feature extraction (Huang et al., 2013). Yet this relies on the assumption that there exist some coherent features that can be shared across domains.
In view of the distinction between the source and target domains in terms of feature representation, yet another solution tries to bring the features of the two domains closer through representation learning, for example by applying a certain regularization to the loss function (Zhong & Maki, 2020) or by training a domain classifier (Ajakan et al., 2014) to adversarially test whether features are domain-homogeneous. Even so, these methods still rest on the assumption that the feature distributions of the two domains can have

