TRANSFER LEARNING VIA CONTEXT-AWARE FEATURE COMPENSATION

Abstract

Transfer learning aims to reuse learnt representations or subnetworks in a new domain with minimum effort for adaptation. The challenge lies in the mismatch between the source and target domains, which is the major gap to be tackled by transfer learning; hence, how to identify this mismatch becomes a critical problem. We propose an end-to-end framework that learns feature compensation for transfer learning, with soft gating to decide whether and how much feature compensation is needed, accounting for the mismatch between the source and target domains. To locate an input in reference to the overall data distribution of the source domain, we first perform clustering to summarize the distribution in a compact form as cluster centers, and then use the similarities between the input and the cluster centers to describe the relative position of the input. This acts as the context indicating whether and how much feature compensation the input needs to compensate for the mismatch between the source and target domains. To approach that, we add only two subnetworks in the form of Multilayer Perceptrons, one computing the feature compensation and the other soft-gating the compensation, where both are computed from the context. Experiments show that such a minor change to the backbone network yields significant performance improvements over the baselines on several widely used benchmarks.

1. INTRODUCTION

Transfer learning aims to reuse the knowledge obtained in one domain to solve a problem in a new domain with little effort and minor change. A classical problem that demands intensive effort is pattern recognition, where training a model consumes much time and requires a large number of annotated data examples. If we can reuse a pre-trained model for another task, a new solution can be developed efficiently and rapidly. For example, a speech recognition model trained on one language can be transferred to recognize another language more easily through transfer learning (Huang et al., 2013). Likewise, the knowledge learnt for visual navigation can be transferred to a new environment with ease (Al-Halah et al., 2022). Many transfer learning methods have been proposed so far, and the topic has been attracting much attention. In general, a pattern recognition system is composed of a feature extractor and a classifier. In the context of deep neural networks with end-to-end learning, some portions of a network act as filters performing feature extraction, while the last layer is in general a fully-connected layer conducting classification. One solution for transfer learning is to share the feature extraction portion of the network and apply different classifiers to different tasks (Oquab et al., 2014). When there is a big gap between the source and target domains in terms of feature distribution, however, simply modifying the classifier fails to resolve the mismatch between the two domains. Hence, another solution aims to identify reusable features or subnetworks that perform feature extraction (Huang et al., 2013). Yet, this relies on the assumption that there exist some coherent features shareable across domains.
In view of the distinction between the source and target domains in terms of feature representation, another solution tries to make the features of the two domains approach each other through representation learning, for example, by applying certain regularization to the loss function (Zhong & Maki, 2020) or by training a domain classifier (Ajakan et al., 2014) that adversarially tests whether the features of the two domains are homogeneous. Even so, these methods still assume that the feature distributions of the two domains have substantial overlap, which might not always hold in practice. Accordingly, another line of work modifies the network structure, for example, using an agent to search for reusable subnetworks to form a new pipeline for the target domain, where the agent can be implemented with neural networks (Guo et al., 2020; Liu et al., 2021). However, these works change the backbone network greatly, which deviates from the original spirit of transfer learning, that is, solving a new problem with an off-the-shelf solution by making only minimal changes. As mentioned above, the critical issue in transfer learning is to identify the reusable part of the learnt representations or subnetworks. In other words, due to the overlap between the source and target domains, not every data example needs transferring, and how much transferring a data example needs depends on its position in the overall data distribution. If the data example of interest lies in the overlap between the source and target domains, it needs only minor change. On the contrary, if it lies in a region of feature space where the data distribution of the source domain differs significantly from that of the target domain, it needs significant change through transfer learning. This gives rise to a new problem: how to evaluate how much mismatch exists around a given input between the source and target domains.
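To make this concrete, the following is a minimal, self-contained sketch (not from this work) of one way to quantify per-example mismatch: cluster the features of each domain, then compare an input's distance to the nearest source center against its distance to the nearest target center. The clustering routine, the Euclidean distance, and the score itself are all illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=30):
    # Plain Lloyd's k-means with deterministic farthest-point initialization.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def mismatch_score(x, src_centers, tgt_centers):
    # Gap between the distance to the nearest source center and the nearest
    # target center: near zero where the domains overlap, large where they
    # disagree, suggesting more compensation is needed there.
    d_src = np.linalg.norm(src_centers - x, axis=1).min()
    d_tgt = np.linalg.norm(tgt_centers - x, axis=1).min()
    return abs(d_src - d_tgt)
```

An example in the overlap region of the two domains thus scores low (minor change needed), while an example covered by only one domain scores high.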
This involves developing a means to characterize the overall data distribution, as well as a descriptor that locates the position of the input in reference to that distribution, which we call the context. With a sound context description, one can then evaluate whether and how much transferring is needed to compensate for the mismatch between the source and target domains. To this end, we propose a new method for context description: we first apply clustering to represent the data distribution compactly as cluster centers, and then compute the similarities between the input and the cluster centers to define its location in the overall data distribution. These similarities act as the clue for the subsequent computation of how much compensation is needed to enhance the feature representation from the backbone network. Based on this scheme for context description, we propose an end-to-end framework to learn feature compensation for transfer learning, with soft gating to decide whether and how much feature compensation is needed, accounting for the mismatch between the source and target domains. To approach that, we add only two subnetworks in the form of Multilayer Perceptrons (MLPs) to the backbone network, one computing the feature compensation and the other soft-gating the compensation, where both are computed from the context; that is, the feature compensation is context adaptive. The contributions of this study are as follows:

1. A novel method is proposed for context-aware feature compensation, where the similarities between the input and the cluster centers representing the overall data distribution serve as the clue to evaluate how much feature compensation is needed to resolve the mismatch between the source and target domains. The new architecture incorporates only two additional MLP modules, one computing the feature compensation and the other soft-gating it. This meets almost all the requirements of transfer learning: a minor network-level change with two tiny components added, feature-level compensation with context awareness of whether and how much transferring is needed, and minimum effort to obtain a sound solution with a large performance improvement.

2. The concept of context based on self-positioning in reference to anchors (the cluster centers) is proposed for gauging how much the representation of each given example needs transferring. Correspondingly, feature compensation is computed by incorporating the context into the existing representation, while soft gating highlights the degree to which each data example needs transferring; both the context-directed feature compensation and the soft gating are learnt end to end.

3. Experiments on 5 datasets demonstrate the effectiveness as well as the advantages of our solution in comparison with the baselines.
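The pipeline described above can be sketched in a few lines. This is an illustrative numpy mock-up, not the authors' implementation: the similarity measure (a softmax over negative Euclidean distances to the cluster centers), the one-hidden-layer MLP shapes, and the gated residual form `f + g * delta` are assumptions, and in the actual framework both MLPs would be learnt end to end with the backbone rather than randomly initialized.

```python
import numpy as np

def mlp_init(rng, d_in, d_hid, d_out):
    # Parameters of a one-hidden-layer MLP (random here; learnt in training).
    return {"W1": rng.normal(0, 0.1, (d_in, d_hid)), "b1": np.zeros(d_hid),
            "W2": rng.normal(0, 0.1, (d_hid, d_out)), "b2": np.zeros(d_out)}

def mlp(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

def context(f, centers):
    # Context vector: softmax similarities between the backbone feature f and
    # the k cluster centers (the exact similarity measure is an assumption).
    d = np.linalg.norm(centers - f, axis=1)
    e = np.exp(-d)
    return e / e.sum()

def compensate(f, centers, comp, gate):
    c = context(f, centers)                     # (k,) context vector
    delta = mlp(comp, c)                        # context-directed compensation
    g = 1.0 / (1.0 + np.exp(-mlp(gate, c)))    # soft gate in (0, 1)
    return f + g * delta                        # gated residual compensation
```

When the gate closes (g near 0), the backbone feature passes through unchanged, which matches the intuition that examples in the domain overlap need little or no compensation.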

2. RELATED WORK

Transfer learning aims to transfer the knowledge learnt in the source domain to the target domain (Thrun, 1995), so as to allow pre-trained models to be reused with minimum effort and good transferability. The existing methods can be sorted into 4 categories (Tan et al., 2018): instance-based, mapping-based, adversarial-based, and network-based deep transfer learning.

