CROSS-DOMAIN FEW-SHOT LEARNING BY REPRESENTATION FUSION

Abstract

In order to quickly adapt to new data, few-shot learning aims at learning from few examples, often by using already acquired knowledge. The new data often differs from the previously seen data due to a domain shift, that is, a change of the input-target distribution. While several methods perform well on small domain shifts like new target classes with similar inputs, larger domain shifts are still challenging. Large domain shifts may result in high-level concepts that are not shared between the original and the new domain. However, low-level concepts like edges in images might still be shared and useful. For cross-domain few-shot learning, we suggest representation fusion to unify different abstraction levels of a deep neural network into one representation. We propose Cross-domain Hebbian Ensemble Few-shot learning (CHEF), which achieves representation fusion by an ensemble of Hebbian learners acting on different layers of a deep neural network that was trained on the original domain. On the few-shot datasets miniImagenet and tieredImagenet, where the domain shift is small, CHEF is competitive with state-of-the-art methods. On cross-domain few-shot benchmark challenges with larger domain shifts, CHEF establishes novel state-of-the-art results in all categories. We further apply CHEF to a real-world cross-domain application in drug discovery. We consider a domain shift from bioactive molecules to environmental chemicals and drugs with twelve associated toxicity prediction tasks. On these tasks, which are highly relevant for computational drug discovery, CHEF significantly outperforms all its competitors.

1. INTRODUCTION

Currently, deep learning is criticized because it is data hungry, has limited capacity for transfer, insufficiently integrates prior knowledge, and presumes a largely stable world (Marcus, 2018). In particular, these problems appear after a domain shift, that is, a change of the input-target distribution. A domain shift forces deep learning models to adapt. The goal is to exploit models that were trained on the typically rich original data for solving tasks from the new domain with much less data. Examples of domain shifts are new users or customers, new products and product lines, new diseases (e.g. adapting from SARS to COVID-19), new images from another field (e.g. from cats to dogs or from cats to bicycles), new social behaviors after societal change (e.g. introduction of cell phones, pandemic), self-driving cars in new cities or countries (e.g. from European countries to Arabic countries), and robot manipulation of new objects. Domain shifts are often tackled by meta-learning (Schmidhuber, 1987; Bengio et al., 1990; Hochreiter et al., 2001), since it exploits already acquired knowledge to adapt to new data. One prominent application of meta-learning dealing with domain shifts is few-shot learning, since typically much less data is available from the new domain than from the original domain. Meta-learning methods perform well on small domain shifts like new target classes with similar inputs. However, larger domain shifts are still challenging for current approaches. Large domain shifts lead to inputs that are considerably different from the original inputs and possess different high-level concepts. Nonetheless, low-level concepts are often still shared between the inputs of the original domain and the inputs of the new domain. For images, such shared low-level concepts can be edges, textures, small shapes, etc. One way of obtaining low-level concepts is to train a new deep learning model from scratch, where the new data is merged with the original data.
However, although models of the original domain are often available, the original data, which the models were trained on, often are not. This might have several reasons, e.g. the data owner no longer grants access to the data, the General Data Protection Regulation (GDPR) no longer allows access to the data, IP restrictions prevent access to the data, sensitive data items must not be touched anymore (e.g. phase III drug candidates), or data is difficult to extract again. We therefore suggest exploiting models of the original domain directly by accessing not only high-level but also low-level abstractions. In this context, we propose a cross-domain few-shot learning method that extracts information from different levels of abstraction in a deep neural network.

Representation fusion. Deep learning constructs neural network models that represent the data at multiple levels of abstraction (LeCun et al., 2015). We introduce representation fusion, which is the concept of unifying and merging information from different levels of abstraction. Representation fusion uses a fast and adaptive system for detecting relevant information at different abstraction levels of a deep neural network, which, as we will show, allows solving versatile and complex cross-domain tasks.

CHEF. We propose Cross-domain Hebbian Ensemble Few-shot learning (CHEF), which achieves representation fusion by an ensemble of Hebbian learners built upon a trained network. CHEF naturally addresses the problem of domain shifts, which occur in a wide range of real-world applications. Furthermore, since CHEF only builds on representation fusion, it can adapt to new characteristics of tasks like unbalanced data sets, classes with few examples, change of the measurement method, new measurements in unseen ranges, new kinds of labeling errors, and more. Using simple Hebbian learners allows applying CHEF without backpropagating information through the backbone network.
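To make the idea of representation fusion concrete, the following is a minimal toy sketch: one simple one-step Hebbian associative head is fitted per layer of a (frozen) backbone on the few-shot support set, and their scores are summed across layers at prediction time. The random stand-in "backbone activations", the specific Hebbian rule, and all names here are illustrative assumptions for exposition, not the paper's exact algorithm.

```python
import numpy as np

def hebbian_weights(feats, labels, n_classes):
    """One-step Hebbian association: the row for class c accumulates
    the features of all support examples of class c (co-activation)."""
    W = np.zeros((n_classes, feats.shape[1]))
    for x, y in zip(feats, labels):
        W[y] += x  # Hebbian update: class unit and feature vector fire together
    # normalize rows so per-class scores are comparable across layers
    W /= np.linalg.norm(W, axis=1, keepdims=True) + 1e-8
    return W

def chef_predict(layer_feats, heads):
    """Representation fusion: sum the per-layer scores of the ensemble."""
    scores = sum(f @ W.T for f, W in zip(layer_feats, heads))
    return scores.argmax(axis=1)

# Toy demo: random class prototypes per "layer" stand in for the
# activations of a frozen backbone at three abstraction levels.
rng = np.random.default_rng(0)
n_classes, shots, dims = 5, 5, [64, 128, 256]
layer_means = [rng.normal(size=(n_classes, d)) for d in dims]

def fake_backbone(y, noise=0.5):
    """Stand-in for layer activations (assumption, not a real network)."""
    return [m[y] + noise * rng.normal(size=(len(y), m.shape[1]))
            for m in layer_means]

support_y = np.repeat(np.arange(n_classes), shots)       # 5-way 5-shot
heads = [hebbian_weights(f, support_y, n_classes)
         for f in fake_backbone(support_y)]

query_y = np.repeat(np.arange(n_classes), 3)
pred = chef_predict(fake_backbone(query_y), heads)
print("query accuracy:", (pred == query_y).mean())
```

Note that fitting the heads touches only the (here simulated) layer activations, never the backbone weights, which mirrors the point that no information needs to be backpropagated through the backbone network.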
The main contributions of this paper are:

• We introduce representation fusion as the concept of unifying and merging information from different layers of abstraction.

• We introduce CHEF (our implementation is available at github.com/tomte812/chef) as our new cross-domain few-shot learning method that builds on representation fusion. We show that using different layers of abstraction allows one to successfully tackle various few-shot learning tasks across a wide range of different domains. CHEF does not need to backpropagate information through the backbone network.

• We apply CHEF to various cross-domain few-shot tasks and obtain several state-of-the-art results. We further apply CHEF to cross-domain real-world applications from drug discovery, where we outperform all competitors.

Related work. Representation fusion builds on learning a meaningful representation (Bengio et al., 2013; Girshick et al., 2014) at multiple levels of abstraction (LeCun et al., 2015; Schmidhuber, 2015). The concept of using representations from different layers of abstraction has been used in CNN architectures (LeCun et al., 1998) such as Huang et al. (2017); Rumetshofer et al. (2018); Hofmarcher et al. (2019), in CNNs for semantic segmentation in the form of multi-scale context pooling (Yu & Koltun, 2015; Chen et al., 2018), and in the form of context capturing and symmetric upsampling (Ronneberger et al., 2015). Learning representations from different domains has been explored by Federici et al. (2020); Tschannen et al. (2020) under the viewpoint of mutual information optimization. Work on domain shifts discusses the problem that new inputs are considerably different from the original inputs (Kouw & Loog, 2019; Wouter, 2018; Webb et al., 2018; Gama et al., 2014; Widmer & Kubat, 1996). Domain adaptation (Pan & Yang, 2009; Ben-David et al., 2010) overcomes this problem by, e.g., reweighting the original samples (Jiayuan et al., 2007), learning features that are invariant to a domain shift (Ganin et al., 2016; Xu et al., 2019), or learning a classifier in the new domain. Domain adaptation where only few data are available in the new domain (Ben-David et al., 2010; Lu et al., 2020) is called cross-domain few-shot learning (Guo et al., 2019; Lu et al., 2020; Tseng et al., 2020), which is an instance of the general few-shot learning setting (Fei-Fei et al., 2006). Few-shot learning can be roughly divided into three approaches (Lu et al., 2020; Hospedales et al., 2020): (i) augmentation, (ii) metric learning, and (iii) meta-learning.

For (i), where the idea is to learn an augmentation to produce more than the few samples available, supervised (Dixit et al., 2017; Kwitt et al., 2016) and unsupervised (Hariharan & Girshick, 2017; Pahde et al., 2019; Gao et al., 2018) methods are considered. For (ii), approaches aim to learn a pairwise similarity metric under which similar samples obtain high similarity scores (Koch et al., 2015; Ye & Guo, 2018; Hertz et al., 2006). For (iii), methods comprise embedding and nearest-neighbor approaches (Snell et al., 2017b;

