UNIFORM PRIORS FOR DATA-EFFICIENT TRANSFER

Abstract

Deep Neural Networks have shown great promise on a variety of downstream applications, but their ability to adapt and generalize to new data and tasks remains a challenge. Yet the ability to perform few- or zero-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore crucial to understand what makes for good, transferable features in deep networks that best allow for such adaptation. In this paper, we shed light on this question by showing that the most transferable features exhibit high uniformity in the embedding space, and we propose a uniformity regularization scheme that encourages better transfer and feature reuse. We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data, for which we conduct a thorough experimental study covering four relevant and distinct domains: few-shot Meta-Learning, Deep Metric Learning, Zero-Shot Domain Adaptation, and Out-of-Distribution classification. Across all experiments, we show that uniformity regularization consistently offers benefits over baseline methods and achieves state-of-the-art performance in Deep Metric Learning and Meta-Learning.

1. INTRODUCTION

Deep Neural Networks have enabled great success in various machine learning domains such as computer vision (Girshick, 2015; He et al., 2016; Long et al., 2015), natural language processing (Vaswani et al., 2017; Devlin et al., 2018; Brown et al., 2020), decision making (Schulman et al., 2015; 2017; Fujimoto et al., 2018) and medical applications (Ronneberger et al., 2015; Hesamian et al., 2019). This can be largely attributed to the ability of networks to extract abstract features from data, which, given sufficient data, can effectively generalize to held-out test sets. However, the degree of generalization scales with the semantic difference between test and training tasks, caused e.g. by domain or distributional shifts between training and test data. Understanding how to achieve generalization under such shifts is an active area of research in fields like Meta-Learning (Snell et al., 2017; Finn et al., 2017; Chen et al., 2020), Deep Metric Learning (DML) (Roth et al., 2020b; Hadsell et al., 2006), Zero-Shot Domain Adaptation (ZSDA) (Tzeng et al., 2017; Kodirov et al., 2015) and low-level vision tasks (Tang et al., 2020). In the few-shot Meta-Learning setting, a meta-learner is tasked to quickly adapt to novel test data given its training experience and a limited labeled data budget; similarly, fields like DML and ZSDA study generalization at the limit of such adaptation, where predictions on novel test data are made without any test-time finetuning. Yet, despite the motivational differences, each of these fields requires representations to be learned from the training data that allow for better generalization and adaptation to novel tasks and data. Although there exists a large corpus of domain-specific training methods, in this paper we seek to investigate what fundamental properties learned features and feature spaces should have to facilitate such generalization.
Fortunately, recent literature provides pointers towards one such property: the notion of "feature uniformity" for improved generalization. For Unsupervised Representation Learning, Wang & Isola (2020) highlight a link between the uniform distribution of hyperspherical feature representations and the transfer performance in downstream tasks, which has been implicitly adopted in the design of modern contrastive learning methods (Bachman et al., 2019; Tian et al., 2020a;b). Similarly, Roth et al. (2020b) show that for Deep Metric Learning, uniformity in hyperspherical embedding space coverage as well as uniform singular value distributions of embedding spaces are strongly connected to zero-shot generalization performance. Both Wang & Isola (2020) and Roth et al. (2020b) link the uniformity in the feature representation space to the preservation of maximal information.
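To make the notion of feature uniformity concrete, the objective proposed by Wang & Isola (2020) scores a set of L2-normalized embeddings by the log of the average pairwise Gaussian potential, L_uniform = log E[exp(-t * ||f(x) - f(y)||^2)]; more uniformly spread points on the hypersphere yield a lower (more negative) value. The following NumPy sketch is illustrative only; the function name and the default t = 2 (the value commonly used in that line of work) are our choices, not part of this paper's method.

```python
import numpy as np

def uniformity_loss(embeddings, t=2.0):
    """Log of the mean pairwise Gaussian potential (Wang & Isola, 2020).

    embeddings: array of shape (N, D), assumed L2-normalized (on the
    unit hypersphere). Lower values indicate more uniform spread.
    """
    # All pairwise squared Euclidean distances, shape (N, N).
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Keep each unordered pair once; exclude the zero diagonal.
    i, j = np.triu_indices(len(embeddings), k=1)
    return np.log(np.mean(np.exp(-t * sq_dists[i, j])))

# Collapsed features (all identical) give the maximal value, log(1) = 0;
# features spread evenly over the unit circle score strictly lower.
collapsed = np.tile(np.array([1.0, 0.0]), (3, 1))
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
spread = np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(uniformity_loss(collapsed))  # → 0.0
print(uniformity_loss(spread) < uniformity_loss(collapsed))  # → True
```

In practice such a term is added to the task loss as a regularizer on minibatch embeddings, which is the spirit of the uniformity regularization studied in this paper.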

