CONTRASTIVE META-LEARNING FOR PARTIALLY OBSERVABLE FEW-SHOT LEARNING

Abstract

Many contrastive and meta-learning approaches learn representations by identifying common features in multiple views. However, the formalism for these approaches generally assumes that features must be shared across views in order to be captured coherently. We consider the problem of learning a unified representation from partial observations, where useful features may be present in only some of the views. We approach this through a probabilistic formalism enabling views to map to representations with different levels of uncertainty in different components; these views can then be integrated with one another through marginalisation over that uncertainty. Our approach, Partial Observation Experts Modelling (POEM), then enables us to meta-learn consistent representations from partial observations. We evaluate our approach on an adaptation of a comprehensive few-shot learning benchmark, Meta-Dataset, and demonstrate the benefits of POEM over other meta-learning methods at representation learning from partial observations. We further demonstrate the utility of POEM by meta-learning to represent an environment from partial views observed by an agent exploring the environment.


1. INTRODUCTION

Modern contrastive learning methods (Radford et al., 2021; Chen et al., 2020; He et al., 2020; Oord et al., 2019), and embedding-based meta-learning methods such as Prototypical Networks (Snell et al., 2017; Vinyals et al., 2016; Sung et al., 2018; Edwards & Storkey, 2017), learn representations by minimising a relative distance between representations of related items compared with unrelated items (Ericsson et al., 2021). However, we argue that these approaches may learn to disregard potentially relevant features from views that only inform part of the representation in order to achieve better representational consistency, as demonstrated in Figure 1. We refer to such partially informative views as partial observations. The difficulty with partial observations arises because distances computed between representations must include contributions from all parts of the representation vector. If the views provided are diverse, and therefore contain partially disjoint features, their representations may appear different to a naive distance metric. For example, two puzzle pieces may contain different information about the whole picture. We call this the problem of integrative representation learning: we wish to obtain a representation that integrates different but overlapping information from each element of a set.

In this paper, we provide a probabilistic formalism for a few-shot objective that is able to learn to capture representations in partially observable settings. It does so by building on a product of experts (Hinton, 2002) to utilise representation uncertainty: high variance in a representation component indicates that the given view of the data poorly informs that component, while low variance indicates it informs it well. Given multiple views of the data, the product of experts component in POEM combines the representations, weighting by the variance, to obtain a maximally informative and consistent representation from the views.
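The variance-weighted combination described above can be illustrated with the standard closed form for a product of independent Gaussian experts: precisions add, so confident (low-variance) views dominate each component. The following is a minimal numpy sketch of this combination rule, not the paper's implementation; the function name and toy values are illustrative.

```python
import numpy as np

def product_of_gaussian_experts(means, variances):
    """Combine per-view Gaussian representations into one Gaussian.

    means, variances: arrays of shape (num_views, dim). Each view is an
    independent Gaussian expert per dimension; the product is Gaussian
    with precision equal to the sum of the per-view precisions, so each
    component is dominated by the views that are confident about it.
    """
    precisions = 1.0 / variances                          # (V, D)
    combined_precision = precisions.sum(axis=0)           # (D,)
    combined_mean = (precisions * means).sum(axis=0) / combined_precision
    combined_variance = 1.0 / combined_precision
    return combined_mean, combined_variance

# Two partial views of a 2-D representation: view 0 is only confident
# about component 0, view 1 only about component 1.
means = np.array([[1.0, 0.0],
                  [0.0, 3.0]])
variances = np.array([[0.01, 100.0],
                      [100.0, 0.01]])
mean, var = product_of_gaussian_experts(means, variances)
# mean is close to [1.0, 3.0]: each component follows the confident view.
```

This illustrates why such a combination can unify disjoint features: an uninformative view contributes almost nothing to a component it is uncertain about, rather than dragging the distance metric as in a naive averaging of embeddings.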
To comprehensively evaluate our approach, we adapt a large-scale few-shot learning benchmark, Meta-Dataset (Triantafillou et al., 2020), to evaluate representation learning from partial observations. We demonstrate that our approach, Partial Observation Experts Modelling (POEM), outperforms standard few-shot baselines on our adapted benchmark, Partially Observed Meta-Dataset (PO-Meta-Dataset), while still matching state-of-the-art performance on the standard benchmark. Finally, we demonstrate the potential for our approach to be applied to meta-learn representations of environments from the partial views observed by an agent exploring that environment.

The main contributions of this work are: 1) a probabilistic formalism, POEM, that enables representation learning under partial observability; 2) a comprehensive experimental evaluation of POEM on an adaptation of Meta-Dataset designed to evaluate representation learning under partial observability, demonstrating that this approach outperforms standard baselines in this setting while still matching state-of-the-art on the standard fully observed benchmark; and 3) a demonstration of a potential application of POEM to meta-learn representations of environments from partial observations.

2. RELATED WORK

2.1. CONTRASTIVE LEARNING

Contrastive learning extracts features that are present in multiple views of a data item, by encouraging representations of related views to be close in an embedding space (Ericsson et al., 2021). In computer vision and natural language applications these views typically consist of different augmentations of data items, which are carefully crafted to preserve semantic features, and thereby act as an inductive bias encouraging the contrastive learner to retain these consistent features (Le-Khac et al., 2020). A challenge in this approach is to prevent representational 'collapse', where all views are mapped to the same representation. Standard contrastive approaches such as Contrastive Predictive Coding (Oord et al., 2019), MoCo (He et al., 2020), and SimCLR (Chen et al., 2020) handle this by computing feature space distance measures relative to the distances for negative views: pairs of views that are encouraged to be distinct in the embedding space. In this work we take a similar approach, where the negative views are partial observations of distinct items, but we aim to learn to unify features from differing views, not just retain the consistent features. We learn to learn a contrastive representation from partial views. We note that state-of-the-art representation learning approaches such as CLIP (Radford et al., 2021), which leverage contrastive learning across modalities, also suffer from extracting only a limited subset of features (Fürst et al., 2022) due to using an embedding-based approach (Vinyals et al., 2016) to match image and text representations.
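The relative-distance mechanism used by these methods can be made concrete with an InfoNCE-style loss: the similarity to a positive view is normalised against similarities to negative views, so collapse (mapping everything to one point) no longer minimises the objective. The following is a minimal single-anchor numpy sketch under simplified assumptions (L2-normalised embeddings, dot-product similarity); it is not any specific method's implementation.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor.

    anchor, positive: (D,) embeddings of two views of the same item.
    negatives: (N, D) embeddings of views of distinct items.
    Returns the negative log-probability of the positive under a
    softmax over all similarities, scaled by the temperature.
    """
    sims = np.concatenate([[anchor @ positive],
                           negatives @ anchor]) / temperature
    sims -= sims.max()                       # numerical stability
    log_softmax = sims - np.log(np.exp(sims).sum())
    return -log_softmax[0]                   # positive sits at index 0
```

Because only the relative ordering of similarities matters, the loss is small when the anchor is closer to its positive than to every negative, which is exactly the "relative distance" behaviour described above.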

2.2. EMBEDDING-BASED META-LEARNING

Embedding-based meta-learners similarly learn representations of classes by extracting features that are consistently present in the data samples (generally referred to as shots in the meta-learning literature) provided for each class, such that the class of new samples can be identified with a similarity metric.
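The embedding-based classification scheme described above can be sketched in the style of Prototypical Networks (Snell et al., 2017): each class is represented by the mean of its support embeddings, and a query is assigned to the nearest prototype. This is a simplified numpy sketch assuming embeddings have already been produced by an encoder; the function name is illustrative.

```python
import numpy as np

def prototype_classify(support, support_labels, query, num_classes):
    """Assign a query embedding to the class with the nearest prototype.

    support: (S, D) support-set embeddings; support_labels: (S,) ints;
    query: (D,). Prototypes are per-class means of the support
    embeddings; classification uses Euclidean distance to them.
    """
    prototypes = np.stack([support[support_labels == c].mean(axis=0)
                           for c in range(num_classes)])
    dists = np.linalg.norm(prototypes - query, axis=1)
    return int(dists.argmin())
```

Note that averaging support embeddings into a single prototype is precisely where disjoint, partially observed features can be washed out, which motivates replacing the mean with an uncertainty-weighted product of experts.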



Implementation code is available at https://github.com/AdamJelley/POEM



Figure 1: Standard contrastive (meta-) learners minimise a relative distance between representations. This encourages the learning of features that are consistent in all views; in the above example this corresponds to the pattern on the bird's wing. To better handle partial observability, where features may be disjoint between views, we propose Partial Observation Experts Modelling (POEM). POEM instead maximises consistency between multiple views, by utilising representation uncertainty to learn which features of the entity are captured by a view, and then combining these representations together by weighting features by their uncertainty via a product of experts model (Hinton, 2002).

