META-GMVAE: MIXTURE OF GAUSSIAN VAES FOR UNSUPERVISED META-LEARNING

Abstract

Unsupervised learning aims to learn meaningful representations from unlabeled data that capture its intrinsic structure and can be transferred to downstream tasks. Meta-learning, whose objective is to learn to generalize across tasks such that the learned model can rapidly adapt to a novel task, shares the spirit of unsupervised learning in that both seek a more effective and efficient learning procedure than learning from scratch. The fundamental difference between the two is that most meta-learning approaches are supervised, assuming full access to labels. However, acquiring a labeled dataset for meta-training is not only costly, as it requires human labeling effort, but also limits applicability to pre-defined task distributions. In this paper, we propose a principled unsupervised meta-learning model, namely Meta-GMVAE, based on the Variational Autoencoder (VAE) and set-level variational inference. Moreover, we introduce a Gaussian mixture (GMM) prior, assuming that each modality represents a class concept in a randomly sampled episode, which we optimize with Expectation-Maximization (EM). The learned model can then be used for downstream few-shot classification tasks, where we obtain task-specific parameters by performing semi-supervised EM on the latent representations of the support and query sets, and predict labels of the query set by computing aggregated posteriors. We validate our model on the Omniglot and Mini-ImageNet datasets by evaluating its performance on downstream few-shot classification tasks. The results show that our model obtains impressive performance gains over existing unsupervised meta-learning baselines, even outperforming supervised MAML in a certain setting.

1. INTRODUCTION

Unsupervised learning is one of the most fundamental and challenging problems in machine learning, due to the absence of target labels to guide the learning process. Thanks to enormous research efforts, there now exist many unsupervised learning methods that have shown promising results in real-world domains, including image recognition (Le, 2013) and natural language understanding (Ramachandran et al., 2017). The essential goal of unsupervised learning is to obtain meaningful feature representations that best characterize the data, which can later be utilized to improve the performance of downstream tasks, by training a supervised task-specific model on top of the learned representations (Reed et al., 2014; Cheung et al., 2015; Chen et al., 2016) or by fine-tuning the entire pre-trained model (Erhan et al., 2010). Meta-learning, whose objective is to learn general knowledge across diverse tasks such that the learned model can rapidly adapt to novel tasks, shares the spirit of unsupervised learning in that both seek a more efficient and effective learning procedure over learning from scratch. However, the essential difference between the two is that most meta-learning approaches are built on a supervised learning scheme and require human-crafted task distributions to be applied to few-shot classification. Acquiring a labeled dataset for meta-training may require a massive amount of human effort, and more importantly, it limits applications to pre-defined task distributions (e.g., classification over a specific set of classes). Two recent works have proposed unsupervised meta-learning methods that bridge the gap between unsupervised learning and meta-learning by constructing supervised tasks with pseudo-labels from unlabeled data. To do so, CACTUs (Hsu et al., 2019) clusters data in an embedding space learned with several unsupervised learning methods, while UMTRA (Khodadadeh et al., 2019) assumes that each randomly drawn sample represents a different class and augments each pseudo-class with data augmentation (Cubuk et al., 2018).
After constructing the meta-training dataset with such heuristics, they simply apply supervised meta-learning algorithms as usual. Despite their success, existing unsupervised meta-learning methods are fundamentally limited, since 1) they use unsupervised learning only for heuristic pseudo-labeling of unlabeled data, and 2) the two-stage approach makes it impossible to recover from incorrect pseudo-class assignments when learning the unsupervised representation space. In this paper, we propose a principled unsupervised meta-learning model based on the Variational Autoencoder (VAE) (Kingma & Welling, 2014) and set-level variational inference using self-attention (Vaswani et al., 2017). Moreover, we introduce a multi-modal prior distribution, a mixture of Gaussians (GMM), assuming that each modality represents a class concept in any given task. The parameters of the GMM are then optimized by running Expectation-Maximization (EM) on observations sampled from the set-dependent variational posterior. In this framework, however, there is no guarantee that each modality obtained from the EM algorithm corresponds to a label. To realize modalities as labels, we deploy semi-supervised EM at meta-test time, treating the support set and query set as labeled and unlabeled observations, respectively. We refer to our method as Meta-Gaussian Mixture Variational Autoencoder (Meta-GMVAE) (see Figure 1 for the high-level concept). While our method can be used as a full generative model for generating samples (images), the ability to generate samples may not be necessary for capturing the meta-knowledge needed for non-generative downstream tasks. Thus, we propose another version of Meta-GMVAE that reconstructs high-level features learned by unsupervised representation learning approaches (e.g., Chen et al. (2020)).
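To make the EM step above concrete, the following is a minimal NumPy sketch of fitting a K-component Gaussian mixture to latent codes sampled from the set-dependent variational posterior. It uses a spherical-covariance simplification and a farthest-point initialization for readability; all names are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def em_gmm(z, K, n_iters=10, eps=1e-6):
    """Fit a K-component spherical GMM to latent codes z of shape (N, D)."""
    N, D = z.shape
    # Farthest-point initialization of the component means (illustrative choice).
    mu = [z[0]]
    for _ in range(K - 1):
        d2 = np.min(((z[:, None, :] - np.array(mu)[None]) ** 2).sum(-1), axis=1)
        mu.append(z[d2.argmax()])
    mu = np.array(mu)
    var = np.ones(K)              # per-component spherical variance
    pi = np.full(K, 1.0 / K)      # mixing weights
    for _ in range(n_iters):
        # E-step: responsibilities r[n, k] ∝ pi_k * N(z_n | mu_k, var_k I)
        d2 = ((z[:, None, :] - mu[None]) ** 2).sum(-1)                 # (N, K)
        log_p = np.log(pi + eps) - 0.5 * (d2 / (var + eps) + D * np.log(var + eps))
        log_p -= log_p.max(axis=1, keepdims=True)                      # stabilize exp
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances, and weights from soft assignments
        Nk = r.sum(axis=0) + eps
        mu = (r.T @ z) / Nk[:, None]
        d2 = ((z[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (D * Nk)
        pi = Nk / N
    return mu, var, pi, r
```

In Meta-GMVAE these updates run inside each episode on the posterior samples, so the mixture parameters adapt per task rather than being fit once to the whole dataset.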
To investigate the effectiveness of our framework, we run experiments on two benchmark few-shot image classification datasets, namely Omniglot (Lake et al., 2011) and Mini-ImageNet (Ravi & Larochelle, 2017). The experimental results show that our Meta-GMVAE obtains impressive performance gains over the relevant unsupervised meta-learning baselines on both datasets, obtaining even better accuracy than fully supervised MAML (Finn et al., 2017) while utilizing as little as 0.1% of the labeled data in one-shot settings on the Omniglot dataset. Moreover, our model can generalize to classification tasks with different numbers of ways (classes) without loss of accuracy. Our contribution is threefold:

• We propose a novel unsupervised meta-learning model, namely Meta-GMVAE, which meta-learns the set-conditioned prior and posterior network for a VAE. Our Meta-GMVAE is a principled unsupervised meta-learning method, unlike existing unsupervised meta-learning methods that combine heuristic pseudo-labeling with supervised meta-learning.

• We propose to learn the multi-modal structure of a given dataset with a Gaussian mixture prior, such that it can adapt to a novel dataset via the EM algorithm. This flexible adaptation to a new task is not possible with existing methods that propose VAEs with Gaussian mixture priors for single-task learning.

• We show that Meta-GMVAE largely outperforms relevant unsupervised meta-learning baselines on two benchmark datasets, while obtaining even better performance than a supervised meta-learning model under a specific setting. We further show that Meta-GMVAE can generalize to classification tasks with different numbers of ways (classes).



Figure 1: During meta-training, Meta-GMVAE learns a multi-modal latent space that can best explain the unlabeled data using the EM algorithm. At meta-test time, we use semi-supervised EM to map both the support set (labeled data) and queries (unlabeled data) to each mode learned during meta-training.
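The meta-test procedure sketched in Figure 1 can be illustrated as a semi-supervised EM loop: support latents carry fixed one-hot responsibilities given by their labels, query latents receive soft responsibilities from the E-step, and both update the mixture parameters. The sketch below again uses a spherical-covariance simplification and hypothetical variable names; it is not the paper's actual implementation.

```python
import numpy as np

def semi_supervised_em(z_s, y_s, z_q, K, n_iters=5, eps=1e-6):
    """z_s: (Ns, D) support latents with labels y_s in {0..K-1}; z_q: (Nq, D) query latents."""
    D = z_s.shape[1]
    r_s = np.eye(K)[y_s]                 # fixed one-hot responsibilities for the support set
    # Initialize each component mean at its class prototype (mean of support latents).
    mu = (r_s.T @ z_s) / (r_s.sum(0)[:, None] + eps)
    var = np.ones(K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E-step on the unlabeled queries only.
        d2 = ((z_q[:, None, :] - mu[None]) ** 2).sum(-1)               # (Nq, K)
        log_p = np.log(pi + eps) - 0.5 * (d2 / (var + eps) + D * np.log(var + eps))
        log_p -= log_p.max(1, keepdims=True)
        r_q = np.exp(log_p)
        r_q /= r_q.sum(1, keepdims=True)
        # M-step on support (hard) plus query (soft) responsibilities.
        r = np.concatenate([r_s, r_q])
        z = np.concatenate([z_s, z_q])
        Nk = r.sum(0) + eps
        mu = (r.T @ z) / Nk[:, None]
        d2_all = ((z[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2_all).sum(0) / (D * Nk)
        pi = Nk / r.shape[0]
    return r_q.argmax(1)                 # predicted query labels
```

Because the support labels pin each component to a class, the argmax over query responsibilities directly yields class predictions, which is how the modalities found by EM are realized as labels at meta-test time.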

