ON THE IMPORTANCE OF LOOKING AT THE MANIFOLD

Abstract

Data rarely lies in purely Euclidean spaces. Even data typically represented in regular domains, such as images, can carry a higher level of relational information, either between data samples or within samples, e.g., how the objects in an image are linked. From this perspective, data points can be enriched by explicitly accounting for this connectivity and analyzed as a graph. Herein, we analyze various approaches for unsupervised representation learning and investigate the importance of topological information and its impact when learning representations. We explore a spectrum of models, ranging from those that learn representations solely from isolated node features (focusing on Variational Autoencoders) to those that learn solely from the topology (using node2vec), passing through hybrid models that integrate both node features and topological information. For the latter we use Graph Neural Networks, specifically Deep Graph Infomax (DGI), and an extension of the standard VAE formulation in which the topological structure is accounted for via an explicit regularization of the loss (Graph-Regularized VAEs, introduced in this work). To extensively investigate these methodologies, we consider a wide variety of data types: synthetic point clouds, MNIST, citation networks, and chemical reactions. We show that each of the representations learned by these models can be of critical importance for further downstream tasks, and that accounting for topological features can greatly improve the modeling capabilities for certain problems. We further provide a framework to analyze these, and future, models under different scenarios and types of data.

1. INTRODUCTION

The ability to recognize relational information between or even within individual percepts is one of the fundamental differences between human and artificial learning systems. For example, the feature-binding problem (Roskies, 1999), i.e., the mechanism by which the visual system represents hierarchical relationships between features in an image, is still largely unsolved by neuroscientists, hampering the development of bio-inspired statistical learning systems. Traditional relational learning approaches mostly focus on learning either internal or external relational structure between samples and rely heavily on domain-specific expert knowledge engineered into the model (Struyf & Blockeel, 2010). Consequently, these models have yet to prove their usability in real applications and, although some neurocomputational frameworks for relational learning have been proposed (Isbister et al., 2018), building statistical models that explore higher-order dependencies between samples remains a key challenge for computer vision and robotics applications. Accordingly, relational reasoning has been advocated as pivotal for the future of artificial intelligence (Battaglia et al., 2018). In contrast, deep learning, as a purely data-driven approach, has enjoyed remarkable success in recent years by learning complex non-linear functions mapping raw inputs to outputs without explicit dependency modelling. Fields like relational reinforcement learning (Džeroski et al., 2001) and statistical relational learning (Koller et al., 2007) aimed to fill this gap; more recently, augmenting deep (reinforcement) learning models with relational reasoning has emerged as a promising approach (Zambaldi et al., 2018; Zhang et al., 2016). Many successful contributions to relational modelling in images, however, still largely rely on Euclidean spaces (Dai et al., 2017; Yao et al., 2018).
It is widely agreed that graphs are the ideal structure to enable relational deep learning (Hamilton et al., 2017). Prior work has shown that metagraphs incorporating relational information about the dataset can improve unsupervised representation learning, yielding less complex models that preserve relational information without losing expressivity with respect to the original representation (Dumancic & Blockeel, 2017). In terms of predictive modelling, relational representations can be superior to ordinary ones (Dumancic & Blockeel, 2017), and graph-induced kernels can improve phenotype prediction compared to non-topological kernels (Manica et al., 2019). In generative modelling, relational distribution comparison was demonstrated to facilitate the learning of generative models across incomparable spaces (Bunne et al., 2019). Here, we perform an extensive study of the impact of topological information on learning data representations. Specifically, we focus on the trade-off between leveraging data point features and relational information. We consider a selection of unsupervised representation learning models lying in different areas of the spectrum: Variational Autoencoders (VAEs) (Kingma & Welling, 2013), node embedding techniques based on random walks on graphs (Grover & Leskovec, 2016), graph neural networks (Veličković et al., 2018), and the proposed Graph-Regularized Variational Autoencoder (GR-VAE), our adaptation of the VAE, introduced in this work, in which the latent space is regularized through a metagraph representing relations between samples of the dataset. The methods considered are evaluated on different datasets and downstream tasks where the impact of the topology can be appropriately assessed.
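To make the GR-VAE idea concrete, the sketch below shows one plausible form of such a topological penalty: a Laplacian-style quadratic term that pulls the latent codes of metagraph-connected samples closer together, added on top of the standard ELBO terms. The function names, the quadratic form, and the weighting scheme here are illustrative assumptions for exposition, not the definitive GR-VAE formulation.

```python
import numpy as np

def graph_regularizer(z, edges, weight=1.0):
    """Penalize latent distance between samples linked in the metagraph.

    z      : (n, d) array of latent codes for a batch of samples
    edges  : iterable of (i, j) index pairs, the batch's metagraph edges
    weight : strength of the topological penalty

    The sum of squared differences over edges equals the quadratic
    form z^T L z for the graph Laplacian L of the (unweighted) metagraph.
    """
    penalty = 0.0
    for i, j in edges:
        diff = z[i] - z[j]
        penalty += diff @ diff
    return weight * penalty

def gr_vae_loss(recon, kl, z, edges, gamma=0.1):
    """Standard VAE objective (reconstruction + KL) plus the graph term."""
    return recon + kl + graph_regularizer(z, edges, gamma)
```

With `gamma=0`, the objective reduces to a plain VAE; increasing it trades reconstruction fidelity for latent geometry that mirrors the metagraph.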
Initially, we examine the impact of implicitly accounting for the topology to validate the GR-VAE in two synthetic studies based on topologically connected 4D point clouds and on MNIST (LeCun et al., 2010) with added relational information derived from the labels. After this initial validation, we evaluate all the methods on text representations and chemical reactions. For text representations, we analyze the methods' performance on Cora, CiteSeer, and PubMed (Sen et al., 2008), using their citation networks and evaluating the learned representations on a downstream classification task. Finally, we study the impact of the topology on molecule representations using a chemical reaction dataset (Jin et al., 2017), where the downstream task consists of predicting the reactivity of reactant/reagent-product pairs.

2. METHODS

In this section we present the different models compared in this study. Our approach is to explore a spectrum of models with varying availability of features and topology (see Figure 1).

VAE GR-VAE DGI node2vec

Figure 1: Spectrum of topology influence across the models considered. From left to right, the models are selected to transition smoothly from a case where only the point/node features are relevant (left, standard VAE) to the opposite end of the spectrum where only the topological properties are considered (right, node2vec). In between lie the cases where the point/node features and the topology are blended, either implicitly via a regularizer (GR-VAE) or explicitly (DGI).
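At the topology-only end of this spectrum, node2vec derives embeddings purely from random walks over the graph, ignoring node features entirely. The minimal sketch below generates unbiased walks over an adjacency list; node2vec itself biases the transition probabilities with return and in-out parameters (p, q), omitted here for brevity, and feeds the resulting walks as "sentences" to a skip-gram model to produce the embeddings.

```python
import random

def random_walks(adj, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks over an adjacency-list graph.

    adj : dict mapping node -> list of neighbour nodes

    Each node serves as a walk start `walks_per_node` times. The walks
    are the only signal passed downstream: a skip-gram model trained on
    them yields embeddings that reflect topology alone.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks
```

Because no node attributes enter the procedure, two graphs with identical connectivity but different features yield the same embeddings, which is precisely the behavior this end of the spectrum is meant to probe.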

