MODEL-CENTRIC DATA MANIFOLD: THE DATA THROUGH THE EYES OF THE MODEL

Abstract

We discover that deep ReLU neural network classifiers can see a low-dimensional Riemannian manifold structure on data. Such structure comes via the local data matrix, a variation of the Fisher information matrix in which the role of the model parameters is taken by the data variables. We obtain a foliation of the data domain and show that the dataset on which the model is trained lies on a leaf, the data leaf, whose dimension is bounded by the number of classification labels. We validate our results with experiments on the MNIST dataset: paths on the data leaf connect valid images, while other leaves cover noisy images.

1. INTRODUCTION

In machine learning, models are categorized as discriminative or generative. From its inception, deep learning has focused on classification and discriminative models (Krizhevsky et al., 2012; Hinton et al., 2012; Collobert et al., 2011). Another perspective came with the construction of generative models based on neural networks (Kingma & Welling, 2014; Goodfellow et al., 2014; Van den Oord et al., 2016; Kingma & Dhariwal, 2018). Both kinds of models give us information about the data and the similarity between examples. In particular, generative models introduce a geometric structure on generated data: they transform a random low-dimensional vector into an example sampled from a probability distribution approximating that of the training dataset. As proved by Arjovsky & Bottou (2017), generated data lie on a countable union of manifolds. This fact supports the human intuition that data have a low-dimensional manifold structure, but in generative models the dimension of such a manifold is usually a hyper-parameter fixed by the experimenter. A recent algorithm by Peebles et al. (2020) approximates the dimension of the data manifold by deactivating irrelevant dimensions in a GAN.

Here we ask, analogously, whether a discriminative model can be used to detect a manifold structure on the space containing the data and to provide tools to navigate this manifold. The implicit definition of such a manifold, together with the ability to trace paths between points on it, opens up many possible applications. In particular, we could use paths to define a system of coordinates on the manifold (more precisely, on a chart of the manifold). Such coordinates would immediately give us a low-dimensional parametrization of our data, allowing us to do dimensionality reduction.

In supervised learning, a model is trained on a labeled dataset to identify the correct label on unseen data.
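To make the central object concrete, the following is a minimal sketch (our illustration, not the authors' code) of the local data matrix announced in the abstract: the Fisher information matrix in which the input x plays the role of the parameters, G(x) = E_{y ~ p(y|x)} [ grad_x log p(y|x) grad_x log p(y|x)^T ]. The toy ReLU classifier and its sizes are assumptions made for demonstration.

```python
import torch

torch.manual_seed(0)
n, k = 8, 3                      # input dimension, number of classes (assumed)
model = torch.nn.Sequential(     # toy ReLU classifier for illustration
    torch.nn.Linear(n, 16), torch.nn.ReLU(), torch.nn.Linear(16, k)
)

def local_data_matrix(model, x):
    """G(x) = sum_y p(y|x) * grad_x log p(y|x) grad_x log p(y|x)^T."""
    x = x.clone().requires_grad_(True)
    log_p = torch.log_softmax(model(x), dim=-1)   # log p(y|x), shape (k,)
    p = log_p.exp().detach()
    G = torch.zeros(x.numel(), x.numel())
    for y in range(k):
        (g,) = torch.autograd.grad(log_p[y], x, retain_graph=True)
        G += p[y] * torch.outer(g, g)
    return G

G = local_data_matrix(model, torch.randn(n))
# G is n x n but degenerate: its rank is bounded by the number of classes,
# which is the source of the low-dimensional leaves described in the abstract.
print(torch.linalg.matrix_rank(G).item())
```

The rank deficiency of G(x) is exactly why a foliation appears: only a few directions at x change the model's predicted class probabilities.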
A trained neural network classifier builds a hierarchy of representations that encodes increasingly complex features of the input data (Olah et al., 2017). Through the representation function, a distance (e.g. Euclidean or cosine) on the representation space of a layer endows the input data with a distance. This pyramid of distances on examples is increasingly class-aware: the deeper the layer, the better the metric reflects the similarity of data with respect to the task at hand. This observation suggests that the model implicitly organizes the data according to a suitable structure. Unfortunately, these intermediate representations and metrics are insufficient to understand the geometric structure of the data. First of all, representation functions are not invertible, so we cannot recover the original example from its intermediate representation or interpolate between data points. Moreover, the domain of the representation functions is the entire data domain R^n. This domain is mostly composed of meaningless noise, and data occupy only a thin region inside it. So, even though representation functions provide us with a distance, these metrics are incapable of distinguishing between meaningful data and noise.
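The pulled-back distances above can be sketched in a few lines. This is an illustrative example, not the paper's code: the architecture (an MNIST-sized multilayer perceptron) and the use of the Euclidean norm are assumptions.

```python
import torch

torch.manual_seed(0)
# Toy hierarchy of representation functions (assumed architecture):
# layer l maps the previous representation to the next one.
layers = torch.nn.ModuleList([
    torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU()),
    torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()),
    torch.nn.Linear(64, 10),
])

def layer_distances(x1, x2):
    """Euclidean distance between the representations of x1 and x2 at
    every layer: d_l(x1, x2) = || f_l(x1) - f_l(x2) ||."""
    d = []
    with torch.no_grad():
        for layer in layers:
            x1, x2 = layer(x1), layer(x2)
            d.append(torch.norm(x1 - x2).item())
    return d

dists = layer_distances(torch.randn(784), torch.randn(784))
print(dists)  # one pulled-back distance per layer
```

Note that each d_l is defined on all of R^784, including pure noise inputs such as the random vectors above, which illustrates the objection raised in the text: these metrics exist everywhere and cannot by themselves single out the thin region where meaningful data live.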

