VERIFYING THE UNION OF MANIFOLDS HYPOTHESIS FOR IMAGE DATA

Abstract

Deep learning has had tremendous success at learning low-dimensional representations of high-dimensional data. This success would be impossible if there were no hidden low-dimensional structure in data of interest; this existence is posited by the manifold hypothesis, which states that the data lies on an unknown manifold of low intrinsic dimension. In this paper, we argue that this hypothesis does not properly capture the low-dimensional structure typically present in image data. Assuming that data lies on a single manifold implies that intrinsic dimension is identical across the entire data space, and does not allow for subregions of this space to have a different number of factors of variation. To address this deficiency, we consider the union of manifolds hypothesis, which states that data lies on a disjoint union of manifolds of varying intrinsic dimensions. We empirically verify this hypothesis on commonly-used image datasets, finding that indeed, observed data lies on a disconnected set and that intrinsic dimension is not constant. We also provide insights into the implications of the union of manifolds hypothesis in deep learning, both supervised and unsupervised, showing that designing models with an inductive bias for this structure improves performance across classification and generative modelling tasks.

1. INTRODUCTION

The manifold hypothesis (Bengio et al., 2013) states that high-dimensional data of interest often lies on an unknown lower-dimensional manifold embedded in ambient space, and there is strong evidence supporting this hypothesis. From a theoretical perspective, it is known that both manifold learning and density estimation scale exponentially with the (low) intrinsic dimension when such structure exists (Ozakin & Gray, 2009; Narayanan & Mitter, 2010), while scaling exponentially with the (high) ambient dimension otherwise (Cacoullos, 1966). Thus, the most plausible explanation for the success of machine learning methods on high-dimensional data is the existence of far lower intrinsic dimension, which makes learning feasible on datasets of reasonable size. This is verified empirically by Pope et al. (2021), who perform a comprehensive study estimating the intrinsic dimension of commonly-used image datasets and clearly find low-dimensional structure.

However, thinking of observed data as lying on a single unknown low-dimensional manifold is quite limiting, as it implies that the intrinsic dimension is constant throughout the dataset. If we consider the intrinsic dimension to be the number of factors of variation generating the data, this formulation prevents distinct regions of the data's support from having differing numbers of factors of variation. Yet this seems unrealistic: for example, we should not expect the number of factors needed to describe 8s and 1s in the MNIST dataset (LeCun et al., 1998) to be equal.

To accommodate this intuition, in this paper we consider the union of manifolds hypothesis: that high-dimensional image data often lies not on a single manifold, but on a disjoint union of manifolds.

* Work done during an internship at Layer 6 AI.
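To make the notion of estimating intrinsic dimension concrete, the sketch below implements a nearest-neighbour maximum-likelihood estimator in the style of Levina and Bickel (2005), one of the standard estimators used in studies such as Pope et al. (2021). The specific data, function name, and hyperparameters here are illustrative choices, not the exact setup of any cited work: two linear "manifolds" of different dimension (2 and 5) are embedded in a 20-dimensional ambient space, and the estimator recovers a different intrinsic dimension for each, mirroring the paper's claim that dimension need not be constant across components of the data.

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Levina-Bickel MLE estimate of intrinsic dimension.

    X: (n, D) array of points; k: number of nearest neighbours.
    Per-point estimates are aggregated by averaging their inverses
    (the MacKay-Ghahramani correction).
    """
    # Pairwise squared Euclidean distances, excluding self-distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    # Sorted distances to the k nearest neighbours of each point: (n, k).
    knn = np.sqrt(np.sort(d2, axis=1)[:, :k])
    # Log-ratio of the k-th neighbour distance to each closer one: (n, k-1).
    log_ratio = np.log(knn[:, -1:] / knn[:, :-1])
    # Mean log-ratio per point is the inverse of that point's MLE.
    inv_mle = log_ratio.mean(axis=1)
    return 1.0 / inv_mle.mean()

rng = np.random.default_rng(0)
D = 20
# Component A: a 2-dimensional linear subspace embedded in R^20.
A = rng.normal(size=(500, 2)) @ rng.normal(size=(2, D))
# Component B: a 5-dimensional linear subspace embedded in R^20.
B = rng.normal(size=(500, 5)) @ rng.normal(size=(5, D))

print(f"estimated dim of A: {mle_intrinsic_dim(A):.2f}")  # close to 2
print(f"estimated dim of B: {mle_intrinsic_dim(B):.2f}")  # close to 5
```

Running the estimator separately on each component recovers roughly 2 and 5 respectively, whereas a single-manifold view would force one global dimension on the union of the two.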

