THE INTRINSIC DIMENSION OF IMAGES AND ITS IMPACT ON LEARNING

Abstract

It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. This idea underlies a common intuition for the remarkable success of deep learning in computer vision. In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning. We find that common natural image datasets indeed have very low intrinsic dimension relative to the high number of pixels in the images. Additionally, we find that low-dimensional datasets are easier for neural networks to learn, and models solving these tasks generalize better from training to test data. Along the way, we develop a technique for validating our dimension estimation tools on synthetic data generated by GANs, allowing us to actively manipulate the intrinsic dimension by controlling the image generation process. Code for our experiments may be found here.

1. INTRODUCTION

The idea that real-world data distributions can be described by very few variables underpins machine learning research from manifold learning to dimension reduction (Besold & Spokoiny, 2019; Fodor, 2002). The number of variables needed to describe a data distribution is known as its intrinsic dimension (ID). In applications such as crystallography, computer graphics, and ecology, practitioners depend on data having low intrinsic dimension (Valle & Oganov, 2010; Desbrun et al., 2002; Laughlin, 2014). The utility of low-dimensional representations has motivated a variety of deep learning techniques, including autoencoders and regularization methods (Hinton & Salakhutdinov, 2006; Vincent et al., 2010; Gonzalez & Balajewicz, 2018; Zhu et al., 2018).

It is also known that dimensionality plays a strong role in learning function approximations and non-linear class boundaries. The exponential cost of learning in high dimensions is easily captured by the trivial case of sampling a function on a cube; in d dimensions, sampling only the cube vertices would require 2^d measurements. Similar behaviors emerge in learning theory. It is known that learning a manifold requires a number of samples that grows exponentially with the manifold's intrinsic dimension (Narayanan & Mitter, 2010). Similarly, the number of samples needed to learn a well-conditioned decision boundary between two classes is an exponential function of the intrinsic dimension of the manifold on which the classes lie (Narayanan & Niyogi, 2009). Furthermore, these learning bounds have no dependence on the ambient dimension in which manifold-structured datasets live.

In light of the exponentially large sample complexity of learning high-dimensional functions, the ability of neural networks to learn from image data is remarkable. Networks learn complex decision boundaries from small amounts of image data (often just a few hundred or thousand samples per class).
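The vertex-counting argument above can be made concrete with a short sketch (the function and helper names here are illustrative, not from the paper): even the coarsest grid on a d-dimensional cube, its vertices alone, requires a number of function evaluations that doubles with every added dimension.

```python
from itertools import product

def cube_vertex_samples(f, d):
    """Evaluate f at every vertex of the unit d-cube.

    The returned dict has exactly 2**d entries, one per vertex,
    illustrating the exponential sampling cost in the ambient dimension d.
    """
    return {v: f(v) for v in product((0.0, 1.0), repeat=d)}

# Sampling even a simple function (here, the coordinate sum) on the
# vertices of an 8-cube already takes 2**8 = 256 measurements.
samples = cube_vertex_samples(sum, 8)
print(len(samples))  # 256
```

Going from d = 8 to d = 20 pushes the count past a million, which is why the learning bounds cited above depend only on intrinsic, not ambient, dimension.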
At the same time, generative adversarial networks (GANs) are able to learn image "manifolds" from merely a few thousand samples. The seemingly low number of samples needed to learn these manifolds strongly suggests that image datasets have extremely low-dimensional structure. Despite the established role of low-dimensional data in deep learning, little is known about the intrinsic dimension of popular datasets and the impact of dimensionality on the performance of neural networks. Computational methods for estimating intrinsic dimension make such measurements possible.
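As a sense of what such computational dimension estimators look like, the following is a minimal sketch of one widely used nearest-neighbor approach, the maximum-likelihood estimator of Levina and Bickel (2004); it is offered as an illustration of the general technique, not as the paper's specific implementation, and uses brute-force distances that only suit small samples.

```python
import numpy as np

def mle_intrinsic_dimension(X, k=5):
    """Levina-Bickel MLE estimate of intrinsic dimension.

    X : (n, d) array of n points in ambient dimension d.
    k : number of nearest neighbors to use (k >= 2).
    """
    # Pairwise Euclidean distances (brute force; fine for small n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    # For each point, distances to its k nearest neighbors, excluding itself.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]          # shape (n, k)
    # Per-point inverse-dimension estimate: mean of log(T_k / T_j), j < k.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])    # shape (n, k-1)
    inv_dim = log_ratios.mean(axis=1)
    # Aggregate over points: average the inverses, then invert.
    return 1.0 / inv_dim.mean()
```

For example, for points drawn from a 2-dimensional Gaussian embedded in a 10-dimensional ambient space, the estimate recovers a value near 2 rather than 10, which is exactly the kind of gap between intrinsic and pixel dimension the paper measures on image datasets.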

