SLICE, DICE, AND OPTIMIZE: MEASURING THE DIMENSION OF NEURAL NETWORK CLASS MANIFOLDS

Abstract

Deep neural network classifiers naturally partition input space into regions belonging to different classes. The geometry of these class manifolds (CMs) is widely studied and is intimately related to model performance; for example, the margin is defined via boundaries between these CMs. We present a simple technique to estimate the effective dimension of CMs, as well as of boundaries between multiple CMs, by computing their intersection with random affine subspaces of varying dimension. We provide a theory for the technique and verify that our theoretical predictions agree with measurements on real neural networks. Through extensive experiments, we leverage this method to show deep connections between the geometry of CMs, generalization, and robustness. In particular, we investigate how CM dimension depends on 1) the dataset, 2) architecture, 3) random initialization, 4) stage of training, 5) class, 6) ensemble size, 7) label randomization, 8) training set size, and 9) model robustness to data corruption. Together, a picture emerges in which well-performing, robust models have higher-dimensional CMs than worse-performing models. Moreover, we offer a unique perspective on ensembling via intersections of CMs. Our core code is available on GitHub.

1. INTRODUCTION

Training neural networks to classify data is a ubiquitous and classic problem in deep learning. In K-way classification, trained networks naturally partition the space of inputs into K regions, S_k ⊂ R^D, containing points that the network confidently predicts have class k. We call these regions class manifolds (CMs) of the neural network. In this paper, we analyze the high-dimensional geometry of these CMs, focusing primarily on their effective dimensionality. To estimate the dimension of these CMs, we employ optimization on random d-dimensional sections of input space to beat the curse of dimensionality (Bellman, 1957), seeking out high-confidence regions that would be unlikely to be discovered at random with other diagnostic techniques. Through a theoretical analysis of high-dimensional geometry, we link the success of such constrained optimization to the dimension of the target CM. Through extensive experiments, we then leverage this method to show deep connections between the geometry of CMs, generalization, and robustness. In particular, we investigate how CM dimension depends on 1) the dataset, 2) architecture, 3) random initialization, 4) stage of training, 5) class, 6) ensemble size, 7) training set size, and 8) model robustness to data corruption. Together, a picture emerges in which well-performing, robust models have CMs of higher dimension than those of inferior models. Moreover, we offer a unique perspective on ensembling via intersections of CMs.
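The constrained optimization described above can be sketched in a few lines: parametrize points on a random d_cut-dimensional affine subspace as x = x_0 + A z and run gradient descent on z to maximize the confidence of a target class. This is a minimal NumPy sketch, not the paper's implementation; the "network" here is a toy linear softmax model standing in for a trained classifier, and all names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 100                           # ambient input dimension
K = 10                            # number of classes
W = 0.1 * rng.normal(size=(K, D)) # toy linear "network" (stand-in for a real model)

def class_prob(x, k):
    """Softmax probability of class k under the toy linear model."""
    logits = W @ x
    logits -= logits.max()                  # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return p[k]

def optimize_on_cut(x0, d_cut, k, steps=5000, lr=0.5):
    """Gradient descent on -log p_k restricted to the subspace x = x0 + A z."""
    A, _ = np.linalg.qr(rng.normal(size=(D, d_cut)))  # orthonormal basis of the cut
    z = np.zeros(d_cut)
    for _ in range(steps):
        x = x0 + A @ z
        logits = W @ x
        logits -= logits.max()
        p = np.exp(logits) / np.exp(logits).sum()
        grad_x = W.T @ (p - np.eye(K)[k])   # gradient of -log p_k w.r.t. x
        z -= lr * (A.T @ grad_x)            # chain rule through x = x0 + A z
    return x0 + A @ z

x0 = rng.normal(size=D)
x_opt = optimize_on_cut(x0, d_cut=20, k=3)
print(class_prob(x0, 3), "->", class_prob(x_opt, 3))
```

For a real network, `grad_x` would come from automatic differentiation; only the single step `z -= lr * (A.T @ grad_x)` encodes the restriction to the cutting plane.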



Figure 1: An illustration of finding a point in the intersection between a random cutting plane of dimension d_cut and a high-confidence manifold of effective dimension d_manifold. If d_cut ≥ D_input − d_manifold, there likely exists an intersection between the two. We use optimization from a random point (image) X_0 on the d_cut affine subspace to find a point in the intersection using gradient descent. The panels on the right show an example of the dependence of the probability and loss at the optimized point on d_cut. The higher dimensional the cut, the less constrained the available images X are, and the more likely we are to find one of high class confidence.
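The condition d_cut ≥ D_input − d_manifold is the standard generic-position rule: two affine subspaces of dimensions a and b in R^D generically intersect if and only if a + b ≥ D, because their codimensions add. This can be checked numerically for random affine subspaces by testing whether the linear system equating points on the two subspaces is solvable; the sketch below is illustrative, with all names our own.

```python
import numpy as np

rng = np.random.default_rng(1)

def affine_subspaces_intersect(D, a, b, tol=1e-6):
    """Check whether two random affine subspaces of dims a and b in R^D meet.

    Subspace 1 is {p1 + A1 @ u}, subspace 2 is {p2 + A2 @ v}. They intersect
    iff [A1, -A2] @ [u; v] = p2 - p1 has a solution, which we test via the
    least-squares residual.
    """
    A1 = rng.normal(size=(D, a))
    A2 = rng.normal(size=(D, b))
    p1, p2 = rng.normal(size=D), rng.normal(size=D)
    M = np.hstack([A1, -A2])
    sol, *_ = np.linalg.lstsq(M, p2 - p1, rcond=None)
    residual = np.linalg.norm(M @ sol - (p2 - p1))
    return residual < tol

D = 30
print(affine_subspaces_intersect(D, 10, 20))  # a + b = D: generically meet
print(affine_subspaces_intersect(D, 10, 19))  # a + b < D: generically miss
```

In the paper's setting the CM is not an affine subspace, but the same counting heuristic motivates reading off an effective dimension from the smallest d_cut at which intersections are reliably found.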

Figure 2: Maximum probability of single classes of CIFAR-10 reached on cutting planes of dimension d. The figure shows the dependence of the maximum probability of a single class of CIFAR-10 (y-axes) reached on random cutting hyperplanes of different dimensions (x-axes). The results shown are for a well-trained (> 90% test accuracy) ResNet20v1 on CIFAR-10. Each dimension d is repeated 10× with random planes and offsets, and d*_50% is extracted using a fit. The d*_50% ≪ 3072, which implies that the class manifolds are surprisingly high dimensional, with dimension 3072 − d*_50%. Indeed their dimensions are all in excess of 3000.
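The caption says d*_50% is "extracted using a fit" without specifying the functional form. One plausible choice (an assumption here, not the paper's stated procedure) is a logistic curve in d, with d*_50% read off as the dimension at which the fitted curve crosses probability 0.5. The sketch below fits such a curve to synthetic stand-in data, since the paper's real measurements are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(d, d50, width):
    """Logistic model: max class probability as a function of cut dimension d."""
    return 1.0 / (1.0 + np.exp(-(d - d50) / width))

# Synthetic stand-in for measurements of max probability per cut dimension
# (the paper's real data comes from optimizing on random cuts).
rng = np.random.default_rng(2)
dims = np.arange(1, 101, 5).astype(float)
true_d50, true_width = 40.0, 6.0
probs = sigmoid(dims, true_d50, true_width) + 0.02 * rng.normal(size=dims.size)
probs = np.clip(probs, 0.0, 1.0)

(d50_fit, width_fit), _ = curve_fit(sigmoid, dims, probs, p0=[50.0, 10.0])
print(f"d*_50% ~ {d50_fit:.1f} (ground truth {true_d50})")
```

The effective manifold dimension would then be estimated as D_input − d*_50% (3072 − d*_50% for CIFAR-10).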

