EFFECTIVE DIMENSION OF MACHINE LEARNING MODELS

Abstract

Understanding how a trained model will perform on tasks involving new data, i.e., the generalization power of the model, is one of the primary goals of machine learning. Various capacity measures try to capture this ability, but usually fall short in explaining important characteristics of models that we observe in practice. In this study, we propose the local effective dimension as a capacity measure which seems to correlate well with generalization error on standard data sets. Importantly, we prove that the local effective dimension bounds the generalization error and discuss the aptness of this capacity measure for machine learning models.

1. INTRODUCTION

The essence of successful machine learning lies in the creation of a model that is able to learn from data and apply what it has learned to new, unseen data (Goodfellow et al., 2016). The latter ability is termed the generalization performance of a machine learning model and has proven to be notoriously difficult to predict a priori (Zhang et al., 2021). The relevance of generalization is rather straightforward: if one already has insight into the performance capability of a model class, this allows more robust models to be selected for training and deployment. But how does one begin to analyze generalization without physically training models and assessing their performance on new data thereafter? This age-old question has a rich history and is largely addressed through the notion of capacity. Loosely speaking, the capacity of a model relates to its ability to express a variety of functions (Vapnik et al., 1994). The higher a model's capacity, the more functions it is able to fit. In the context of generalization, many capacity measures have been shown to mathematically bound the error a model makes when performing a task on new data, i.e., the generalization error (Vapnik & Chervonenkis, 1971; Liang et al., 2019; Bartlett et al., 2017). Naturally, finding a capacity measure that provides a tight generalization error bound, and in particular, correlates with generalization error across a wide range of experimental setups, will allow us to better understand the generalization performance of machine learning models. Interestingly, proposed capacity measures have differed quite substantially over time, with trade-offs apparent in each of the current proposals (Jiang et al., 2019).
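For concreteness, one classical example of such a bound, stated here in a standard textbook form rather than as a result of this paper, is the VC generalization bound: for a hypothesis class of VC dimension $h$ and $n$ i.i.d. training samples, with probability at least $1 - \delta$, every hypothesis $f$ in the class satisfies

$$
R(f) \;\le\; \hat{R}_n(f) \,+\, \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}},
$$

where $R(f)$ is the true risk and $\hat{R}_n(f)$ the empirical risk on the training sample. Note that the capacity term depends on the data only through the sample size $n$, which is precisely the kind of limitation discussed next.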
The perennial VC dimension has famously been shown to bound the generalization error, but it does not incorporate crucial attributes, such as data potentially coming from a distribution, and it ignores the learning algorithm employed, which inherently restricts the space of models within a model class that the algorithm has access to (Vapnik et al., 1994). Arguably, among the most promising contenders for capacity that attempt to incorporate these factors are norm-based capacity measures, which regularize the margin distribution of a model by a particular norm that usually depends on the model's trained parameters (Bartlett et al., 2017; Neyshabur et al., 2017b; 2015). Whilst these measures incorporate the distribution of data, as well as the learning algorithm, the drawback is that most depend on the size of the model, which does not necessarily correlate with the generalization error in certain experimental setups (Zhang et al., 2021). To this end, we present the local effective dimension, which attempts to address these issues. By capturing the redundancy of parameters in a model, the local effective dimension modifies the measure of (Berezniuk et al., 2020; Abbas et al., 2021) to incorporate the learning algorithm employed, in addition to being scale invariant and data dependent. The key results from our study can be summarized as follows:

