GENERALIZATION BOUNDS WITH ARBITRARY COMPLEXITY MEASURES

Abstract

In statistical learning theory, generalization bounds usually involve a complexity measure that is fixed by the chosen theoretical framework. This limits the scope of such analyses, since practical algorithms rely on other forms of capacity measures or regularization. In this paper, we leverage the framework of disintegrated PAC-Bayesian bounds and combine it with Gibbs distributions to derive generalization bounds involving a complexity measure that can be defined by the user. Our bounds hold in probability jointly over the hypotheses and the learning sample, which allows us to tighten the complexity for a given generalization gap, since the complexity can be set to fit both the hypothesis class and the task.

1. INTRODUCTION

Statistical learning theory offers various frameworks for assessing generalization, which study whether the empirical risk is representative of the true risk by upper-bounding the generalization gap, i.e., the deviation between the true risk and the empirical risk. An upper bound on this gap is generally a function of two main quantities: (i) the size of the training sample and (ii) a complexity measure that captures how prone a model is to overfitting. One limitation is that existing frameworks are restricted to particular complexity measures, among them the VC-dimension (Vapnik & Chervonenkis, 1971) or the Rademacher complexity (Bartlett & Mendelson, 2002), for which generalization bounds can be derived. To the best of our knowledge, no generalization bound is able to take into account, by construction, arbitrary complexity measures that can serve as good proxies for the generalization gap.

In this paper, we tackle this drawback by leveraging the framework of disintegrated PAC-Bayesian bounds (Theorem 2.1) to propose a novel generalization bound with arbitrary complexity measures. To do so, we make use of Gibbs probability distributions (Equation (2)) that depend on a user-defined parametric function characterizing the complexity. This allows us to derive guarantees in the form of probabilistic bounds that depend on a model sampled from such a Gibbs distribution. It is worth noting that our result allows us to recover both uniform-convergence and algorithm-dependent bounds. We believe that this novel result provides theoretical foundations for the many regularizations used in practice to perform model selection. For instance, it allows integrating the complexity measures studied empirically in a recent line of work on over-parametrized models (Jiang et al., 2019; Dziugaite et al., 2020; Jiang et al., 2021).
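To make the role of the user-defined complexity concrete, the following is a minimal sketch of a Gibbs distribution over a finite hypothesis set, where the distribution reweights a prior by an exponential penalty on the complexity. The function name `gibbs_distribution`, the concentration parameter `c`, and the toy complexity scores are illustrative choices, not the paper's Equation (2), which is not reproduced in this excerpt.

```python
import numpy as np

def gibbs_distribution(prior, complexity, c):
    """Gibbs distribution over a finite hypothesis set:
    rho(h) proportional to prior(h) * exp(-c * complexity(h))."""
    log_w = np.log(prior) - c * complexity
    log_w -= log_w.max()            # subtract max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Toy example: 4 hypotheses, uniform prior, user-defined complexity
# scores (e.g., a norm-based measure computed on the learning sample).
prior = np.full(4, 0.25)
mu = np.array([0.1, 0.5, 1.0, 2.0])
rho = gibbs_distribution(prior, mu, c=2.0)
```

As `c` grows, `rho` concentrates on the hypotheses with the smallest complexity; with `c = 0` it falls back to the prior, illustrating how the user-defined measure steers which model is likely to be sampled.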
In our experimental evaluation, we show how such measures can easily be integrated into our framework in practice. Notably, we provide a stochastic version of the Metropolis-Adjusted Langevin Algorithm to compute empirical estimates of our bounds.

Organization of the paper. In Section 2, we provide some preliminary definitions and concepts. We then present our main contribution in Section 3. In Section 4, we provide a practical instantiation of our framework before concluding in Section 5.
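Since the Metropolis-Adjusted Langevin Algorithm (MALA) plays a central role in the empirical estimates, here is a generic sketch of one MALA step: a Langevin proposal driven by the gradient of the log-density, corrected by a Metropolis-Hastings accept/reject test. This is the standard algorithm, not the paper's stochastic variant; the standard-Gaussian target below is a stand-in for a Gibbs density over model parameters.

```python
import numpy as np

def mala_step(x, log_p, grad_log_p, eps, rng):
    """One Metropolis-Adjusted Langevin step targeting density p."""
    # Langevin proposal: gradient drift plus Gaussian noise.
    mean_fwd = x + 0.5 * eps**2 * grad_log_p(x)
    prop = mean_fwd + eps * rng.standard_normal(x.shape)
    # Log proposal densities q(prop | x) and q(x | prop)
    # (common normalizing constants cancel in the ratio).
    mean_bwd = prop + 0.5 * eps**2 * grad_log_p(prop)
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2 * eps**2)
    log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (2 * eps**2)
    # Metropolis-Hastings acceptance test.
    log_alpha = log_p(prop) - log_p(x) + log_q_bwd - log_q_fwd
    return prop if np.log(rng.uniform()) < log_alpha else x

# Toy target: standard Gaussian in 2 dimensions.
log_p = lambda x: -0.5 * np.sum(x**2)
grad_log_p = lambda x: -x
rng = np.random.default_rng(0)
x = np.zeros(2)
samples = []
for _ in range(5000):
    x = mala_step(x, log_p, grad_log_p, eps=0.5, rng=rng)
    samples.append(x.copy())
samples = np.array(samples)
```

The accept/reject correction is what distinguishes MALA from unadjusted Langevin dynamics: it removes the discretization bias of the Langevin step, so the chain targets the density exactly.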

