TIGHT NON-ASYMPTOTIC INFERENCE VIA SUB-GAUSSIAN INTRINSIC MOMENT NORM

Abstract

In non-asymptotic statistical inference, variance-type parameters of sub-Gaussian distributions play a crucial role. However, direct estimation of these parameters based on the empirical moment generating function (MGF) is infeasible. To this end, we recommend using a sub-Gaussian intrinsic moment norm [Buldygin and Kozachenko (2000), Theorem 1.3] obtained by maximizing a series of normalized moments. Importantly, the recommended norm can not only recover the exponential moment bounds of the corresponding MGFs, but also lead to tighter Hoeffding-type sub-Gaussian concentration inequalities. In practice, we propose an intuitive way of checking whether finite-sample data are sub-Gaussian, via the sub-Gaussian plot. The intrinsic moment norm can be robustly estimated by a simple plug-in approach. Our theoretical results are applied to non-asymptotic analyses, including the multi-armed bandit problem.

1. INTRODUCTION

With the advancement of machine learning techniques, computer scientists have become increasingly interested in establishing rigorous error bounds for desired learning procedures, especially bounds with finite-sample validity (Wainwright, 2019; Zhang & Chen, 2021; Yang et al., 2020). In specific settings, statisticians, econometricians, engineers, and physicists have developed non-asymptotic inference to quantify uncertainty in data; see Romano & Wolf (2000); Chassang (2013); Wang (2020); Lucas et al. (2008); Arlot et al. (2010); Horowitz & Lee (2020); Armstrong & Kolesár (2021); Zheng & Cheng (2021); Owhadi et al. Concentration-based statistical inference has therefore received a considerable amount of attention, especially for bounded data (Romano & Wolf, 2000; Auer et al., 2002; Hao et al., 2019; Wang et al., 2021; Shiu, 2022) and Gaussian data (Arlot et al., 2010; Duy & Takeuchi, 2022; Bettache et al., 2021; Feng et al., 2021). For example, Hoeffding's inequality can be applied to construct non-asymptotic confidence intervals based on bounded data.¹ In reality, however, it may be hard to know the support of the data or their underlying distribution; in this case, misusing Hoeffding's inequality (Hoeffding, 1963) for unbounded data will result in a notably loose confidence interval (CI); see Appendix A.1. Hence, it is common practice to assume that the data follow a sub-Gaussian distribution (Kahane, 1960). By the Chernoff inequality,² we have

$$\mathbb{P}(X \ge t) \le \inf_{s>0} \exp\{-st\}\,\mathbb{E}\exp\{sX\}, \qquad \forall\, t \ge 0. \tag{1}$$

Hence, the tightness of a confidence interval relies on how tightly we can upper bound the moment generating function (MGF) $\mathbb{E}\exp\{sX\}$ for all $s > 0$. This can be further translated into the following optimal variance proxy of a sub-Gaussian distribution.

Definition 1. A r.v. $X$ is sub-Gaussian (sub-G) with a variance proxy $\sigma^2$ [denoted as $X \sim \mathrm{subG}(\sigma^2)$] if its MGF satisfies $\mathbb{E}\exp(tX) \le \exp(\sigma^2 t^2/2)$ for all $t \in \mathbb{R}$. The sub-Gaussian parameter $\sigma_{\mathrm{opt}}(X)$ is defined by the optimal variance proxy (Chow, 1966):

$$\sigma^2_{\mathrm{opt}}(X) := \inf\big\{\sigma^2 > 0 : \mathbb{E}\exp(tX) \le \exp\{\sigma^2 t^2/2\},\ \forall\, t \in \mathbb{R}\big\} = 2 \sup_{t \in \mathbb{R}} t^{-2} \log[\mathbb{E}\exp(tX)].$$

Note that $\sigma^2_{\mathrm{opt}}(X) \ge \operatorname{Var} X$; see (14) in Appendix A.2. When $\sigma^2_{\mathrm{opt}}(X) = \operatorname{Var} X$, this is called strict sub-Gaussianity (Arbel et al., 2020). Based on Theorem 1.5 in Buldygin & Kozachenko (2000), for independent sub-G r.v.s $X_1, \ldots, X_n$ we have

$$\mathbb{P}(X \ge t) \le \exp\Big\{-\frac{t^2}{2\sigma^2_{\mathrm{opt}}(X)}\Big\}, \qquad \mathbb{P}\Big(\Big|\sum_{i=1}^{n} X_i\Big| \ge t\Big) \le 2\exp\Big\{-\frac{t^2}{2\sum_{i=1}^{n}\sigma^2_{\mathrm{opt}}(X_i)}\Big\}. \tag{2}$$
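The first tail bound in (2) follows by combining Definition 1 with the Chernoff inequality (1); for completeness, here is the standard one-line derivation, where the infimum is attained at $s^{*} = t/\sigma^2_{\mathrm{opt}}(X)$:

$$\mathbb{P}(X \ge t) \le \inf_{s>0} \exp\{-st\}\,\mathbb{E}\exp\{sX\} \le \inf_{s>0} \exp\Big\{-st + \frac{\sigma^2_{\mathrm{opt}}(X)\, s^2}{2}\Big\} = \exp\Big\{-\frac{t^2}{2\sigma^2_{\mathrm{opt}}(X)}\Big\}.$$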


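To make the role of the variance proxy concrete, the minimal Python sketch below turns the two-sided bound in (2) into a finite-sample CI for the mean of i.i.d. centered sub-G data. The function name and the choice $\delta = 0.05$ are ours for illustration only, and $\sigma_{\mathrm{opt}}$ is assumed known here; estimating it is exactly the difficulty this paper addresses.

```python
import numpy as np

def subgaussian_mean_ci(x_bar, sigma_opt, n, delta=0.05):
    """Non-asymptotic (1 - delta) CI for the mean implied by (2).

    For i.i.d. subG(sigma_opt^2) data, (2) gives
        P(|sum_i X_i| >= t) <= 2 exp(-t^2 / (2 n sigma_opt^2)),
    so setting the right-hand side to delta and dividing by n yields the
    half-width sigma_opt * sqrt(2 log(2 / delta) / n) around the sample mean.
    """
    half_width = sigma_opt * np.sqrt(2.0 * np.log(2.0 / delta) / n)
    return x_bar - half_width, x_bar + half_width

# Example: 95% CI with n = 500 observations and variance proxy 1.
print(subgaussian_mean_ci(x_bar=0.02, sigma_opt=1.0, n=500))
```

A smaller admissible variance proxy directly shrinks the CI, which is why tight control of $\sigma^2_{\mathrm{opt}}$ (or of the intrinsic moment norm) matters.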

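The abstract's claim that direct estimation based on the empirical MGF is infeasible can be seen numerically: plugging the empirical MGF into $\sigma^2_{\mathrm{opt}}(X) = 2\sup_t t^{-2}\log\mathbb{E}\exp(tX)$ gives an erratic estimate even for standard Gaussian data, where the true value is exactly 1. The sketch below is illustrative only; the grid, seed, and sample sizes are arbitrary choices of ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigma2_opt_plugin(x, t_grid):
    """Plug-in estimate of sigma^2_opt = 2 sup_t t^{-2} log E exp(tX),
    with the true MGF replaced by the empirical MGF.

    For symmetric centered X it suffices to search over t > 0."""
    emp_mgf = np.exp(np.outer(t_grid, x)).mean(axis=1)  # empirical E exp(tX)
    return np.max(2.0 * np.log(emp_mgf) / t_grid**2)

t_grid = np.linspace(0.1, 5.0, 200)
for n in (100, 1000, 10000):
    x = rng.standard_normal(n)  # N(0, 1): true sigma^2_opt is exactly 1
    print(n, sigma2_opt_plugin(x, t_grid))
# The supremum is dominated by Monte-Carlo noise at small t and by the
# downward-biased empirical MGF at large t, so the plug-in value is unreliable.
```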
¹ Recently, Phan et al. (2021) obtained a sharper result than Hoeffding's inequality for bounded data.
² For simplicity, we consider centered (zero-mean) random variables (r.v.s) throughout the paper for all sub-Gaussian r.v.s.




