A GENERALIZED PROBABILITY KERNEL ON DISCRETE DISTRIBUTIONS AND ITS APPLICATION IN TWO-SAMPLE TEST

Abstract

We propose a generalized probability kernel (GPK) on discrete distributions with finite support. This probability kernel, defined between distributions rather than between samples, generalizes existing discrepancy statistics such as maximum mean discrepancy (MMD) as well as probability product kernels, and extends to more general cases. For both existing and newly proposed statistics, we estimate them through empirical frequencies and illustrate a strategy for analyzing the resulting bias and convergence bounds. We further propose power-MMD, a natural extension of MMD within the GPK framework, and illustrate its use for the two-sample test. Our work connects the fields of discrete distribution-property estimation and kernel-based hypothesis testing, which may open up new possibilities.

1. INTRODUCTION

We focus on the two-sample problem: given two i.i.d. samples {x_1, x_2, ..., x_n} and {y_1, y_2, ..., y_n}, can we infer the discrepancy between the underlying distributions from which they are drawn? For this problem, hypothesis testing (the two-sample test) is the most popular approach, and a variety of statistics for estimating the discrepancy have been proposed. In recent years, RKHS-based methods such as maximum mean discrepancy (MMD) have gained considerable attention. Gretton et al. (2012) showed that in a universal RKHS F, MMD(F, p, q) = 0 if and only if p = q, so MMD can be used for the two-sample hypothesis test; they further provided an unbiased estimator of MMD with a fast asymptotic convergence rate, illustrating its advantages.

On the other hand, estimating distribution properties with plugin (empirical) estimators in the discrete setting has been an active research area in recent years, with a focus on problem settings where the support size is large but the sample size is not. Yi & Alon (2020) introduced the Bernstein polynomial technique to analyze the bias of plugin estimators, which has led to remarkable progress on bias-reduction methods. It is thus natural to ask whether plugin estimators could motivate new results for the RKHS-based two-sample test.

Another interesting topic is the probability kernel, a kernel function defined over probability distributions rather than over samples. Although any discrepancy measure between distributions p and q is potentially a valid probability kernel, little work has focused on this direction. Jebara et al. (2004) introduced the so-called probability product kernels, which generalize a variety of discrepancy measures, but their properties deserve further study. Motivated by these observations, our work focuses on a specialized probability kernel function that directly generalizes sample-based RKHS methods such as MMD.
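To make the sample-based baseline concrete, the unbiased MMD^2 estimator of Gretton et al. (2012) can be sketched as below. This is a minimal illustration only: the Gaussian kernel, its bandwidth, and the one-dimensional samples are assumptions for the example, not choices made in this paper.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # k(a, b) = exp(-|a - b|^2 / (2 * bandwidth^2)), computed pairwise.
    return np.exp(-np.subtract.outer(a, b) ** 2 / (2 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between 1-d samples x and y."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude diagonal terms so the within-sample averages are unbiased.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()
```

Because the diagonal terms are removed, the estimate can be slightly negative when p = q; it concentrates around the population MMD^2 as the sample sizes grow.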
We use the plugin estimator as the default estimator of the kernel function we define, and show that, with the help of Bernstein polynomial techniques, we can analyze the bias and convergence bounds of these plugin estimators. Our work thus connects the fields of discrete distribution-property estimation and kernel-based hypothesis testing, which brings interesting possibilities.
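As a concrete illustration of the plugin strategy on a discrete support, the sketch below estimates MMD^2(p, q) = (p - q)^T K (p - q) by substituting empirical frequencies for p and q. The finite support {0, ..., S-1} and the user-supplied kernel matrix K are assumptions for the example; the Bernstein-polynomial bias analysis is not shown here.

```python
import numpy as np

def empirical_frequencies(samples, support_size):
    # p_hat[a] = (# occurrences of symbol a in the sample) / n
    counts = np.bincount(samples, minlength=support_size)
    return counts / counts.sum()

def plugin_mmd2(x, y, kernel_matrix):
    """Plugin estimate of MMD^2(p, q) = (p - q)^T K (p - q) on a finite support.

    x, y: integer-valued samples in {0, ..., S-1};
    kernel_matrix: an S x S positive semidefinite kernel matrix K.
    """
    support_size = kernel_matrix.shape[0]
    diff = (empirical_frequencies(x, support_size)
            - empirical_frequencies(y, support_size))
    return diff @ kernel_matrix @ diff
```

Unlike the sample-based U-statistic, this estimator is a fixed function of the empirical frequency vectors, which is what makes the bias analysis via Bernstein polynomials applicable.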

