

Abstract

Kernelized Stein discrepancy (KSD), though extensively used in goodness-of-fit tests and model learning, suffers from the curse of dimensionality. We address this issue by proposing the sliced Stein discrepancy and its scalable and kernelized variants, which employ kernel-based test functions defined on optimal one-dimensional projections. When applied to goodness-of-fit tests, extensive experiments show that the proposed discrepancy significantly outperforms KSD and various baselines in high dimensions. For model learning, we demonstrate its advantages over existing Stein discrepancy baselines by training independent component analysis models with different discrepancies. We further propose a novel particle inference method called sliced Stein variational gradient descent (S-SVGD), which alleviates the mode-collapse issue of SVGD when training variational autoencoders.

1. INTRODUCTION

Discrepancy measures for quantifying differences between two probability distributions play key roles in statistics and machine learning. Among the many existing discrepancy measures, Stein discrepancy (SD) is unique in that it only requires samples from one distribution and the score function (i.e., the gradient of the log density, which can be computed without the normalization constant) from the other (Gorham & Mackey, 2015). SD, a special case of the integral probability metric (IPM) (Sriperumbudur et al., 2009), requires finding an optimal test function within a given function family. This optimum is analytic when a reproducing kernel Hilbert space (RKHS) is used as the test function family, and the corresponding SD is named the kernelized Stein discrepancy (KSD) (Liu et al., 2016; Chwialkowski et al., 2016). Variants of SD have been widely used in both goodness-of-fit (GOF) tests (Liu et al., 2016; Chwialkowski et al., 2016) and model learning (Liu & Feng, 2016; Grathwohl et al., 2020; Hu et al., 2018; Liu & Wang, 2016). Although theoretically elegant, KSD, especially with the RBF kernel, suffers from the curse of dimensionality, which leads to a significant deterioration of test power in GOF tests (Chwialkowski et al., 2016; Huggins & Mackey, 2018) and mode collapse in particle inference (Zhuo et al., 2017; Wang et al., 2018). A few attempts have been made to address this problem; however, they are either limited to specific applications with strong assumptions (Zhuo et al., 2017; Chen & Ghattas, 2020; Wang et al., 2018) or require significant approximations (Singhal et al., 2019). As an alternative, in this work we address the issue by adopting the idea of "slicing": we project the score function and test inputs onto multiple one-dimensional slicing directions, resulting in a variant of SD whose test functions only need to handle one-dimensional inputs. Specifically, our contributions are as follows.
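To make the baseline concrete, the standard KSD estimator referenced above can be sketched as follows. This is an illustrative implementation of the unbiased U-statistic estimator of KSD with an RBF kernel (the closed form from Liu et al., 2016); the function name `ksd_u_stat` and the fixed bandwidth `h` are our own illustrative choices (in practice the median heuristic is common).

```python
import numpy as np

def ksd_u_stat(x, score, h=1.0):
    """Unbiased U-statistic estimate of KSD^2 with an RBF kernel.

    x: (n, d) samples from q; score: function returning grad log p(x).
    Sketch of the standard estimator; h is a free bandwidth choice.
    """
    n, d = x.shape
    s = score(x)                             # (n, d) score of p at the samples
    r = x[:, None, :] - x[None, :, :]        # pairwise differences x_i - x_j
    sq = np.sum(r**2, axis=-1)               # squared pairwise distances
    k = np.exp(-sq / (2 * h**2))             # RBF kernel matrix
    ss = s @ s.T                             # <s(x_i), s(x_j)>
    sr = np.einsum('id,ijd->ij', s, r)       # <s(x_i), x_i - x_j>
    rs = np.einsum('jd,ijd->ij', s, r)       # <s(x_j), x_i - x_j>
    # Stein kernel: s(x)^T s(y) k + s(x)^T grad_y k + s(y)^T grad_x k + tr(grad_x grad_y k)
    u = k * (ss + sr / h**2 - rs / h**2 + d / h**2 - sq / h**4)
    np.fill_diagonal(u, 0.0)                 # drop i == j terms (U-statistic)
    return u.sum() / (n * (n - 1))
```

For a standard Gaussian target (score `-x`), samples drawn from that Gaussian yield an estimate near zero, while samples from a shifted Gaussian yield a clearly positive value; it is this signal that degrades as the dimension grows.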
• We propose a novel, theoretically validated family of discrepancies called the sliced Stein discrepancy (SSD), along with a scalable variant called the max sliced kernelized Stein discrepancy (maxSKSD), obtained using kernel tricks and optimal test directions.

• A GOF test is derived based on an unbiased estimator of maxSKSD with optimal test directions. MaxSKSD achieves superior performance on benchmark problems and restricted Boltzmann machine models (Liu et al., 2016; Huggins & Mackey, 2018).
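The slicing idea underlying these contributions can be sketched in code. The fragment below computes a one-dimensional KSD-style U-statistic along a single direction `g`, using the projected score g^T s_p(x) and test inputs g^T x; this uses the same direction for both projections, a simplification for illustration only (the full SSD/maxSKSD allow separate score-projection and input-projection directions and optimize or average over them), and `sliced_ksd` is a hypothetical name, not the paper's estimator.

```python
import numpy as np

def sliced_ksd(x, score, g, h=1.0):
    """Illustrative sliced Stein discrepancy term along one direction g.

    x: (n, d) samples; score(x): grad log p; g: (d,) slicing direction.
    The kernel only ever sees one-dimensional projected inputs.
    """
    g = g / np.linalg.norm(g)
    t = x @ g                               # 1-D projected test inputs
    sg = score(x) @ g                       # projected score g^T s_p(x)
    dt = t[:, None] - t[None, :]            # pairwise projected differences
    k = np.exp(-dt**2 / (2 * h**2))         # 1-D RBF kernel on projections
    # 1-D Stein kernel with the projected score in place of a 1-D score
    u = k * (sg[:, None] * sg[None, :]
             + (sg[:, None] - sg[None, :]) * dt / h**2
             + 1.0 / h**2 - dt**2 / h**4)
    np.fill_diagonal(u, 0.0)                # unbiased U-statistic: drop i == j
    n = len(t)
    return u.sum() / (n * (n - 1))
```

A direction aligned with the discrepancy between the two distributions (e.g., the mean-shift direction in a Gaussian example) produces a large value, which motivates searching for optimal test directions rather than fixing them a priori.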

