SCALABLE SUBSET SAMPLING WITH NEURAL CONDITIONAL POISSON NETWORKS

Abstract

A number of problems in learning can be formulated in terms of the basic primitive of sampling k elements out of a universe of n elements. This subset sampling operation cannot be included directly in differentiable models, so approximations are essential. Current approaches sample subsets via order sampling and depend on differentiable approximations of the Top-k operator for selecting the largest k elements from a set. We present a simple alternative method for sampling subsets based on conditional Poisson sampling. Unlike order sampling approaches, the complexity of the proposed method is independent of the subset size, which makes the method scalable to large subset sizes. We adapt the procedure to make it efficient and amenable to discrete gradient approximations for use in differentiable models. Furthermore, the method allows the subset size parameter k itself to be differentiable. We validate our approach extensively on image and text model explanation, image subsampling, and stochastic k-nearest neighbor tasks, outperforming existing methods in accuracy, efficiency, and scalability.

1. INTRODUCTION

The fundamental combinatorial operation of selecting subsets of elements from a given universe is increasingly being incorporated into differentiable neural models due to its broad applicability. Example applications include model explanation (Chen et al., 2018), sequence modeling (Kool et al., 2019), point cloud modeling (Yang et al., 2019), and nearest neighbor networks (Grover et al., 2018). Current neural network approaches for sampling subsets generally fall into the class of order sampling methods. In the order sampling scheme, each element in the universe is assigned an independent ranking random variable; to obtain a subset sample of size k, the k elements with the largest (or smallest) ranking values are chosen. The distribution of the ranking variables thereby induces a probability distribution over the possible subsets. However, choosing the largest k elements (Top-k) is a discrete operation and hence not differentiable, so the Top-k procedure cannot be used directly in gradient-based learning models. This has led to a number of proposals for relaxed, differentiable versions of the Top-k operator (Goyal et al., 2018; Pietruszka et al., 2021; Plötz & Roth, 2018). Building on Top-k approaches, several methods for sampling subsets as k-hot vectors have appeared in the literature (Paulus et al., 2020; Xie & Ermon, 2019). In this paper, we explore Poisson sampling (Tillé, 2006) and conditional Poisson sampling (Hájek & Dupač, 1981) as an alternative to order sampling for subsets. In Poisson sampling, each element of the set is independently drawn to be included in the subset or not. Since these independent trials cannot guarantee a fixed subset size, conditional Poisson sampling conditions the Poisson sampling procedure on returning subsets of exactly k elements. In practice, the conditioning amounts to repeating the Poisson sampling procedure until a subset of size k is obtained.
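The two sampling schemes discussed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the weights `w`, the inclusion probabilities `p`, and the use of Gumbel ranking variables for order sampling are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3
# Hypothetical element weights (not from the paper); higher weight
# makes an element more likely to be selected.
w = rng.uniform(0.1, 1.0, size=n)

def order_sample(w, k, rng):
    """Order sampling: assign each element an independent ranking
    variable (here a Gumbel-perturbed log-weight) and keep the k
    largest -- the non-differentiable Top-k step."""
    keys = np.log(w) + rng.gumbel(size=len(w))
    return np.sort(np.argsort(-keys)[:k])

def conditional_poisson_sample(p, k, rng):
    """Conditional Poisson sampling by rejection: repeat independent
    Bernoulli draws (Poisson sampling) until exactly k elements
    happen to be selected."""
    while True:
        mask = rng.random(len(p)) < p
        if mask.sum() == k:
            return np.flatnonzero(mask)

# Uniform inclusion probabilities, chosen for illustration only.
p = np.full(n, k / n)
s1 = order_sample(w, k, rng)
s2 = conditional_poisson_sample(p, k, rng)
```

Both calls return an index set of exactly k elements; note that each Poisson trial inside the rejection loop is a single vectorized Bernoulli draw over all n elements, with no inner loop over k.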
The general (conditional) Poisson sampling approach has a number of features that make it an attractive alternative to Top-k-based order sampling methods. First, elements are drawn independently in Poisson sampling, which makes the procedure very efficient for sampling subsets with large values of k. By contrast, current Top-k methods (Goyal et al., 2018; Plötz & Roth, 2018) often have an inner loop depending on k, which makes them expensive for sampling large

