UNIVERSAL MINI-BATCH CONSISTENCY FOR SET ENCODING FUNCTIONS

Abstract

Previous works have established solid foundations for neural set functions, complete with architectures which preserve the properties necessary for operating on sets, such as invariance to permutations of the set elements. Subsequent work has highlighted the utility of Mini-Batch Consistency (MBC): the ability to sequentially process any permutation of a set partition scheme (e.g. streaming chunks of data) while guaranteeing the same output as processing the whole set at once. Currently, there exists a division between MBC and non-MBC architectures. We propose a framework which converts an arbitrary non-MBC model into one which satisfies MBC, thereby allowing all set functions to be used universally in an MBC setting (UMBC). Additionally, we explore a set-based Monte Carlo dropout strategy which applies dropout to entire set elements. We validate UMBC with theoretical proofs and unit tests, and provide qualitative and quantitative experiments on Gaussian data, clean and corrupted point cloud classification, and amortized clustering on ImageNet. We also investigate the probabilistic calibration of set functions under test-time distributional shifts. Our results demonstrate the utility of UMBC, and we further find that our dropout strategy improves uncertainty calibration.

1. INTRODUCTION

Set encoding functions (Zaheer et al., 2017; Bruno et al., 2021; Lee et al., 2019; Kim, 2021) have become a broad topic of recent research. This popularity can be partly attributed to the natural set structure of data such as point clouds, or even of datasets themselves. Given a set of cardinality N, one may desire to group the elements (clustering), identify them (classification), or find likely elements to complete the set (completion/extension). A key difference from vanilla neural networks is that neural set functions must be able to handle a dynamic cardinality for each input set. Additionally, sets are considered unordered, so the function must make consistent predictions for any permutation of the set elements. Deep Sets (Zaheer et al., 2017) is a canonical work investigating these requirements and proposing valid neural set function architectures. Deep Sets combines traditional, permutation equivariant (Property 3.2) linear and convolutional layers with permutation invariant (Property 3.1) set-pooling functions (e.g. {min, max, sum, mean}) in order to satisfy the necessary conditions and perform inference on sets. The Set Transformer (Lee et al., 2019) uses powerful multi-headed self-attention (Vaswani et al., 2017) to construct multiple set-capable transformer blocks, as well as an attentive pooling function. Though powerful, these works never explicitly considered the case where a set must be processed in multiple partitions at test time, which can happen for a variety of reasons, including device resource constraints, prohibitively large or even infinite test set sizes, and streaming data conditions. The MBC property of set functions was identified by Bruno et al.
(2021), who also proposed the Slot Set Encoder (SSE), a specific cross-attentive pooling mechanism which satisfies MBC and therefore guarantees a consistent output for any piecewise processing of a set partition. The introduction of the MBC property naturally adds a new dimension to the taxonomy of set functions, namely those which satisfy MBC and those which do not. The SSE is one valid MBC architecture, but it comes at the cost of excluding powerful self-attentive models such as the Set Transformer. Self-attention can be the best choice for tasks which require leveraging
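As a concrete illustration of these properties, the following sketch (using NumPy; the function names are our own and not from the cited works) checks that sum pooling is both permutation invariant and mini-batch consistent, while a toy softmax-attention pooling, whose weights are normalized over the whole set, breaks MBC when applied chunk-wise:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))  # a set of N=8 elements with d=4 features

def sum_pool(x):
    # Permutation invariant set pooling: order of rows does not matter.
    return x.sum(axis=0)

# Permutation invariance: shuffling the elements leaves the output unchanged.
perm = rng.permutation(len(X))
assert np.allclose(sum_pool(X), sum_pool(X[perm]))

# Mini-batch consistency (MBC): pooling each chunk of an arbitrary partition
# and aggregating the partial results equals pooling the full set at once.
chunks = np.split(X, [3, 6])  # partition into chunks of sizes 3, 3, 2
assert np.allclose(sum(sum_pool(c) for c in chunks), sum_pool(X))

def attn_pool(x, q):
    # Toy single-query softmax-attention pooling (NOT the Set Transformer):
    # the softmax normalizes over all elements of the set it sees.
    w = np.exp(x @ q)
    return (w / w.sum()) @ x

# Chunk-wise attention pooling disagrees with full-set pooling, because each
# chunk's softmax only normalizes over that chunk: attention is not MBC.
q = rng.normal(size=(4,))
full = attn_pool(X, q)
chunked = np.mean([attn_pool(c, q) for c in chunks], axis=0)
assert not np.allclose(full, chunked)
```

The aggregation rule for the chunked attention output (here a simple mean) is one arbitrary choice; the point is that no fixed aggregation of the per-chunk outputs can recover the full-set result, since the softmax denominators differ per chunk.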

