UNIVERSAL MINI-BATCH CONSISTENCY FOR SET ENCODING FUNCTIONS

Abstract

Previous works have established solid foundations for neural set functions, complete with architectures that preserve the properties necessary for operating on sets, such as invariance to permutations of the set elements. Subsequent work highlighted the utility of Mini-Batch Consistency (MBC): the ability to sequentially process any permutation of any partition of a set (e.g. streaming chunks of data) while guaranteeing the same output as processing the whole set at once. Currently, there exists a division between MBC and non-MBC architectures. We propose a framework which converts an arbitrary non-MBC model into one which satisfies MBC. In doing so, we allow all set functions to be universally considered in an MBC setting (UMBC). Additionally, we explore a set-based Monte Carlo dropout strategy which applies dropout to entire set elements. We validate UMBC with theoretical proofs and unit tests, and provide qualitative and quantitative experiments on Gaussian data, clean and corrupted point cloud classification, and amortized clustering on ImageNet. We also investigate the probabilistic calibration of set functions under test-time distributional shifts. Our results demonstrate the utility of UMBC, and we further find that our dropout strategy improves uncertainty calibration.

1. INTRODUCTION

Set encoding functions (Zaheer et al., 2017; Bruno et al., 2021; Lee et al., 2019; Kim, 2021) have become a broad research topic in recent publications. This popularity can be partly attributed to natural set structures in data such as point clouds, or even datasets themselves. Given a set of cardinality N, one may desire to group the elements (clustering), identify them (classification), or find likely elements to complete the set (completion/extension). A key difference from vanilla neural networks is that neural set functions must handle a dynamic cardinality for each input set. Additionally, sets are considered unordered, so the function must make consistent predictions for any permutation of the set elements. Deep Sets (Zaheer et al., 2017) is a canonical work investigating these requirements and proposing valid neural set function architectures. Deep Sets utilizes traditional, permutation-equivariant (Property 3.2) linear and convolutional neural network layers in conjunction with permutation-invariant (Property 3.1) set-pooling functions (e.g. {min, max, sum, mean}) in order to satisfy the necessary conditions and perform inference on sets. The Set Transformer (Lee et al., 2019) utilizes powerful multi-headed self-attention (Vaswani et al., 2017) to construct multiple set-capable transformer blocks, as well as an attentive pooling function. Though powerful, these works never explicitly considered the case where a set must be processed in multiple partitions at test time, which can arise for a variety of reasons, including device resource constraints, prohibitively large or even infinite test set sizes, and streaming data conditions. The MBC property of set functions was identified by Bruno et al.
(2021), who also proposed the Slot Set Encoder (SSE), a specific version of a cross-attentive pooling mechanism which satisfies MBC, guaranteeing a consistent output for all possible piecewise processing of set partitions. The introduction of the MBC property naturally leads to a new dimension in the taxonomy of set functions, namely those which satisfy MBC and those which do not. The SSE is an example of a valid MBC architecture, but it comes at the cost of eliminating powerful self-attentive models such as the Set Transformer. Self-attention can be the best choice for tasks which require leveraging pairwise relationships between set elements, such as clustering (as we show later in Figure 4 and Table 2, where the Set Transformer outperforms SSE). Models such as the Set Transformer cannot make MBC guarantees when updating pooled set representations, as self-attention blocks require all N elements in a single pass, and therefore do not satisfy MBC (i.e. processing separate pieces of a set yields a different output than processing the whole set at once). Naively using such non-MBC set functions in an MBC setting can cause a severe degradation in performance, as depicted in Figures 2a and 2b, where the Set Transformer exhibits poor likelihood and inconsistent clustering predictions. With the addition of a UMBC module, UMBC+Set Transformer inherits an MBC guarantee, yielding consistent results and much higher likelihoods (Figures 2c and 2d; see Section 5 and Appendix B for details of the experiment). The quantitative effect of MBC vs. non-MBC encoding on a pooled set representation can be seen in Figure 1, which shows the variance between pooled representations of 100 random partitions of the same set (see Appendix C for details).
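To make these properties concrete, the following minimal sketch (an illustrative NumPy example with hypothetical weights `W_phi` and `W_rho`, not any of the cited architectures) builds a tiny Deep-Sets-style encoder whose sum pooling is both permutation invariant and MBC, and contrasts it with a toy softmax-attention pooling that breaks MBC:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights: phi acts on each element independently
# (permutation equivariant), sum pooling is permutation invariant,
# and rho decodes the pooled vector.
W_phi = rng.normal(size=(3, 8))
W_rho = rng.normal(size=(8, 4))

def phi(X):
    return np.maximum(X @ W_phi, 0.0)   # element-wise encoder

def deep_sets(X):
    return phi(X).sum(axis=0) @ W_rho   # pool, then decode

X = rng.normal(size=(6, 3))

# Permutation invariance: shuffling set elements changes nothing.
assert np.allclose(deep_sets(X), deep_sets(X[rng.permutation(6)]))

# MBC: sum pooling decomposes over any partition of the set, so partial
# results can be accumulated chunk by chunk (e.g. over a data stream).
pooled_chunks = phi(X[:2]).sum(axis=0) + phi(X[2:]).sum(axis=0)
assert np.allclose(deep_sets(X), pooled_chunks @ W_rho)

# A toy softmax self-attention pooling is NOT MBC: the softmax
# normalizer couples all N elements, so processing chunks separately
# does not reproduce the full-set output.
def attn_pool(S):
    scores = S @ S.T
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return (A @ S).mean(axis=0)

chunked_attn = 0.5 * (attn_pool(X[:3]) + attn_pool(X[3:]))
assert not np.allclose(attn_pool(X), chunked_attn)
```

The contrast between the two pooling functions is exactly the MBC/non-MBC split described above: sum pooling commutes with set partitioning, while the softmax normalizer does not.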
In this work, we propose and verify, both theoretically and empirically, that there exists a universal method for converting arbitrary non-MBC set functions into MBC functions, providing MBC guarantees for mini-batch processing of random set partitions. This allows any set encoder to be used in an MBC setting where it may previously have failed (e.g. streaming data). This result has large implications for all current and future set functions which are not natively MBC, as they can now be used in a wider variety of settings and under more restrictive conditions. Animations, code, and tests can be found in the supplementary file and also at: https://github.com/anonymous-subm1t/umbc

Our contributions in this work are as follows:

• In Theorem 4.1 we show that any arbitrary non-MBC set encoder can become MBC, guaranteeing that mini-batch processing of sets of any cardinality at test time will give the same result as processing the full set at once.
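The high-level recipe behind such a conversion can be sketched as follows. This is an illustrative NumPy toy (all names, weights, and the particular slot-weighting scheme are hypothetical, not the paper's module): an MBC aggregator maps a set of any size to a fixed number of slot vectors using decomposable per-slot sums, and an arbitrary non-MBC encoder applied to those slots then inherits the MBC guarantee, since its input no longer depends on how the original set was partitioned.

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 4, 5                         # number of slots, feature dimension
W_slot = rng.normal(size=(D, K))    # hypothetical element-to-slot weights

def aggregate(X):
    """Decomposable per-slot weighted sums: partial results add up,
    because weights are normalized per element set only at the end."""
    W = np.exp(X @ W_slot)          # no softmax across elements,
    num = W.T @ X                   # so numerator/denominator sums
    den = W.sum(axis=0)[:, None]    # decompose over any partition
    return num, den

def non_mbc_encoder(S):
    """Stand-in for an arbitrary non-MBC encoder, e.g. self-attention
    acting on the K slots rather than the N raw elements."""
    scores = S @ S.T
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ S

def umbc_like(chunks):
    num = np.zeros((K, D))
    den = np.zeros((K, 1))
    for C in chunks:                # stream chunks; keep O(K*D) state
        n, d = aggregate(C)
        num += n
        den += d
    return non_mbc_encoder(num / den)

X = rng.normal(size=(8, D))
out_full = umbc_like([X])
out_stream = umbc_like([X[:3], X[3:5], X[5:]])
assert np.allclose(out_full, out_stream)   # MBC holds by construction
```

Because the streamed state is just the running numerator and denominator, the memory cost is independent of the set cardinality N, which is what makes the streaming settings above feasible.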



Figure 2: (•, +, ×, □) correspond to classes in the input set. Ellipses are the model's clustering predictions. In streaming settings, models must process the stream without storing streamed inputs. a-b: The Set Transformer delivers poor likelihood on different set streams. c-d: The Set Transformer with a UMBC module becomes an MBC function, yielding better likelihood and consistent predictions regardless of the data stream. For a description of streaming settings, see Section 5; additional streams are shown in Figure 8.

Figure 1: σ² between encoded features of 100 random mini-batched set partitions. The Set Transformer (not MBC) produces variance in the output. UMBC+Set Transformer produces consistent output for all 100 random mini-batched partitions.

