CONDITIONAL PERMUTATION INVARIANT FLOWS

Abstract

We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including (1) complex traffic scene generation conditioned on visually specified map information, and (2) object bounding box generation conditioned directly on images. We train our model by maximizing the expected likelihood of labeled conditional data under our flow, with the aid of a penalty that ensures the dynamics are smooth and hence efficiently solvable. Our method significantly outperforms non-permutation invariant baselines in terms of log likelihood and domain-specific metrics (offroad, collision, and combined infractions), yielding realistic samples that are difficult to distinguish from real data.

1. INTRODUCTION

Invariances built into neural network architectures can exploit symmetries to create more data efficient models. While these principles have long been known in discriminative modelling (LeCun et al., 1998; Cohen & Welling, 2015; 2016; Finzi et al., 2021), permutation invariance in particular has only recently become a topic of interest in generative models (Greff et al., 2019; Locatello et al., 2020). When learning a density that should be invariant to permutations, we can either incorporate permutation invariance into the architecture of our deep generative model, or we can factorially augment our observations and hope that the generative model architecture is sufficiently flexible to at least approximately learn a distribution that assigns the same mass to known equivalents. The former is vastly more data efficient but places restrictions on the kinds of architectures that can be utilized, which might lead one to worry about performance limitations. The latter allows unrestricted architectures, but it is often so data-inefficient that, despite the advantage of fewer limitations, achieving good performance is extremely challenging, to the point of being impossible. In this work we describe a new approach to permutation invariant conditional density estimation that, while architecturally restricted to achieve invariance, is demonstrably flexible enough to achieve high performance on a number of non-trivial density estimation tasks.

Permutation invariant distributions, where the likelihood of a collection of objects does not change if they are re-ordered, appear widely. The joint distribution of independent and identically distributed observations is permutation invariant, while in more complex examples the observations are no longer independent, but still exchangeable.
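To make the notion concrete, the property can be written out as follows. The symbols here (a set of N elements x_1, ..., x_N, a conditioning input c, a permutation σ, and a per-element density q) are notation assumed for this illustration and are not taken from the text above.

```latex
% Permutation invariance of a (conditional) joint density:
% for every permutation \sigma of \{1, \dots, N\},
p(x_{\sigma(1)}, \dots, x_{\sigma(N)} \mid c) = p(x_1, \dots, x_N \mid c).

% The i.i.d. case satisfies this trivially, because a product
% is unchanged when its factors are reordered:
p(x_1, \dots, x_N \mid c) = \prod_{i=1}^{N} q(x_i \mid c)
                          = \prod_{i=1}^{N} q(x_{\sigma(i)} \mid c).
```

Exchangeable but dependent sets satisfy the first identity without admitting the factorization in the second.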
Practical examples include the distribution of non-overlapping physical object locations in a scene, the set of potentially overlapping object bounding boxes given an image, and so forth (see Fig. 1). In all of these we know that the probability assigned to a set of such objects (i.e. locations, bounding boxes) should be invariant to the order of the objects in the joint distribution function argument list. Recent work has addressed this problem by introducing equivariant normalizing flows (Köhler et al., 2020; Satorras et al., 2021; Biloš & Günnemann, 2021). Our work builds on theirs but differs in subtle but key ways that increase the flexibility of our models. More substantially, this body of prior art focuses on non-conditional density estimation. The work of Satorras et al. (2021) does consider a form of implicit conditioning, where the flow is evaluated for different graph sizes. In this work we go beyond that by making the dynamics that constitute our flow dependent on a conditional input. We believe we are the first to develop conditional permutation invariant flows that are explicitly dependent on external input.
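As a rough illustration of what conditional, permutation equivariant dynamics can look like, the NumPy sketch below builds a toy velocity field from a per-element term plus a sum of pairwise interactions, and checks numerically that permuting the set permutes the output in the same way. This is a minimal sketch under stated assumptions, not the model described in this paper: the function name `dynamics`, the single-layer tanh maps, the weight matrices `w_single` and `w_pair`, and the conditioning vector `c` are all hypothetical stand-ins for the deep networks and conditional inputs.

```python
import numpy as np


def dynamics(x, c, w_single, w_pair):
    """Toy velocity field for a set x of shape (n, d), conditioned on c of shape (k,).

    Hypothetical form: a per-element term plus a sum of pairwise interactions.
    """
    n = x.shape[0]
    # Per-element term: depends only on x_i and the shared conditioning input c.
    cond = np.broadcast_to(c, (n, c.shape[0]))
    single = np.tanh(np.concatenate([x, cond], axis=1) @ w_single)  # (n, d)
    # Pairwise term: g(x_i, x_j) = tanh((x_i - x_j) W), summed over j.
    # The j == i entry is tanh(0) = 0, so no explicit mask is needed.
    diff = x[:, None, :] - x[None, :, :]  # (n, n, d)
    pair = np.tanh(diff @ w_pair)  # (n, n, d)
    return single + pair.sum(axis=1)  # (n, d)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))         # a set of 5 two-dimensional elements
c = rng.normal(size=3)              # stand-in for an encoded conditioning input
w_single = rng.normal(size=(5, 2))  # maps [x_i, c] (2 + 3 dims) to 2 dims
w_pair = rng.normal(size=(2, 2))
perm = rng.permutation(5)

# Equivariance: reordering the inputs reorders the outputs identically.
print(np.allclose(dynamics(x[perm], c, w_single, w_pair),
                  dynamics(x, c, w_single, w_pair)[perm]))
```

Integrating equivariant dynamics of this kind from a permutation invariant base distribution yields a permutation invariant density, which is the construction used by the equivariant flows cited above.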

