EQUIVARIANT NORMALIZING FLOWS FOR POINT PROCESSES AND SETS

Anonymous authors
Paper under double-blind review

Abstract

A point process describes how random sets of exchangeable points are generated. The points usually influence each other's positions via attractive and repulsive forces. To model this behavior, it suffices to transform samples from the uniform process with a sufficiently complex equivariant function. However, learning the parameters of the resulting process is challenging since the likelihood is hard to estimate and often intractable. This leads us to our proposed model, CONFET. Based on continuous normalizing flows, it allows arbitrary interactions between points while keeping the likelihood tractable. Experiments on various real and synthetic datasets show the improved performance of our new scalable approach.

1. INTRODUCTION

Many domains contain unordered data with a variable number of elements. The lack of ordering, also known as exchangeability, can be found in locations of cellular stations, locations of trees in a forest, point clouds, items in a shopping cart, etc. This kind of data is represented with sets that are randomly generated from some underlying process that we wish to uncover. We choose to model this with spatial point processes, generative models whose realizations are sets of points. Perhaps the simplest non-trivial model is an inhomogeneous Poisson process. The locations of the points are assumed to be generated i.i.d. from some density (Chiu et al., 2013). By simply modeling this density we can evaluate the likelihood and draw samples. We can do this easily with normalizing flows (Germain et al., 2015). The process is then defined by a transformation of samples from a simple distribution into samples from the target distribution. However, the i.i.d. assumption is often violated because the presence of one object influences the distribution of the others. For example, short trees grow near each other, but are inhibited by taller trees (Ogata & Tanemura, 1985). As an example, we can generate points on the interval (0, 1) in the following way: we first flip a coin to decide whether to sample inside or outside of the interval (1/4, 3/4); then we sample two points x_1, x_2 uniformly on the chosen subset. Although the marginals p(x_1) and p(x_2) are uniformly distributed, knowing the position of one point gives us information about the other. Therefore, we should not model this process as if the points were independent, but model the joint distribution p(x_1, x_2), with the constraint that p is symmetric to permutations (Figure 1). Unfortunately, once we include interactions between the points, the problem becomes significantly harder because the likelihood is intractable (Besag, 1975).
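The coin-flip construction above can be simulated directly; the sketch below (an illustration of the dependence argument, not code from the paper) checks that each marginal looks uniform while the two points remain perfectly dependent.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pair(rng):
    """Sample (x1, x2) from the toy process: flip a coin, then place
    both points uniformly inside or outside of (1/4, 3/4)."""
    if rng.random() < 0.5:
        # Both points inside (1/4, 3/4).
        return rng.uniform(0.25, 0.75, size=2)
    # Both points outside: uniform on (0, 1/4) ∪ (3/4, 1).
    u = rng.uniform(0.0, 0.5, size=2)
    return np.where(u < 0.25, u, u + 0.5)

pairs = np.array([sample_pair(rng) for _ in range(100_000)])

# Marginals look uniform on (0, 1): the mean is close to 1/2 ...
print(pairs[:, 0].mean())
# ... yet the points are dependent: either both fall in (1/4, 3/4) or neither.
inside = (pairs > 0.25) & (pairs < 0.75)
print((inside[:, 0] == inside[:, 1]).mean())
```

Since the joint density is symmetric under swapping x_1 and x_2, this is exactly the kind of exchangeable-but-dependent structure the model has to capture.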
In this work, we present a way to solve this issue by using continuous normalizing flows that allow unrestricted transformations of points, including interactions between them. This way, the very hard problem of likelihood estimation becomes tractable. Our approach transforms simple processes into complex ones by transforming their samples with expressive functions (Figure 1). Our main contributions are the following:

• We reinterpret and unify existing techniques for modeling point processes and exchangeable data with normalizing flows.

• We propose a new generative model (CONFET) that allows interactions between points and can be trained with maximum likelihood. Extensive experiments show that it outperforms other approaches while remaining efficient and scalable.

2. POINT PROCESSES AND RANDOM SETS

Realizations of a finite point process on a bounded region B ⊂ R^d are finite sets of points X = {x_1, . . . , x_n}, x_i ∈ B. A point process is simple if no two points fall at exactly the same place. In practice, point processes are usually both finite and simple. One way to construct a general point process is by defining a discrete distribution p(n) for the number of points and a symmetric probability density p(X) on B^n for their locations (Daley & Vere-Jones, 2007). The symmetry requirement comes from the fact that the probability of a sequence (x_π(1), . . . , x_π(n)) is the same for any permutation π of the elements: the points are exchangeable. Not knowing the order of the points can be handled trivially by averaging any density over all n! permutations. Another approach is to impose a canonical order, e.g., sorting the points by one of the dimensions. The probability of observing a set of exchangeable elements is then defined w.r.t. the joint probability of the order statistics x_(1) < · · · < x_(n) (Casella & Berger, 2002): p(X) = (1/n!) p(x_(1), . . . , x_(n)). A traditional way to define a point process is with an intensity function that assigns a non-negative value to every subset of B, corresponding to the number of points we expect to see there (Møller & Waagepetersen, 2003). An example is a homogeneous Poisson process with constant intensity λ. To generate a new realization, we first sample n ∼ Pois(λ), and then sample n points uniformly on B. If we define the intensity as a function of position λ(x), we get an inhomogeneous Poisson process, which is equivalent to defining an un-normalized probability density function on B. Now n follows Pois(Λ), where Λ = ∫_B λ(x) dx is the total intensity. We get the density at a location x by normalizing the intensity: p(x) = λ(x)/Λ.
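This two-step view (draw n from Pois(Λ), then draw n locations from p(x) = λ(x)/Λ) can be sketched concretely. The intensity λ(x) = 200x on B = [0, 1] below is a made-up example chosen so that both Λ and the normalized density have closed forms, not a choice from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical intensity on B = [0, 1]: lambda(x) = 200 x.
# Total intensity: Lambda = ∫_0^1 200 x dx = 100, so p(x) = 2 x.
Lambda = 100.0

def sample_realization(rng):
    # Step 1: number of points n ~ Pois(Lambda).
    n = rng.poisson(Lambda)
    # Step 2: n locations from p(x) = 2x via inverse-CDF sampling:
    # F(x) = x^2, so x = sqrt(u) with u ~ Uniform(0, 1).
    return np.sqrt(rng.random(n))

X = sample_realization(rng)
print(len(X))  # fluctuates around Lambda = 100
```

The same recipe works for any density we can sample from, which is what motivates replacing the intensity λ(x) with a flow-based model of p(x).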
Combining the distribution of the number of points with the distribution of their locations gives a well-known formula for the likelihood of an inhomogeneous Poisson process (Daley & Vere-Jones, 2007, eq. 7.1.2): L(X) = [ ∏_{x_i ∈ X} λ(x_i) ] exp( -∫_B λ(x) dx ). Instead of modeling λ(x), we can model p(x) directly to avoid estimating the integral, without losing generality (Yuan et al., 2020). This shift in perspective from intensity to probability density function allows us to utilize rich existing methods from density estimation. An extension of the inhomogeneous process that allows interactions between points defines the conditional intensity λ(x_i|X) (Papangelou, 1974). This may be interpreted as the conditional probability of having a point at x_i given that the rest of the process coincides with X. The likelihood is no longer tractable, so previous works used the pseudolikelihood instead, replacing λ(x) with λ(x|X) in the likelihood above (Besag, 1975; Baddeley & Turner, 2000). One example of such a process is a clustering process (Neyman & Scott, 1958) that generates the points in two steps: we first sample the cluster centers from a Poisson process, then sample the final points from normal distributions centered around the cluster positions. In contrast, a repulsion process (Matérn, 2013) generates initial points from a uniform process and removes those that have neighbors inside a radius R.
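The two interacting processes above can be sketched as follows. All parameter values (parent rate, offspring spread σ, radius R) are illustrative choices for this sketch, not taken from the cited works, and the repulsion variant shown is the simple "delete every point with a close neighbor" (Matérn type-I style) rule described in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def neyman_scott(rng, parent_rate=5.0, mean_offspring=20.0, sigma=0.03):
    """Two-step clustering process on [0, 1]^2."""
    # Step 1: cluster centers from a homogeneous Poisson process.
    centers = rng.random((rng.poisson(parent_rate), 2))
    # Step 2: normally distributed offspring around each center.
    points = [c + sigma * rng.standard_normal((rng.poisson(mean_offspring), 2))
              for c in centers]
    return np.concatenate(points) if points else np.empty((0, 2))

def matern_repulsion(rng, rate=200.0, R=0.05):
    """Repulsion by thinning: delete every point with a neighbor within R."""
    pts = rng.random((rng.poisson(rate), 2))
    if len(pts) < 2:
        return pts
    d = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    return pts[d.min(axis=1) >= R]
```

In both cases sampling is easy, but writing down a tractable density for the resulting set is not, which is exactly the gap the flow-based construction addresses.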



Figure 1: (Left) A symmetric density and its samples, see details in text. (Right) Illustration of our approach, going from the uniform to the target process with a continuous normalizing flow.

