CATEGORICAL NORMALIZING FLOWS VIA CONTINUOUS TRANSFORMATIONS

Abstract

Despite their popularity, to date, the application of normalizing flows to categorical data remains limited. The current practice of using dequantization to map discrete data to a continuous space is inapplicable, as categorical data has no intrinsic order. Instead, categorical data have complex and latent relations that must be inferred, like the synonymy between words. In this paper, we investigate Categorical Normalizing Flows, that is, normalizing flows for categorical data. By casting the encoding of categorical data in continuous space as a variational inference problem, we jointly optimize the continuous representation and the model likelihood. Using a factorized decoder, we introduce an inductive bias that places all interactions inside the normalizing flow. As a consequence, we not only simplify the optimization compared to having a joint decoder, but also make it possible to scale up to a large number of categories, which is currently impossible with discrete normalizing flows. Based on Categorical Normalizing Flows, we propose GraphCNF, a permutation-invariant generative model on graphs. GraphCNF implements a three-step approach, modeling the nodes, edges, and adjacency matrix stepwise to increase efficiency. On molecule generation, GraphCNF outperforms both one-shot and autoregressive flow-based state-of-the-art baselines.

1. INTRODUCTION

Normalizing flows have been popular for tasks with continuous data, like image modeling (Dinh et al., 2017; Kingma and Dhariwal, 2018; Ho et al., 2019) and speech generation (Kim et al., 2019; Prenger et al., 2019), as they provide efficient parallel sampling and exact density evaluation. Normalizing flows rely on the rule of change of variables, a continuous transformation designed for continuous data. However, many data types are typically encoded as discrete, categorical variables, like language and graphs, where normalizing flows are not straightforward to apply. To address this, it has recently been proposed to discretize the transformations inside normalizing flows so that they act directly on discrete data. Unfortunately, these discrete transformations have been shown to be limited in terms of vocabulary size and layer depth due to gradient approximations (Hoogeboom et al., 2019; Tran et al., 2019).

For the specific case of discrete but ordinal data, like images where integers represent quantized values, a popular strategy is to add a small amount of noise to each value (Dinh et al., 2017; Ho et al., 2019). It is unnatural, however, to apply such dequantization techniques to the general case of categorical data, where values represent categories with no intrinsic order. Treating these categories as integers for dequantization biases the data toward a non-existent order and makes the modeling task significantly harder. Besides, relations between categories are often multi-dimensional, for example word meanings, which cannot be represented with dequantization.

In this paper, we investigate normalizing flows for the general case of categorical data. To account for discontinuity, we propose continuous encodings in which different categories correspond to unique, non-overlapping and thus close-to-deterministic volumes in a continuous latent space.
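To make the contrast concrete, the standard dequantization trick for ordinal data can be sketched as below (a minimal illustration, not code from this work; the function name and the choice of 256 levels are assumptions). Each integer x owns the disjoint interval [x/L, (x+1)/L), so adjacent integers map to adjacent intervals; applied to unordered categories, this same construction would impose exactly the spurious neighborhood structure criticized above.

```python
import numpy as np

def uniform_dequantize(x, num_levels=256, rng=None):
    """Uniform dequantization for ordinal integers x in {0, ..., num_levels-1}:
    add noise u ~ U[0, 1) and rescale, v = (x + u) / num_levels.
    Each integer then occupies the disjoint interval [x/L, (x+1)/L)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, size=np.shape(x))
    return (np.asarray(x, dtype=np.float64) + u) / num_levels

x = np.array([0, 3, 255])
v = uniform_dequantize(x)
# The mapping is invertible by rounding: flooring recovers the integers.
assert np.array_equal(np.floor(v * 256).astype(int), x)
```

Note that invertibility-by-rounding relies on the integer order being meaningful; for categories, there is no such order to round back to, which is what motivates learning the volumes instead.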
Instead of pre-specifying the non-overlapping volumes per category, we resort to variational inference to jointly learn those volumes and model the likelihood with a normalizing flow at the same time. This work is not the first to propose combining variational inference with normalizing flows, which have mostly been considered for improving the flexibility of the approximate posterior (Kingma et al., 2016; Rezende and Mohamed, 2015; Van Den Berg et al., 2018). Different from previous works, we use variational inference to learn a continuous representation of the categorical data itself.
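The joint objective implied here can be sketched as a standard evidence lower bound (the notation is ours, not taken verbatim from this work): with an encoder q(z|x), a factorized decoder, and a normalizing flow providing the prior density p(z),

```latex
\log p(x) \;\geq\; \mathbb{E}_{z \sim q(z \mid x)}\!\left[\,
    \sum_{i=1}^{N} \log p(x_i \mid z_i)
    \;+\; \log p(z)
    \;-\; \log q(z \mid x)
\right]
```

Maximizing this bound jointly shapes the per-category volumes (through q and the factorized decoder terms p(x_i | z_i)) and fits the density model (through the flow term log p(z)), matching the joint optimization of representation and likelihood described above.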

