Neurosymbolic Deep Generative Models for Sequence Data with Relational Constraints

Abstract

Recently, there has been significant progress in designing deep generative models that generate realistic sequence data such as text or music. Nevertheless, it remains difficult to incorporate high-level structure to guide the generative process. We propose a novel approach for incorporating structure in the form of relational constraints between different subcomponents of an example (e.g., lines of a poem or measures of music). Our generative model has two parts: (i) one model to generate a realistic set of relational constraints, and (ii) a second model to generate realistic data satisfying these constraints. To train model (i), we propose a novel program synthesis algorithm that infers the relational constraints present in the training data, and then train the models based on the resulting relational constraints. In our experiments, we show that our approach significantly improves over state-of-the-art approaches in terms of capturing high-level structure in the data, while performing comparably or better in terms of low-level structure.

1. Introduction

Over the past few years, there has been tremendous progress in designing deep generative models for sequence data such as natural language (Vaswani et al., 2017) or music (Huang et al., 2019). These approaches leverage the vast quantities of available data in conjunction with unsupervised and self-supervised learning to learn probabilistic models of the data; new examples can then be generated by sampling from these models, possibly conditioning on initial elements of the sequence. Despite this progress, a key challenge facing deep generative models is the difficulty of incorporating high-level structure into the generated examples, e.g., rhyming and meter across lines of a poem, or repetition across measures of a piece of music. The ability to capture high-level structure is important for improving the quality of the generated data, especially in low-data regimes where only a small number of examples is available; intuitively, knowledge of the structure compresses the amount of information that the generative model has to learn. Furthermore, explicit representations of structure (i.e., symbolic representations, rather than structure encoded implicitly in a vector embedding) have the added benefit that users can modify the structure to guide the generative process.

Recently, Young et al. (2019) proposed a technique called neurosymbolic generative models for incorporating high-level structure into image generation, focusing on simple 2D repeating patterns in images of building facades (e.g., repeating windows). The basic idea is to leverage program synthesis to extract structure from data: given an example image x, they devise an algorithm A that extracts a program c = A(x) representing the set of 2D repeating patterns present in x. Then, using the pairs (x, c), they train two generative models: (i) a model p_φ(c) that generates a program, and (ii) a model p_θ(x | c) that generates an image containing the structure represented by c. However, their approach is heavily tailored to the image domain in several ways. First, their representation of structure is geared towards relatively simple patterns occurring in images of building facades. In addition, their algorithm A is specifically designed to extract this kind of program from an input image, as are their models p_φ(c) for generating programs and p_θ(x | c) for generating images conditioned on the program.
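To make the two-stage pipeline concrete, the following is a minimal toy sketch (not the actual method of Young et al. or of this paper). It assumes a deliberately simple setting where a "program" c is a set of equality constraints between positions of a sequence, the synthesis algorithm A checks pairwise equality, and the two models p_φ and p_θ are replaced by trivial stand-ins (sampling c from the empirical set of synthesized programs, and filling in a sequence that satisfies c). In the real approach, both would be learned neural models.

```python
import random

def synthesize_constraints(x):
    """A(x): infer equality constraints (i, j) present in example x."""
    return {(i, j) for i in range(len(x)) for j in range(i + 1, len(x))
            if x[i] == x[j]}

def sample_program(programs, rng):
    """Stand-in for p_phi(c): sample a constraint set from the empirical set."""
    return rng.choice(programs)

def sample_data(c, length, vocab, rng):
    """Stand-in for p_theta(x | c): generate a sequence satisfying c."""
    x = [None] * length
    for i in range(length):
        if x[i] is None:
            x[i] = rng.choice(vocab)
        # Propagate the chosen value to positions constrained to equal x[i].
        for (a, b) in c:
            if a == i and x[b] is None:
                x[b] = x[i]
    return x

rng = random.Random(0)
# Toy "training data": sequences with repeated subcomponents.
training = [["A", "B", "A", "B"], ["C", "C", "D", "D"]]
programs = [synthesize_constraints(x) for x in training]  # pairs (x, c)
c = sample_program(programs, rng)                         # c ~ p_phi
x = sample_data(c, 4, ["A", "B", "C", "D"], rng)          # x ~ p_theta(. | c)
assert all(x[i] == x[j] for (i, j) in c)  # sample respects its constraints
```

The key design point this illustrates is the factorization p(x) = Σ_c p_θ(x | c) p_φ(c): structure is sampled first as an explicit symbolic object, then the data model only has to generate content consistent with that structure.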

