Neurosymbolic Deep Generative Models for Sequence Data with Relational Constraints

Abstract

Recently, there has been significant progress in designing deep generative models that generate realistic sequence data such as text or music. Nevertheless, it remains difficult to incorporate high-level structure to guide the generative process. We propose a novel approach for incorporating structure in the form of relational constraints between different subcomponents of an example (e.g., lines of a poem or measures of music). Our generative model has two parts: (i) one model to generate a realistic set of relational constraints, and (ii) a second model to generate realistic data satisfying these constraints. To train model (i), we propose a novel program synthesis algorithm that infers the relational constraints present in the training data; we then train both models based on the resulting constraints. In our experiments, we show that our approach significantly improves over state-of-the-art approaches in terms of capturing high-level structure in the data, while performing comparably or better in terms of low-level structure.

1. Introduction

Over the past few years, there has been tremendous progress in designing deep generative models for generating sequence data such as natural language (Vaswani et al., 2017) or music (Huang et al., 2019). These approaches leverage the vast quantities of available data in conjunction with unsupervised and self-supervised learning to learn probabilistic models of the data; new examples can then be generated by sampling from these models, possibly conditioned on initial elements of the sequence. Despite this progress, a key challenge facing deep generative models is the difficulty of incorporating high-level structure into the generated examples, e.g., rhyming and meter across lines of a poem, or repetition across measures of a piece of music. The ability to capture high-level structure is important for improving the quality of the generated data, especially in low-data regimes where only small numbers of examples are available; intuitively, knowledge of the structure compresses the amount of information that the generative model has to learn. Furthermore, explicit representations of structure (i.e., symbolic rather than implicit in a vector embedding) have the added benefit that users can modify the structure to guide the generative process. Recently, Young et al. (2019) proposed a technique called neurosymbolic generative models for incorporating high-level structure into image generation, focusing on simple 2D repeating patterns in images of building facades (e.g., repeating windows). The basic idea is to leverage program synthesis to extract structure from data: given an example image x, they devise an algorithm A that extracts a program c = A(x) representing the set of 2D repeating patterns present in x.
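At a high level, this neurosymbolic pipeline can be sketched as follows (a minimal sketch in Python; the function names, signatures, and stubs are our own illustration, not code from Young et al.):

```python
# High-level sketch of the neurosymbolic pipeline (names are our own
# illustration, not the paper's code). `synthesize` plays the role of
# the algorithm A extracting a program c = A(x); p_phi and p_theta are
# the two learned models.

def train(examples, synthesize, fit_prior, fit_conditional):
    """Extract a program c = A(x) per example, then fit the two models
    on the resulting (x, c) pairs."""
    pairs = [(x, synthesize(x)) for x in examples]
    p_phi = fit_prior([c for _, c in pairs])   # model (i): p_phi(c)
    p_theta = fit_conditional(pairs)           # model (ii): p_theta(x | c)
    return p_phi, p_theta

def generate(p_phi, p_theta, rng):
    """Ancestral sampling: first draw a program c, then data x given c."""
    c = p_phi(rng)
    return p_theta(c, rng)
```

The key property is that the structure c is an explicit, symbolic intermediate: it is fit and sampled separately from the low-level data model.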
Then, using the pairs (x, c), they train two generative models: (i) a model p_φ(c) that generates a program, and (ii) a model p_θ(x | c) that generates an image containing the structure represented by c. However, their approach is heavily tailored to the image domain in several ways. First, their representation of structure is geared towards relatively simple patterns occurring in images of building facades. In addition, their algorithm A is specifically designed to extract this kind of program from an input image, as are their models p_φ(c) for generating programs and p_θ(x | c) for generating images conditioned on the program. In this work, we instead target sequence data. We represent the relational constraints c_x present in an example x by relating each subcomponent w of x to a prototype w̄, which can intuitively be thought of as the "original" subcomponent from which w is constructed. In particular, the relationship between w and w̄ is labeled with a set of relations R, encoding the constraint that w and w̄ satisfy relation r for each r ∈ R. Importantly, while each subcomponent is associated with a single prototype, each prototype may be associated with multiple subcomponents; as a consequence, different subcomponents associated with the same prototype are related to one another. This representation is compact, requiring only linearly many constraints in the number of subcomponents of x (assuming the number of prototypes is constant). Intuitively, compactness ensures that the representation both generalizes well and is easy to generate. Then, we design a program synthesis algorithm that extracts an optimal representation of the structure present in a training example x. We show how to express the synthesis problem as a constrained combinatorial optimization problem, which we solve using the SMT solver Z3 (De Moura & Bjørner, 2008). Next, we represent c as a sequence and design p_φ(c) as a specialized sequence VAE.
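To make the representation concrete, the following minimal sketch shows one way to encode it (the data layout and the toy relations are our own illustration, not the paper's implementation): each subcomponent maps to exactly one prototype, labeled with the set of relations R it must satisfy.

```python
# Toy stand-ins for domain relations such as rhyme or shared meter.
RELATIONS = {
    "same_last_word": lambda w, p: w.split()[-1] == p.split()[-1],
    "same_length": lambda w, p: len(w.split()) == len(p.split()),
}

def satisfies(example, prototypes, constraints):
    """Check that each subcomponent w satisfies every relation r in R
    with respect to its assigned prototype."""
    return all(
        RELATIONS[r](example[i], prototypes[proto_id])
        for i, (proto_id, rels) in constraints.items()
        for r in rels
    )

# Subcomponents 0 and 1 share prototype 0, so they are transitively
# related even though they carry different relation labels.
prototypes = {0: "the cat sat on the mat"}
constraints = {0: (0, {"same_last_word"}), 1: (0, {"same_length"})}
example = {0: "a rat in a hat on the mat", 1: "a dog ran to the door"}
```

Synthesis then amounts to searching over prototype assignments and relation labels for an optimal `constraints` map; in the paper this combinatorial search is delegated to Z3 rather than enumerated by hand.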
Finally, we propose three possible designs of p_θ(x | c), each aiming to identify an example x that is realistic (e.g., according to a pretrained generative model p_θ(x)) while simultaneously satisfying the constraints c. We evaluate our approach on two tasks: poetry generation, where the relational constraints include rhyming lines or lines with shared meter, and music generation, where the relational constraints include equality of pitch or rhythm, one measure being a transposition of another (i.e., pitches shifted up or down by a constant amount), etc. We show that our approach outperforms or performs comparably to state-of-the-art models according to many metrics. Finally, we perform a qualitative evaluation showing how the user can modify the high-level structure to generate examples that satisfy additional desired constraints. This ability demonstrates an important feature of our approach: the user can guide the generative process by modifying the relational constraints as desired.

Example. Figure 1 illustrates how our approach is applied to generate poetry. During training, our approach uses program synthesis to infer the relational constraints c_x present in each example x, and uses both x and c_x to train the generative models. Here, c_x is a bipartite graph, where the LHS vertices are prototypes and the RHS vertices correspond to lines of x. Each RHS vertex is connected to exactly one prototype and is labeled with constraints on how it should relate to that prototype. To generate new examples, our approach first samples relational constraints c, and then samples an example x that satisfies c; i.e., we need to choose a line to fill each RHS node such that the line satisfies the relations with its prototype. Furthermore, a user can modify the sampled constraints c to guide the generative process.
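The slot-filling step can be illustrated with a minimal rejection-sampling sketch (the generator stub, the toy rhyme relation, and all names here are our own simplification; the paper's three designs of p_θ(x | c) are more sophisticated): candidate lines are drawn from an unconditional generator until each RHS slot's relations with its prototype hold.

```python
import random

def sample_line(rng, vocab, n_words):
    # Stand-in for a pretrained unconditional line generator p_theta(x).
    return " ".join(rng.choice(vocab) for _ in range(n_words))

def rhymes(a, b):
    # Toy rhyme relation: last words share their final two letters.
    return a.split()[-1][-2:] == b.split()[-1][-2:]

def fill_slots(constraints, prototypes, rng, vocab, max_tries=10_000):
    """Fill each RHS slot with a sampled line satisfying its relations
    with the assigned prototype (rejection sampling)."""
    poem = {}
    for slot, (proto_id, relations) in constraints.items():
        for _ in range(max_tries):
            line = sample_line(rng, vocab, n_words=5)
            if all(rel(line, prototypes[proto_id]) for rel in relations):
                poem[slot] = line
                break
        else:
            raise RuntimeError(f"no sample satisfied slot {slot}")
    return poem

rng = random.Random(0)
vocab = ["bright", "night", "light", "day", "way", "stay", "moon", "sun"]
prototypes = {0: "we wander in the night"}
constraints = {0: (0, [rhymes]), 1: (0, [rhymes])}
poem = fill_slots(constraints, prototypes, rng, vocab)
```

Because both slots share prototype 0, the two generated lines rhyme with each other as well as with the prototype, which is exactly the transitive-relatedness property the bipartite representation is designed to induce.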
Thus, our approach enables users to flexibly incorporate domain knowledge about the high-level structure of the data into the generative process, both through the choice of relational constraints and by modifying the generated constraints directly.

Related work.

There has been recent interest in leveraging program synthesis to improve machine learning. For instance, it has been applied to unsupervised learning of latent structure in drawings (Ellis et al., 2015) and to reinforcement learning (Verma et al., 2018; Valkov et al., 2018). These techniques have benefits such as improving interpretability (Verma et al., 2018; Ellis et al., 2020), enabling learning from fewer examples (Ellis et al., 2015), generalizing more robustly (Inala et al., 2019), and being easier to formally reason about (Bastani et al., 2018). More recently, there has been work leveraging program synthesis in conjunction with deep learning, where the DNN handles perception and program synthesis handles high-level structure (Ellis et al., 2017), including work in the lifelong learning setting. In contrast to these approaches, our focus is on generative models. In particular, we extend recent work leveraging these ideas in the setting of image generation to incorporate high-level relational structure into sequence generation tasks (Young et al., 2019).

Early music generation approaches were rule-based (Ovans & Davison, 1992) or used simple statistical models such as Markov models (Sandred et al., 2009; Cope, 1987) or probabilistic CFGs (Quick, 2016). Recent work has used deep learning to generate music (Huang et al., 2019; OpenAI, 2019) and poetry (Liao et al., 2019); our experiments show that these approaches have difficulty generating realistic high-level structure. Approaches have incorpo-

