IMPROVING GENERATIVE FLOW NETWORKS WITH PATH REGULARIZATION

Abstract

Generative Flow Networks (GFlowNets) are recently proposed models for learning stochastic policies that generate compositional objects through sequences of actions, with probability proportional to a given reward function. A central challenge for GFlowNets is improving their exploration and generalization. In this work, we propose a novel path regularization method based on optimal transport theory that places prior constraints on the underlying structure of the GFlowNet. The prior is designed to help the GFlowNet better discover the latent structure of the target distribution or to enhance its ability to explore the environment in the context of active learning. The path regularization controls the flow in the GFlowNet either to generate more diverse and novel candidates, by maximizing the optimal transport distances between two forward policies, or to improve generalization, by minimizing these distances. In addition, we derive an efficient implementation of the regularization by finding its closed-form solutions in specific cases and a meaningful upper bound that can be used as an approximation when minimizing the regularization term. We empirically demonstrate the advantages of our path regularization on a wide range of tasks, including synthetic hypergrid environment modeling, discrete probabilistic modeling, and biological sequence design.

1. INTRODUCTION

Recently proposed by Bengio et al. (2021a), Generative Flow Networks (GFlowNets) are generative models for compositional objects. They learn a stochastic policy that sequentially modifies a partially constructed object through a sequence of actions, so that the likelihood of generating an object is proportional to a given reward function. Specifically, GFlowNets aim to solve the problem of generating a diverse set of good candidates. In biological sequence design, diversity is a crucial consideration because it improves the chance of discovering candidates that satisfy many evaluation criteria in later downstream phases (Jain et al., 2022). Especially in the multi-round active learning setting, where the generator is iteratively improved by receiving feedback from an oracle on its proposed candidates, the effect of diverse generation becomes apparent: more diversity means more exploration and more knowledge gained. Moreover, the generalization ability of GFlowNets over structured data (Zhang et al., 2022; Malkin et al., 2022) makes them a good framework for discrete probabilistic modeling. The central problems of GFlowNets are improving exploration and generalization. In this work, we propose to train the GFlowNet with an additional path regularization via optimal transport (OT) (Villani, 2003), which acts as a prior constraint on its underlying structure. The prior is designed to help the GFlowNet better discover the latent structure of the target distribution or enhance its ability to explore the environment in the context of active learning. Precisely, the path regularization via OT can help the GFlowNet generate more diverse and novel candidates by maximizing the OT distances between two forward policies, or improve generalization by minimizing these distances.
For generalization: To improve the GFlowNet's generalization, we propose the following prior constraints: (i) the forward policies of two neighbor states are expected to be similar in that they both have a focused tendency when choosing the next action, which implicitly forces the GFlowNet to find states with high rewards rather than to explore, especially in sparse environments; (ii) trajectories related to positive objects (objects with high rewards) should share their paths. As a result, the similarity of states along trajectories with high flow is higher than elsewhere. From a probabilistic perspective, we propose to measure the similarity of states s and s′ in the GFlowNet by the transition probability from s to s′; (iii) once the GFlowNet has learned something, a sparse flow is expected to generalize better. Thus, although many solutions exist for learning a GFlowNet, our proposed priors promote refining the GFlowNet's flow, i.e., enhancing flow along high-flow trajectories and reducing it elsewhere.

For diversity and exploration: To encourage the GFlowNet's policy to generate more diverse candidates, such as in multi-round active learning settings, we propose to place a prior constraint on the forward policies of two neighbor states. Specifically, this prior constraint intentionally promotes the "dissimilarity" between the forward policies of two neighbor states. In other words, it forces the child states of the two considered neighbor states to be far from each other in terms of transition probability, which helps the GFlowNet generate more diverse and novel candidates.

Why is OT a good solution? We need a measure of "distance" between pairs of probability distributions. Optimal transport (OT) theory (Villani, 2003) studies how probabilistic mass can be optimally transported from the supports of one probability distribution to the supports of another, given a cost function.
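The two modes above differ only in the sign of the regularization term: it is minimized for generalization and maximized (subtracted from the loss) for exploration. The following minimal sketch illustrates this; the function name, the `lam` weight, and the `mode` flag are hypothetical conveniences, not part of the paper's formulation.

```python
import numpy as np

def path_regularized_loss(base_loss, ot_distances, lam=0.1, mode="generalize"):
    """Combine a GFlowNet training loss with the path regularizer.

    mode='generalize': minimize OT distances between neighbor forward
    policies (add the penalty); mode='explore': maximize them (subtract
    the penalty, encouraging diverse candidates).
    """
    reg = np.mean(ot_distances)            # average OT distance over sampled neighbor pairs
    sign = 1.0 if mode == "generalize" else -1.0
    return base_loss + sign * lam * reg

# toy usage: same base loss, opposite effect of the regularizer
l_gen = path_regularized_loss(1.0, [0.5], lam=0.1, mode="generalize")
l_exp = path_regularized_loss(1.0, [0.5], lam=0.1, mode="explore")
```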
The minimum transportation cost, called the OT distance, can be used as a metric that quantifies the distance between two probability distributions. In the context of GFlowNets, we want to affect nearby states, which can be done by regularizing the OT distance between the forward policies P F (·|s) and P F (·|s′) of two neighbor states s and s′. To compute the OT distance, we solve an OT problem between two discrete probability measures, whose support points are the child states of s and s′ respectively, given the transportation cost c(u i , v j ) from each child u i of s to each child v j of s′. Whereas a weakness of the KL divergence is that it requires the two distributions of interest to share the same set of supports, OT handles this problem efficiently. Another reason is that the cost used in our OT distance can capture the structure of the given DAG and the GFlowNet's flow, while directly using the KL divergence cannot.

Contributions. In this work, we develop a novel path regularization based on OT theory for either helping the GFlowNet better discover the latent structure of the target distribution or enhancing its ability to explore the environment in the context of active learning. Our contributions can be summarized as follows: 1. We propose to train the GFlowNet with an additional path regularization via OT, which acts as a prior on the underlying structure of the GFlowNet for either improving its generalization capability or enhancing its exploration ability. 2. We define a new directed distance between two arbitrary states in the GFlowNet, which can be naturally chosen as the transportation cost for computing the OT distance, and we link the proposed regularization to entropy terms. 3. We derive an efficient implementation of the proposed regularization by finding its closed-form solutions in specific cases and a meaningful upper bound that can be used as an approximation when we want to minimize the regularization term. Organization.
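As a concrete illustration of the OT problem just described, the sketch below computes an entropy-regularized (Sinkhorn) approximation of the OT distance between two small discrete measures, standing in for P F (·|s) and P F (·|s′). This is a generic solver, not the paper's closed-form solutions or upper bound; the function name, the regularization strength `eps`, and the toy cost matrix (a placeholder for the directed distance c(u i , v j )) are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_ot(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT distance between discrete measures a and b
    (weights over the child states of s and s') with cost matrix C."""
    K = np.exp(-C / eps)               # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # scale columns to match marginal b
        u = a / (K @ v)                # scale rows to match marginal a
    P = u[:, None] * K * v[None, :]    # approximate optimal transport plan
    return float(np.sum(P * C))        # approximate OT distance

# toy example: forward policies over 3 children of s and 2 children of s'
a = np.array([0.5, 0.3, 0.2])          # stand-in for P_F(.|s)
b = np.array([0.6, 0.4])               # stand-in for P_F(.|s')
C = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])             # hypothetical pairwise costs c(u_i, v_j)
dist = sinkhorn_ot(a, b, C)
```

Note that, unlike the KL divergence, nothing here requires the two measures to share supports: `a` and `b` live on different child sets, and only the cost matrix relates them.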
The paper is organized as follows. In Section 2, we provide the background of GFlowNets and OT. In Section 3, we propose a new directed distance between two arbitrary states in the GFlowNet and then derive the formulation of path regularization via OT. We also explain why it is the natural and optimal choice for constructing the transportation cost between states. Then, we derive the upper bound and efficient implementation of the proposed path regularization. We provide extensive experimental results of our path regularization via OT in Section 4 and conclude the paper with a few discussions in Section 5. Theoretical proofs, as well as experimental settings and additional results, are provided in the Appendix.

2. BACKGROUND

2.1 GFLOWNETS

Consider a compositional space X, where each object x ∈ X can be constructed by taking a sequence of discrete actions from the action space A. Specifically, the construction of each object begins from the source state s0 and ends in the final state sf. Incrementally, the generation process modifies a partially constructed object, which is called a state s ∈ S. In addition, a specific action determines that the object is completely constructed and represents a terminal state, such that s = x ∈ X. These states and actions correspond to the vertices and edges of a directed acyclic graph G = (S, A). The construction of an object x ∈ X defines a complete trajectory, which is a sequence of transitions

