IMPROVING GENERATIVE FLOW NETWORKS WITH PATH REGULARIZATION

Abstract

Generative Flow Networks (GFlowNets) are recently proposed models for learning stochastic policies that generate compositional objects through sequences of actions, with probability proportional to a given reward function. The central problem for GFlowNets is improving their exploration and generalization. In this work, we propose a novel path regularization method based on optimal transport theory that places prior constraints on the underlying structure of the GFlowNet. The prior is designed to help the GFlowNet better discover the latent structure of the target distribution, or to enhance its ability to explore the environment in the context of active learning. The path regularization controls the flow in the GFlowNet to generate more diverse and novel candidates by maximizing the optimal transport distances between two forward policies, or to improve generalization by minimizing those distances. In addition, we derive an efficient implementation of the regularization by finding its closed-form solutions in specific cases and a meaningful upper bound that can be used as an approximation to minimize the regularization term. We empirically demonstrate the advantages of our path regularization on a wide range of tasks, including synthetic hypergrid environment modeling, discrete probabilistic modeling, and biological sequence design.

1. INTRODUCTION

Recently proposed by Bengio et al. (2021a), Generative Flow Networks (GFlowNets) are generative models for compositional objects. They learn a stochastic policy that sequentially modifies a partially constructed object through a sequence of actions, making the generation likelihood proportional to a given reward function. Specifically, GFlowNets aim to solve the problem of generating a diverse set of good candidates. In biological sequence design, diversity is a crucial consideration because it improves the chance of discovering candidates that satisfy many evaluation criteria in later downstream phases (Jain et al., 2022). The effect of diverse generation is especially apparent in the multi-round active learning setting, where the generator is iteratively improved by receiving feedback from an oracle on its proposed candidates: more diversity means more exploration and more knowledge gained. Moreover, the generalization ability of GFlowNets over structured data (Zhang et al., 2022; Malkin et al., 2022) makes them a good framework for discrete probabilistic modeling. The central problems for GFlowNets are improving exploration and generalization. In this work, we propose to train the GFlowNet with an additional path regularization via optimal transport (OT) (Villani, 2003), which acts as a prior constraint on its underlying structure. The prior is designed to help the GFlowNet better discover the latent structure of the target distribution, or to enhance its ability to explore the environment in the context of active learning. Precisely, the path regularization via OT can help the GFlowNet generate more diverse and novel candidates by maximizing OT distances between two forward policies, or improve generalization by minimizing those distances.
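To make the regularization concrete, the quantity being maximized or minimized is an OT distance between two forward policies, i.e., between two categorical distributions over actions. The following is a minimal sketch, not the paper's actual implementation: it computes an entropic-regularized OT (Sinkhorn) distance between the forward policies of two states, assuming a hypothetical ground-cost matrix `C` over action pairs (here simply the index distance between actions).

```python
import numpy as np

def sinkhorn_distance(p, q, C, eps=0.1, n_iters=200):
    """Entropic-regularized OT distance between two discrete
    distributions p and q over actions, with ground cost C.

    p, q : probability vectors of length n (forward policies)
    C    : (n, n) cost matrix, C[i, j] = cost of moving mass
           from action i to action j
    eps  : entropic regularization strength
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):             # Sinkhorn fixed-point iterations
        v = q / (K.T @ u)
        u = p / (K @ v)
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return float(np.sum(P * C))          # transport cost <P, C>

# Hypothetical example: forward policies of two neighbor states
# over n = 3 actions, with cost = |i - j| between action indices.
n = 3
C = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)

pi_s      = np.array([1.0, 0.0, 0.0])   # policy of state s
pi_s_next = np.array([0.0, 0.0, 1.0])   # policy of a neighbor state s'
pi_same   = np.full(n, 1.0 / n)         # identical uniform policies

d_far  = sinkhorn_distance(pi_s, pi_s_next, C)   # all mass must move
d_same = sinkhorn_distance(pi_same, pi_same, C)  # nearly diagonal plan
```

Under this sketch, adding `+lambda * d` to the training loss would penalize dissimilar neighbor policies (encouraging generalization), while `-lambda * d` would reward them (encouraging exploration); `lambda` is a hypothetical weight, and the paper's closed-form solutions and upper bound would replace the iterative solver in practice.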
For generalization: To improve the GFlowNet's generalization, we propose the following prior constraints: (i) the forward policies of two neighboring states are expected to be similar, in that both concentrate their probability mass when choosing the next action; this implicitly pushes the GFlowNet toward states with high rewards rather than exploring, especially in sparse environments; (ii) trajectories leading to positive objects (objects with high rewards) must share their paths. As a result, the similarity of states along trajectories with high flow is higher than elsewhere. From a probabilistic perspective, we propose to measure the similarity of states s and s′ in the GFlowNet by the

