GENERATIVE MULTI-FLOW NETWORKS: CENTRALIZED, INDEPENDENT AND CONSERVATION

Abstract

Generative flow networks use the flow matching loss to learn a stochastic policy that generates objects through a sequence of actions, such that the probability of generating a pattern is proportional to its reward. However, existing works only handle single-flow-model tasks and cannot directly generalize to multi-agent flow networks, due to limitations such as flow estimation complexity and independent sampling. In this paper, we propose the framework of generative multi-flow networks (GMFlowNets), which allows multiple agents to generate objects collaboratively through a series of joint actions. We then propose the centralized flow network algorithm for centralized training of GMFlowNets and the independent flow network algorithm for their decentralized execution. Based on an independent global conservation condition, the flow conservation network algorithm is further proposed to realize the centralized training with decentralized execution paradigm. Theoretical analysis proves that the multi-flow matching loss trains a unique Markovian flow, and that the flow conservation network ensures independent policies generate samples with probability proportional to the reward function. Experimental results demonstrate the superiority of the proposed algorithms over reinforcement learning and MCMC-based methods.

1. INTRODUCTION

Generative flow networks (GFlowNets) Bengio et al. (2021b) can sample a diverse set of candidates in an active learning setting, where the training objective is to sample them approximately in proportion to a given reward function. Compared to reinforcement learning (RL), where the learned policy tends to sample action sequences with the highest rewards, GFlowNets can perform better on exploration tasks, since their goal is not to generate the single highest-reward action sequence but to sample sequences of actions from the leading modes of the reward function Bengio et al. (2021a). Unfortunately, GFlowNets currently cannot support multi-agent systems.

A multi-agent system is a set of autonomous, interacting entities that share a common environment, perceive it through sensors, and act upon it through actuators Busoniu et al. (2008). Multi-agent reinforcement learning (MARL), especially cooperative MARL, is widely used in robot teams, distributed control, resource management, data mining, etc. Zhang et al. (2021); Canese et al. (2021); Feriani & Hossain (2021). Two major challenges for cooperative MARL are scalability and partial observability Yang et al. (2019); Spaan (2012). Since the joint state-action space grows exponentially with the number of agents, and since the environment is only partially observable and communication is constrained, each agent needs to make individual decisions based on its local action-observation history with guaranteed performance Sunehag et al. (2017); Wang et al. (2020); Rashid et al. (2018). To address these challenges in MARL, the popular centralized training with decentralized execution (CTDE) paradigm Oliehoek et al. (2008); Oliehoek & Amato (2016) was proposed, in which each agent's policy is trained in a centralized manner with access to global information and executed in a decentralized manner based only on local history. However, extending these techniques to GFlowNets is not straightforward; in particular, constructing CTDE-architecture flow networks and finding individual-global-max (IGM)-style conditions for flow networks are worth investigating.

In this paper, we propose the Generative Multi-Flow Networks (GMFlowNets) framework for cooperative decision-making tasks, which can generate more diverse patterns through sequential joint actions.
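As background for the flow matching loss mentioned above, the single-agent condition of Bengio et al. (2021b) requires that, at every non-initial state, the total inflow equals the state's reward plus its total outflow; the loss penalizes the squared log-mismatch between the two sides. The following toy sketch (illustrative only, with a hand-picked DAG and edge flows, not the paper's implementation) shows the loss evaluating to zero when flows are exactly conserved:

```python
import math

# Toy DAG: s0 -> s1 -> s2, plus a shortcut edge s0 -> s2.
# s2 is terminal with reward R(s2) = 2; the edge flows below are
# chosen so that flow matching holds exactly at every non-initial state.
edge_flow = {("s0", "s1"): 1.0, ("s1", "s2"): 1.0, ("s0", "s2"): 1.0}
reward = {"s1": 0.0, "s2": 2.0}  # non-terminal states carry zero reward

def flow_matching_loss(state):
    """Squared log-difference between inflow and reward-plus-outflow."""
    inflow = sum(f for (u, v), f in edge_flow.items() if v == state)
    outflow = sum(f for (u, v), f in edge_flow.items() if u == state)
    return (math.log(inflow) - math.log(reward[state] + outflow)) ** 2

total_loss = sum(flow_matching_loss(s) for s in ("s1", "s2"))
print(total_loss)  # 0.0 for this exactly-conserved flow
```

Training a GFlowNet amounts to parameterizing `edge_flow` with a neural network and minimizing this loss over sampled trajectories; the multi-flow matching loss of GMFlowNets generalizes this condition to joint actions of multiple agents.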