LPMARL: LINEAR PROGRAMMING-BASED IMPLICIT TASK ASSIGNMENT FOR HIERARCHICAL MULTI-AGENT REINFORCEMENT LEARNING

Abstract

Training a multi-agent reinforcement learning (MARL) model with a sparse reward is notoriously difficult, as the terminal reward is induced by numerous interactions among agents. In this study, we propose linear programming (LP)-based hierarchical MARL (LPMARL) to learn effective cooperative strategies among agents. LPMARL comprises two hierarchical decision-making schemes: (1) solving an agent-task assignment LP whose state-dependent cost parameters are generated by a graph neural network (GNN), and (2) solving low-level cooperative games among the agents assigned to the same task. We train the LP-parameter-generating GNN and the low-level MARL policy in an end-to-end manner using the implicit function theorem. We empirically demonstrate that LPMARL learns an optimal agent-task allocation and the subsequent local cooperative policies for agents in sub-groups across various mixed cooperative-competitive games.

1. INTRODUCTION

Multi-agent reinforcement learning (MARL) has recently drawn much attention due to its potential applications in controlling complicated, distributed multi-agent systems. Despite this potential, training an MARL model with a sparse reward is notoriously difficult, because the final sparse reward is induced by complex long-term interactions among the agents (Liu et al., 2021). To overcome this challenge, one needs an algorithm that can learn how the interactions among the agents over a long episode produce the outcome of the target task, a delayed and sparse episodic reward, and distill this understanding into an effective sequential decision-making policy.

In this study, we propose linear programming-based hierarchical MARL (LPMARL), a hierarchically structured decision-making scheme, to learn an effective coordination strategy among the agents. LPMARL conducts two hierarchical decision-making steps: (1) solving an agent-task assignment problem and (2) solving local cooperative games among the agents assigned to the same task. For the first step, LPMARL formulates the agent-task assignment as an LP, using state-dependent cost coefficients generated by a graph neural network (GNN). The solution of this LP serves as an agent-to-task assignment, which decomposes the original team game into a set of smaller team games among the agents assigned to the same task. In the second step, LPMARL employs a general MARL strategy to solve each sub-task cooperatively.

We train the LP-parameter-generating GNN layer and the low-level MARL policy network in an end-to-end manner using the implicit function theorem. We validate the effectiveness of LPMARL on various cooperative games with constrained resource allocation. The technical contributions and novelties of the proposed method are as follows:

• Interpretability (Section 6.2.1). LPMARL can induce designed behavior of the agents (i.e., behavioral inductive biases) through specific objective terms or constraints when formulating the LP. This structured framework helps one interpret the decision-making procedure.

• Transferability (Section 6.2.2). LPMARL learns to construct and solve task assignment optimization problems. When constructing a resource assignment LP problem, LP-
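End-to-end training requires gradients of the LP solution with respect to the GNN-generated costs. As a hedged sketch of how the implicit function theorem enables this (the paper's exact formulation may differ, e.g., in how inequality constraints are handled), consider an equality-constrained, quadratically regularized relaxation:

$$x^*(c) = \arg\min_x \; c^\top x + \tfrac{\varepsilon}{2}\,\|x\|^2 \quad \text{s.t.} \quad Ax = b.$$

The KKT conditions are $c + \varepsilon x^* + A^\top \nu^* = 0$ and $Ax^* = b$. Differentiating both with respect to $c$ and applying the implicit function theorem yields the linear system

$$\begin{bmatrix} \varepsilon I & A^\top \\ A & 0 \end{bmatrix} \begin{bmatrix} \partial x^*/\partial c \\ \partial \nu^*/\partial c \end{bmatrix} = \begin{bmatrix} -I \\ 0 \end{bmatrix},$$

whose solution gives the Jacobian $\partial x^*/\partial c$ used to backpropagate the low-level MARL loss into the cost-generating GNN.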
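The two-level scheme described above can be sketched in a few lines. The code below is purely illustrative and not the authors' implementation: the `gnn_cost_stub` function stands in for the cost-generating GNN, the problem sizes and capacity constraint are hypothetical, and a brute-force search over assignments stands in for the LP solver (for assignment-type problems the LP relaxation is integral, so the two agree on small instances).

```python
# Illustrative sketch of LPMARL's two-level decision scheme (not the paper's code).
import itertools
import random
from collections import defaultdict

def gnn_cost_stub(n_agents, n_tasks, seed=0):
    # Stand-in for the GNN: a state-dependent cost c[i][j] for assigning
    # agent i to task j. Here it is just seeded random numbers.
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_tasks)] for _ in range(n_agents)]

def solve_assignment(cost, capacity):
    # High-level step (1): pick the minimum-cost assignment in which each
    # agent takes one task and each task hosts at most `capacity` agents.
    # LPMARL solves this as an LP; we enumerate for clarity on a toy instance.
    n_agents, n_tasks = len(cost), len(cost[0])
    best, best_val = None, float("inf")
    for choice in itertools.product(range(n_tasks), repeat=n_agents):
        if max(choice.count(t) for t in range(n_tasks)) > capacity:
            continue  # violates the task-capacity constraint
        val = sum(cost[i][choice[i]] for i in range(n_agents))
        if val < best_val:
            best, best_val = choice, val
    return best

def sub_groups(assignment):
    # Decompose agents into per-task sub-groups; each group then plays a
    # smaller cooperative game under the low-level MARL policy (step 2).
    groups = defaultdict(list)
    for agent, task in enumerate(assignment):
        groups[task].append(agent)
    return dict(groups)

cost = gnn_cost_stub(n_agents=4, n_tasks=2)
assignment = solve_assignment(cost, capacity=2)
groups = sub_groups(assignment)
```

With 4 agents, 2 tasks, and capacity 2, the constraint forces two sub-groups of two agents each; the low-level policy would then act within each group independently.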

