REPRESENTATION INTERFERENCE SUPPRESSION VIA NON-LINEAR VALUE FACTORIZATION FOR INDECOMPOSABLE MARKOV GAMES

Abstract

Value factorization is an efficient approach for centralized training with decentralized execution in cooperative multi-agent reinforcement learning tasks. As the simplest implementation of value factorization, Linear Value Factorization (LVF) has attracted wide attention. In this paper, we first investigate the applicable conditions of LVF, which are important but usually neglected by previous works. We prove that, due to its representation limitation, LVF is perfectly applicable only to an extremely narrow class of tasks, which we define as decomposable Markov games. Second, to handle indecomposable Markov games, where LVF is inapplicable, we turn to value factorization with complete representation capability (CRC) and explore the general form of value factorization functions that satisfy both the Independent Global Max (IGM) and CRC conditions. A common problem of these value factorization functions is representation interference among the true Q values that share local Q value functions. As a result, the policy can be trapped in local optima due to representation interference on the optimal true Q values. Third, to address this problem, we propose a novel value factorization method, namely Q Factorization with Representation Interference Suppression (QFRIS). QFRIS adaptively reduces the gradients of the local Q value functions contributed by the non-optimal true Q values. Our method is evaluated on various benchmarks, and the experimental results demonstrate the good convergence of QFRIS.

1. INTRODUCTION

Centralized training with decentralized execution (CTDE) (Lowe et al., 2017; Oliehoek et al., 2008; Foerster et al., 2016) shows surprising performance and great scalability in challenging fully cooperative multi-agent reinforcement learning (MARL) tasks (Tan, 1993b). Such tasks only provide a reward shared by all agents, so each agent is expected to deduce its own contribution to the team, which introduces the problem of credit assignment (Foerster et al., 2018). As a simple and efficient approach to credit assignment in the CTDE paradigm, value factorization, especially Linear Value Factorization (LVF), has recently gained growing attention, e.g., VDN (Sunehag et al., 2017) and QMIX (Rashid et al., 2018). An important property of LVF is that it concisely meets the Independent Global Max (IGM) principle (Son et al., 2019), which requires that the greedy joint action of the joint Q value function coincide with the greedy actions of the factorized local Q value functions, and which is widely acknowledged as a critical rule for value factorization. However, the linearly factorizable joint Q value function in LVF is incapable of representing non-linear true Q value functions, a limitation known as the representation limitation of LVF. Recent works focus on remedies for this representation limitation but usually neglect the question of under what conditions the true Q value function is not linearly factorizable. In this paper, we prove that in the context of Markov games, linear factorizability relies on two conditions: (1) the reward function is linearly factorizable on a set of subspaces of the joint state-action space; (2) the state transitions in each subspace are independent of the states and actions outside the subspace. Based on these two conditions, we define the decomposability of a Markov game. In other words, the true Q value function is linearly factorizable if and only if the Markov game is decomposable. Most tasks are indecomposable Markov games, so we go deeper into the properties of LVF in this case.
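Why LVF satisfies IGM can be illustrated with a minimal numerical sketch (the toy values and agent/action counts below are hypothetical, not taken from the paper): when the joint Q value is a sum of local Q values, maximizing the sum decomposes into maximizing each local term, so the greedy joint action always coincides with the tuple of per-agent greedy actions.

```python
import numpy as np

# Hypothetical toy setting: 2 agents, 3 actions each, at a fixed state.
rng = np.random.default_rng(0)
q1 = rng.normal(size=3)  # local Q values of agent 1
q2 = rng.normal(size=3)  # local Q values of agent 2

# VDN-style linear factorization: Q_jt(a1, a2) = Q_1(a1) + Q_2(a2).
q_joint = q1[:, None] + q2[None, :]

# IGM check: the greedy joint action equals the tuple of greedy local actions.
greedy_joint = np.unravel_index(np.argmax(q_joint), q_joint.shape)
greedy_local = (int(np.argmax(q1)), int(np.argmax(q2)))
assert tuple(int(a) for a in greedy_joint) == greedy_local
```

Because the sum separates over agents, this consistency holds for any local Q values, which is exactly why LVF meets IGM by construction; the representation limitation discussed above concerns whether such a sum can equal the true Q value function at all.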
We prove that the target of the joint Q

