OFFLINE CONGESTION GAMES: HOW FEEDBACK TYPE AFFECTS DATA COVERAGE REQUIREMENT

Abstract

This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three types of feedback that reveal progressively less information. Starting from facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a. bandit) feedback setting, interestingly, we show that the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that, with a generalized data coverage assumption from offline linear bandits, we can efficiently recover an approximate NE. Lastly, we consider a novel type of feedback, game-level feedback, where only the total reward from all agents is revealed. Again, we show that the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and that with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback.

1. INTRODUCTION

Congestion games are a special class of general-sum matrix games that model the interaction of players with shared facilities (Rosenthal, 1973). Each player chooses some facilities to utilize, and each facility yields a reward that depends on how congested it is. For instance, in the routing game (Koutsoupias & Papadimitriou, 1999), each player chooses a path from a starting point to a destination in a traffic graph. The facilities are the edges, and the joint decision of all players determines the congestion in the graph: the more players use an edge, the longer the travel time on that edge. As one of the most well-known classes of games, congestion games have been successfully deployed in numerous real-world applications such as resource allocation (Johari & Tsitsiklis, 2003), electrical grids (Ibars et al., 2010), and cryptocurrency ecosystems (Altman et al., 2019). Nash equilibrium (NE), one of the most important concepts in game theory (Nash Jr, 1950), characterizes the emerging behavior in a multi-agent system with selfish players. It is commonly known that solving for the NE is computationally efficient in congestion games as they are isomorphic to potential games (Monderer & Shapley, 1996). Assuming full information access, classic dynamics such as best response dynamics (Fanelli et al., 2008), replicator dynamics (Drighes et al., 2014), and no-regret dynamics (Kleinberg et al., 2009) provably converge to NE in congestion games. Recently, Heliou et al. (2017) and Cui et al. (2022) relaxed the full information setting to the online (semi-)bandit feedback setting, achieving asymptotic and non-asymptotic convergence, respectively. It is worth noting that Cui et al. (2022) proposed the first algorithm whose sample complexity is independent of the number of actions.
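To make the setup concrete, the following is a minimal sketch of a toy congestion game with full information access, together with the classic best response dynamics mentioned above and Rosenthal's potential function. It is not the offline algorithm studied in this paper. The facility names, reward tables, action set, and all function names are hypothetical choices made only for illustration.

```python
# Toy congestion game; every number and name here is an illustrative assumption.
# FACILITY_REWARD[f][k - 1] is the reward each user of facility f receives
# when exactly k players use f (more congestion, lower reward).
FACILITY_REWARD = {
    "a": [4.0, 1.0, 0.5],
    "b": [3.0, 2.0, 0.5],
    "c": [2.0, 1.5, 1.0],
}
# Each action is a subset of facilities that a player may choose.
ACTIONS = [("a",), ("b",), ("c",), ("a", "b"), ("b", "c")]
NUM_PLAYERS = 3


def loads(profile):
    """Count how many players use each facility under a joint action profile."""
    counts = {f: 0 for f in FACILITY_REWARD}
    for action in profile:
        for f in action:
            counts[f] += 1
    return counts


def player_reward(profile, i):
    """Reward of player i: sum of the rewards of the facilities it uses."""
    counts = loads(profile)
    return sum(FACILITY_REWARD[f][counts[f] - 1] for f in profile[i])


def potential(profile):
    """Rosenthal potential: for each facility, sum rewards at loads 1..n_f."""
    counts = loads(profile)
    return sum(sum(FACILITY_REWARD[f][:counts[f]]) for f in FACILITY_REWARD)


def best_response_dynamics(profile):
    """Let players switch to strictly better actions until no one can improve."""
    profile = tuple(profile)
    improved = True
    while improved:
        improved = False
        for i in range(NUM_PLAYERS):
            current = player_reward(profile, i)
            for action in ACTIONS:
                candidate = profile[:i] + (action,) + profile[i + 1:]
                value = player_reward(candidate, i)
                if value > current + 1e-12:
                    profile, current, improved = candidate, value, True
    return profile


def is_pure_nash(profile):
    """Check that no player gains from a unilateral deviation."""
    for i in range(len(profile)):
        current = player_reward(profile, i)
        for action in ACTIONS:
            deviated = profile[:i] + (action,) + profile[i + 1:]
            if player_reward(deviated, i) > current + 1e-12:
                return False
    return True


if __name__ == "__main__":
    start = (("a",), ("a",), ("a",))
    equilibrium = best_response_dynamics(start)
    print(equilibrium, potential(equilibrium), is_pure_nash(equilibrium))
```

Each strictly improving switch raises the potential by exactly the deviating player's gain, so the loop terminates at a pure NE on any finite instance. Note that the sketch enumerates the full action set for every player, which is exactly the dependency on the number of actions that becomes problematic when actions are exponentially many.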

