DIFFERENTIABLE APPROXIMATIONS FOR MULTI-RESOURCE SPATIAL COVERAGE PROBLEMS

Abstract

Resource allocation for coverage of physical spaces is a challenging problem in robotic surveillance, mobile sensor networks and security domains. Recent gradient-based optimization approaches to this problem estimate utilities of actions by using neural networks to learn a differentiable approximation to spatial coverage objectives. In this work, we empirically show that spatial coverage objectives with multiple resources are combinatorially hard for neural networks to approximate and lead to sub-optimal policies. As our major contribution, we propose a tractable framework to approximate a general class of spatial coverage objectives and their gradients using a combination of the Newton-Leibniz theorem, spatial discretization and implicit boundary differentiation. We empirically demonstrate the efficacy of our proposed framework on single- and multi-agent spatial coverage problems.

1. INTRODUCTION

Allocation of multiple resources for efficient spatial coverage is an important component of many practical single-agent and multi-agent systems, e.g., robotic surveillance, mobile sensor networks and security game modeling. Surveillance tasks generally involve a single agent assigning resources, e.g., drones or sensors, each of which can monitor physical areas, to various points in a target domain such that a loss function associated with coverage of the domain is minimized (Renzaglia et al., 2012). Alternatively, security domains follow a leader-follower game setup between two agents, where a defender defends a set of targets (or a continuous target density in a geographical area) with a limited number of resources to be placed, while an attacker plans an attack with its own resources after observing the defender's placement strategy (Tambe, 2011). Traditional methods used to solve single-agent multi-resource surveillance problems often rely on potential fields (Howard et al., 2002), discretization-based approaches (Kong et al., 2006), Voronoi tessellations (Dirafzoon et al., 2011) and particle swarm optimization (Nazif et al., 2010; Saska et al., 2014). Similarly, many exact and approximate approaches have been proposed to maximize the defender's expected utility in two-agent multi-resource security domains against a best-responding attacker (Kiekintveld et al., 2009; Amin et al., 2016; Yang et al., 2014; Haskell et al., 2014; Johnson et al., 2012; Huang et al., 2020). Notably, most existing traditional approaches focus on exploiting some specific spatio-temporal or symmetry structure of the domain being examined.
Related Work: Since spatial coverage problems feature continuous action spaces, a common technique used across most previous works is to discretize the area to be covered into grid cells and restrict the agents' actions to discrete sets (Kong et al., 2006; Yang et al., 2014; Haskell et al., 2014; Gan et al., 2017), then find equilibrium mixed strategies or optimal pure strategies using integer linear programming. However, discretization quickly becomes intractable when the number of each agent's resources grows large. While some games can be characterized by succinct agent strategies and solved efficiently via mathematical programming after discretizing the agents' action spaces (Behnezhad et al., 2018), this is not true for most multi-resource games. Recent works in spatial coverage domains have focused on incorporating advances from deep learning to solve coverage problems with more general algorithms. For instance, Pham et al. (2018) address multi-UAV coverage of a field of interest using a model-free multi-agent RL method, while StackGrad (Amin et al., 2016), OptGradFP (Kamra et al., 2018) and PSRO (Lanctot et al., 2017) are model-free fictitious play based algorithms which can be used to solve games in continuous action spaces. However, model-free approaches are sample-inefficient and require many interactions with the domain (or with a simulator) to infer expected utilities of agents' actions. Secondly, they often rely on policy gradients to compute derivatives of the agents' expected utilities w.r.t. their mixed strategies, which induces high variance in the estimates. To alleviate these issues, more recent works take an actor-critic based approach (Lowe et al., 2017), which additionally learns a differentiable approximation to the agents' utilities (Kamra et al., 2019a; Wang et al., 2019) and computes gradients of the agents' expected utilities w.r.t. their strategies.
But this requires learning accurate reward/value functions, which becomes combinatorially hard for multi-resource coverage. Contributions: To address the above challenge, we present a framework to tractably approximate a general class of spatial coverage objectives and their gradients via spatial discretization, without having to learn neural network based reward models. We only discretize the target domain to represent integrals and all set operations over it, but not the action spaces of the agents. Hence we mitigate the intractability caused by discretizing the high-dimensional action spaces of agents with a large number of resources, while also keeping agents' actions amenable to gradient-based optimization. By combining our framework with existing solution methods, we successfully solve both single-agent and adversarial two-agent multi-resource spatial coverage problems.
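The key idea above can be sketched in code. This is an illustrative toy (not the paper's exact formulation): only the target domain Q = [0, 1]^2 is discretized into grid cells, while the resource positions u stay continuous, so the coverage objective remains differentiable w.r.t. u. The smooth Gaussian sensing model below is an assumption standing in for any differentiable per-point coverage utility.

```python
import numpy as np

def make_grid(n=50):
    """Discretize Q = [0, 1]^2 into n*n cells; returns cell centres and area."""
    xs = (np.arange(n) + 0.5) / n                    # cell-centre coordinates
    qx, qy = np.meshgrid(xs, xs)
    q = np.stack([qx.ravel(), qy.ravel()], axis=1)   # (n*n, 2) target points
    return q, 1.0 / (n * n)                          # points, cell area dq

def coverage_reward(u, q, dq, sigma=0.1):
    """Riemann-sum approximation of r(u) = int_Q cvg(q, u) dq, where a point
    is covered unless every resource misses it (illustrative model):
    cvg(q, u) = 1 - prod_i (1 - exp(-||q - u_i||^2 / (2 sigma^2)))."""
    d2 = ((q[:, None, :] - u[None, :, :]) ** 2).sum(-1)        # (cells, m)
    miss = np.prod(1.0 - np.exp(-d2 / (2 * sigma ** 2)), axis=1)
    return float(np.sum(1.0 - miss) * dq)

q, dq = make_grid()
r_spread = coverage_reward(np.array([[0.3, 0.3], [0.7, 0.7]]), q, dq)
r_stacked = coverage_reward(np.array([[0.5, 0.5], [0.5, 0.5]]), q, dq)
# spreading the two resources covers more of Q than stacking them
```

Since only q is discretized, the sum stays a smooth function of u and its gradient w.r.t. each resource position can be taken analytically or by automatic differentiation.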

2. MULTI-RESOURCE SPATIAL COVERAGE PROBLEMS

In this section, we formally introduce notation and definitions for multi-resource allocation problems, along with two example applications which will be used for evaluation. Multi-agent multi-resource spatial coverage: Spatial coverage problems comprise a target space Q ⊂ R^d (generally d ∈ {2, 3}) and a set of agents (or players) P, with each agent p ∈ P having m_p resources. We will use the notation -p to denote all agents except p, i.e., P \ {p}. Actions: An action u_p ∈ R^{m_p × d_p} for agent p is the placement of all its resources in an appropriate coordinate system of dimension d_p. Let U_p denote the compact, continuous and convex action set of agent p. Mixed strategies: We represent a mixed strategy, i.e., the probability density of agent p over its action set U_p, as σ_p(u_p) ≥ 0 s.t. ∫_{U_p} σ_p(u_p) du_p = 1. We denote agent p sampling an action u_p ∈ U_p from its mixed strategy density as u_p ∼ σ_p. Joints: Joint actions, action sets and densities for all agents together are represented as u = {u_p}_{p∈P}, U = ×_{p∈P} U_p and σ = {σ_p}_{p∈P} respectively. Coverage: When placed, each resource covers (often probabilistically) some part of the target space Q. Let cvg_p : Q × U → R be a function denoting the utility for agent p coming from a target point q ∈ Q due to a joint action u of all agents. We do not assume a specific form for the coverage utility cvg_p and leave it to be defined flexibly, to allow many different coverage applications to be amenable to our framework. Rewards: Due to the joint action u, each player achieves a coverage reward r_p : U → R of the form r_p(u) = ∫_Q cvg_p(q, u) imp_p(q) dq, where imp_p(q) denotes the importance of the target point q for agent p. With a joint mixed strategy σ, player p achieves expected utility E_{u∼σ}[r_p] = ∫_U r_p(u) σ(u) du. Objectives: In single-agent settings, the agent would directly optimize its expected utility w.r.t. its action u_p.
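The expected utility E_{u∼σ}[r_p] above can be estimated by Monte Carlo once actions can be sampled from the mixed strategy. In the minimal sketch below, both the Gaussian strategy and the toy reward are illustrative assumptions, not the paper's models; any sampler for σ and any r_p would slot in.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(u):
    # toy stand-in for r_p(u): payoff decays as the resources drift away
    # from the centre (0.5, 0.5) of the target space
    return float(np.exp(-np.sum((u - 0.5) ** 2)))

def expected_utility(mean, std, n_samples=2000, m=2, d=2):
    """E_{u~sigma}[r_p] ~= (1/N) sum_k r_p(u_k), with u_k ~ sigma taken
    here as an isotropic Gaussian over the m x d placement matrix."""
    samples = mean + std * rng.standard_normal((n_samples, m, d))
    return float(np.mean([reward(u) for u in samples]))

tight = expected_utility(mean=0.5, std=0.05)  # strategy concentrated at centre
loose = expected_utility(mean=0.5, std=0.50)  # widely spread strategy
```

As expected for this peaked toy reward, the concentrated strategy achieves higher estimated utility than the widely spread one.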
But in multi-agent settings, the expected utilities of agents depend on other agents' actions and hence cannot be maximized with a deterministic resource allocation, due to potential exploitation by other agents. Instead, agents aim to achieve Nash equilibrium mixed strategies σ = {σ_p}_{p∈P} over their action spaces. Nash equilibria: A joint mixed strategy σ* = {σ*_p}_{p∈P} is said to be a Nash equilibrium if no agent can increase its expected utility by changing its strategy while the other agents stick to their current strategies. Two-player settings: While our proposed framework does not restrict the number of agents or the utility structure of the game, we will focus on single-agent settings and zero-sum two-player games in subsequent examples. An additional concept required by fictitious play in two-player settings is that of a best response. A best response of agent p against strategy σ_{-p} is an action which maximizes its expected utility against σ_{-p} (defined formally below). Notably, a Nash equilibrium mixed strategy for each player is also its least exploitable strategy. Example 1 (Single-agent Areal Surveillance). A single agent, namely the defender (D), allocates m areal drones with the i-th drone D_i having three-dimensional coordinates u_D,i = (p_D,i, h_D,i) ∈



br_p(σ_{-p}) ∈ arg max_{u_p} E_{u_{-p} ∼ σ_{-p}}[r_p(u_p, u_{-p})].

The expected utility of any best response of agent p is called the exploitability of agent -p:

ε_{-p}(σ_{-p}) := max_{u_p} E_{u_{-p} ∼ σ_{-p}}[r_p(u_p, u_{-p})].
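These two definitions can be illustrated with a toy two-player sketch: approximate br_p by maximizing the sampled expected utility of agent p against actions drawn from σ_{-p}, and read off the achieved value as an estimate of ε_{-p}(σ_{-p}). The payoff r_p and the random-search maximizer below are illustrative assumptions; a real domain would plug in its own r_p (and, in our framework, gradient-based maximization).

```python
import numpy as np

rng = np.random.default_rng(0)

def r_p(u_p, u_mp):
    # toy zero-sum payoff: agent p wants its resource close to agent -p's
    return -float(np.linalg.norm(u_p - u_mp))

def best_response(opp_samples, candidates):
    """Approximate br_p(sigma_{-p}) over a finite candidate set; the best
    achieved value lower-bounds the exploitability eps_{-p}(sigma_{-p})."""
    utils = np.array([np.mean([r_p(c, o) for o in opp_samples])
                      for c in candidates])
    k = int(np.argmax(utils))
    return candidates[k], float(utils[k])

opp = rng.normal(0.5, 0.1, size=(200, 2))     # samples u_{-p} ~ sigma_{-p}
cands = rng.uniform(0.0, 1.0, size=(100, 2))  # candidate actions for agent p
br, exploitability = best_response(opp, cands)
```

Since σ_{-p} here concentrates around (0.5, 0.5), the approximate best response lands near the centre, where the sampled expected payoff is highest.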

