A UNIFIED APPROACH TO REINFORCEMENT LEARNING, QUANTAL RESPONSE EQUILIBRIA, AND TWO-PLAYER ZERO-SUM GAMES

Abstract

This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equilibria solver to achieve linear convergence for extensive-form games with first order feedback; 2) Being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.

[1] We use "standard RL algorithm" to mean algorithms that would look ordinary to single-agent RL practitioners, excluding, e.g., algorithms that converge in the average iterate or operate over sequence form.
[2] Note that 2p0s games generalize single-agent settings, such as Markov decision processes (Puterman, 2014) and partially observable Markov decision processes (Kaelbling et al., 1998).
[3] Specifically, it is a logit QRE; we omit "logit" as a prefix for brevity.

1. INTRODUCTION

This work studies an algorithm that we call magnetic mirror descent (MMD) in the context of two-player zero-sum (2p0s) games. MMD is an extension of mirror descent (Beck & Teboulle, 2003; Nemirovsky & Yudin, 1983) with proximal regularization and a special case of a non-Euclidean proximal gradient method (Tseng, 2010; Beck, 2017), both of which have been studied extensively in convex optimization. To facilitate our analysis of MMD, we extend the non-Euclidean proximal gradient method from convex optimization to 2p0s games and variational inequality problems (Facchinei & Pang, 2003) more generally. We then prove a new linear convergence result for the non-Euclidean proximal gradient method in variational inequality problems with composite structure. As a consequence of our general analysis, we attain formal guarantees for MMD by showing that solving for quantal response equilibria (McKelvey & Palfrey, 1995) (i.e., entropy regularized Nash equilibria) in extensive-form games (EFGs) can be modeled as variational inequality problems via the sequence form (Romanovskii, 1962; Von Stengel, 1996; Koller et al., 1996). These guarantees provide the first linear convergence results to quantal response equilibria (QREs) in EFGs for a first order method.

Our empirical contribution investigates MMD as a last iterate (regularized) equilibrium approximation algorithm across a variety of 2p0s benchmarks. We begin by confirming our theory, showing that MMD converges exponentially fast to QREs in both normal-form games (NFGs) and EFGs. We also find that, empirically, MMD converges to agent QREs (AQREs) (McKelvey & Palfrey, 1998), an alternative formulation of QREs for extensive-form games, when applied with action-value feedback. These results lead us to examine MMD as an RL algorithm for approximating Nash equilibria. On this front, we show competitive performance with counterfactual regret minimization (CFR) (Zinkevich et al., 2007). This is the first instance of a standard RL algorithm [1] yielding empirically competitive performance with CFR in tabular benchmarks when applied in self play. Motivated by our tabular results, we examine MMD as a multi-agent deep RL algorithm for 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe. Encouragingly, we find that MMD is able to successfully minimize an approximation of exploitability. In addition to those listed above, we also provide numerous other experiments in the appendix. In aggregate, we believe that our results suggest that MMD is a unifying approach to reinforcement learning, quantal response equilibria, and two-player zero-sum games.

2. BACKGROUND

Sections 2.1 and 3.3 provide a casual treatment of our problem settings and solution concepts and a summary of our algorithm and some of our theoretical results. Sections 2.2 through 3.2 give a more formal and detailed treatment of the same material; these sections are self-contained and can be skipped by readers less interested in our theoretical results.

2.1. PROBLEM SETTINGS AND SOLUTION CONCEPTS

This work is concerned with 2p0s games, i.e., settings with two players in which the reward for one player is the negation of the reward for the other player. [2] Two-player zero-sum games are often formalized as normal-form games (NFGs), partially observable stochastic games (Hansen et al., 2004), or perfect-recall EFGs (von Neumann & Morgenstern, 1947). An important idea is that it is possible to convert any EFG into an equivalent NFG. The actions of the equivalent NFG correspond to the deterministic policies of the EFG. The payoffs for a joint action are dictated by the expected returns of the corresponding joint policy in the EFG.

We introduce the solution concepts studied in this work as generalizations of single-agent solution concepts. In single-agent settings, we call these concepts optimal policies and soft-optimal policies. We say a policy is optimal if there does not exist another policy achieving a greater expected return (Sutton & Barto, 2018). In problems with a single decision point, we say a policy is $\alpha$-soft optimal in the normal sense if it maximizes a weighted combination of its expected action value and its entropy:

$$\pi = \arg\max_{\pi' \in \Delta(A)} \mathbb{E}_{A \sim \pi'}\, q(A) + \alpha H(\pi'), \tag{1}$$

where $\pi$ is a policy, $\Delta(A)$ is the action simplex, $q$ is the action-value function, $\alpha$ is the regularization temperature, and $H$ is Shannon entropy. More generally, we say a policy is $\alpha$-soft optimal in the behavioral sense if it satisfies equation (1) at every decision point.

In 2p0s settings, we refer to the solution concepts used in this work as Nash equilibria and QREs. We say a joint policy is a Nash equilibrium if each player's policy is optimal, conditioned on the other player not changing its policy. In games with a single decision point, we say a joint policy is a QRE [3] (McKelvey & Palfrey, 1995) if each player's policy is soft optimal in the normal sense, conditioned on the other player not changing its policy. More generally, we say a joint policy is an agent QRE (AQRE) (McKelvey & Palfrey, 1998) if each player's policy is soft optimal in the behavioral sense, subject to the opponent's policy being fixed. Note that AQREs of EFGs do not generally correspond with the QREs of their normal-form equivalents. Outside of (A)QREs, our results also apply to other regularized solution concepts, such as those having KL regularization toward a non-uniform policy.
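As a small illustration of the soft-optimality condition in equation (1), the sketch below uses the standard fact that, for Shannon entropy, the maximizer over the simplex is the softmax of the action values divided by the temperature. The function names and the action values are hypothetical and chosen only for illustration.

```python
import math

def soft_optimal_policy(q, alpha):
    """Maximizer of E_{A~pi}[q(A)] + alpha * H(pi) over the simplex.

    For Shannon entropy this maximizer is softmax(q / alpha).
    """
    top = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / alpha) for v in q]
    total = sum(weights)
    return [w / total for w in weights]

def soft_objective(pi, q, alpha):
    """Expected action value plus alpha times Shannon entropy."""
    expected_q = sum(p * v for p, v in zip(pi, q))
    entropy = -sum(p * math.log(p) for p in pi if p > 0.0)
    return expected_q + alpha * entropy

q_values = [1.0, 0.5, -0.2]  # hypothetical action values
pi_soft = soft_optimal_policy(q_values, alpha=0.5)
best_value = soft_objective(pi_soft, q_values, alpha=0.5)
```

As the temperature shrinks toward zero, the returned policy concentrates on the greedy action, which recovers the unregularized notion of optimality.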

2.2. NOTATION

We use superscripts to denote a particular coordinate of $x = (x^1, \dots, x^n) \in \mathbb{R}^n$ and subscripts to denote time, $x_t$. We use the standard inner product, denoted $\langle x, y \rangle = \sum_{i=1}^{n} x^i y^i$. For a given norm $\|\cdot\|$ on $\mathbb{R}^n$ we define its dual norm $\|y\|_* = \sup_{\|x\|=1} \langle y, x \rangle$. For example, the dual norm to $\|x\|_1 = \sum_{i=1}^{n} |x^i|$ is $\|x\|_\infty = \max_i |x^i|$. We assume all functions $f : \mathbb{R}^n \to (-\infty, +\infty]$ to be closed, with domain $\operatorname{dom} f = \{x : f(x) < +\infty\}$ and corresponding interior $\operatorname{int} \operatorname{dom} f$. If $f$ is convex and differentiable, then its minimum $x^* \in \arg\min_{x \in C} f(x)$ over a closed convex set $C$ satisfies $\langle \nabla f(x^*), x - x^* \rangle \ge 0$ for any $x \in C$.

We use the Bregman divergence of $\psi$ to generalize the notion of distance. Let $\psi$ be a convex function differentiable over $\operatorname{int} \operatorname{dom} \psi$. Then the Bregman divergence with respect to $\psi$ is $B_\psi : \operatorname{dom} \psi \times \operatorname{int} \operatorname{dom} \psi \to \mathbb{R}$, defined as $B_\psi(x; y) = \psi(x) - \psi(y) - \langle \nabla \psi(y), x - y \rangle$. We say that $f$ is $\mu$-strongly convex over $C$ with respect to $\|\cdot\|$ if $B_f(x; y) \ge \frac{\mu}{2} \|x - y\|^2$ for any $x \in C$, $y \in C \cap \operatorname{int} \operatorname{dom} \psi$. Similarly, we define relative strong convexity (Lu et al., 2018). We say $g$ is $\mu$-strongly convex relative to $\psi$ over $C$ if $\langle \nabla g(x) - \nabla g(y), x - y \rangle \ge \mu \langle \nabla \psi(x) - \nabla \psi(y), x - y \rangle$ or, equivalently, if $B_g(x; y) \ge \mu B_\psi(x; y)$ for all $x, y \in \operatorname{int} \operatorname{dom} \psi \cap C$ (Lu et al., 2018). Note that both $\psi$ and $B_\psi(\cdot; y)$ are 1-strongly convex relative to $\psi$.
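The Bregman divergence defined above can be checked numerically for the two mirror maps that appear later in the text. The sketch below is illustrative only: the helper names are hypothetical, and the two test points are arbitrary distributions. It relies on two standard facts: negative entropy induces the KL divergence on the simplex, and half the squared Euclidean norm induces half the squared Euclidean distance.

```python
import math

def bregman(psi, grad_psi, x, y):
    """B_psi(x; y) = psi(x) - psi(y) - <grad psi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_psi(y), x, y))
    return psi(x) - psi(y) - inner

def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def grad_neg_entropy(x):
    return [math.log(xi) + 1.0 for xi in x]

def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)

def grad_half_sq_norm(x):
    return list(x)

def kl(x, y):
    """KL divergence between two full-support distributions."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))

x = [0.7, 0.2, 0.1]  # arbitrary points on the simplex
y = [0.3, 0.3, 0.4]
b_entropy = bregman(neg_entropy, grad_neg_entropy, x, y)
b_euclid = bregman(half_sq_norm, grad_half_sq_norm, x, y)
```

On this pair of points the entropy-induced divergence also dominates half the squared $\ell_1$ distance (Pinsker's inequality), which is the sense in which negative entropy is 1-strongly convex with respect to $\|\cdot\|_1$ on the simplex.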

2.3. ZERO-SUM GAMES AND QRES

In 2p0s games, the solution of a QRE can be written as the solution to a negative entropy regularized saddle point problem. To model QREs (and more), we consider the regularized min max problem

$$\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} \; \alpha g_1(x) + f(x, y) - \alpha g_2(y), \tag{2}$$

where $\mathcal{X} \subset \mathbb{R}^n$, $\mathcal{Y} \subset \mathbb{R}^m$ are closed and convex (and possibly unbounded) and $g_1 : \mathbb{R}^n \to \mathbb{R}$, $g_2 : \mathbb{R}^m \to \mathbb{R}$, $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$. Moreover, $g_1$ and $f(\cdot, y)$ are differentiable and convex for every $y$. Similarly, $-g_2$ and $f(x, \cdot)$ are differentiable and concave for every $x$. A solution $(x^*, y^*)$ to equation 2 is a Nash equilibrium in the regularized game with the following best response conditions along with their equivalent first order optimality conditions:

$$x^* \in \arg\min_{x \in \mathcal{X}} \alpha g_1(x) + f(x, y^*) \iff \langle \alpha \nabla g_1(x^*) + \nabla_{x^*} f(x^*, y^*), x - x^* \rangle \ge 0 \quad \forall x \in \mathcal{X}, \tag{3}$$

$$y^* \in \arg\min_{y \in \mathcal{Y}} \alpha g_2(y) - f(x^*, y) \iff \langle \alpha \nabla g_2(y^*) - \nabla_{y^*} f(x^*, y^*), y - y^* \rangle \ge 0 \quad \forall y \in \mathcal{Y}. \tag{4}$$

In the context of QREs we have that $\mathcal{X} = \Delta^n$, $\mathcal{Y} = \Delta^m$ with $f(x, y) = x^\top A y$ for some payoff matrix $A$, and $g_1, g_2$ are negative entropy. The corresponding best response conditions (3-4) can be written in closed form as $x^* \propto \exp(-A y^* / \alpha)$, $y^* \propto \exp(A^\top x^* / \alpha)$. Similarly, for EFGs, normal-form QREs take the form of equation 2 (Ling et al., 2018) with $g_1, g_2$ being dilated entropy (Hoda et al., 2010), $f(x, y) = x^\top A y$ ($A$ being the sequence-form payoff matrix), and $\mathcal{X}, \mathcal{Y}$ the sequence-form strategy spaces of both players.
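The closed-form best response conditions for normal-form QREs suggest a simple numerical check: a joint policy is a QRE exactly when each player's policy equals the softmax response to the other. The sketch below measures the violation of these conditions. The function names are hypothetical, and rock-paper-scissors is used only as an example for which the uniform joint policy satisfies the conditions at any temperature.

```python
import math

def softmax(v):
    top = max(v)
    weights = [math.exp(vi - top) for vi in v]
    total = sum(weights)
    return [w / total for w in weights]

def mat_vec(A, y):
    """A y."""
    return [sum(a * yj for a, yj in zip(row, y)) for row in A]

def mat_t_vec(A, x):
    """A^T x."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]

def qre_residual(A, x, y, alpha):
    """Largest coordinate gap between (x, y) and the closed-form conditions
    x proportional to exp(-A y / alpha), y proportional to exp(A^T x / alpha)."""
    x_response = softmax([-v / alpha for v in mat_vec(A, y)])
    y_response = softmax([v / alpha for v in mat_t_vec(A, x)])
    gap_x = max(abs(a - b) for a, b in zip(x, x_response))
    gap_y = max(abs(a - b) for a, b in zip(y, y_response))
    return max(gap_x, gap_y)

rps = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
uniform = [1.0 / 3.0] * 3
residual_uniform = qre_residual(rps, uniform, uniform, alpha=0.5)
residual_skewed = qre_residual(rps, [0.6, 0.3, 0.1], uniform, alpha=0.5)
```

A residual of zero certifies the conditions; a skewed policy for the row player leaves a visible gap.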

2.4. CONNECTION BETWEEN ZERO-SUM GAMES AND VARIATIONAL INEQUALITIES

More generally, solutions to equation 2 (including QREs) can be written as solutions to variational inequalities (VIs) with specific structure. The equivalent VI formulation stacks both first-order best response conditions (3-4) into one inequality.

Definition 2.1 (Variational Inequality Problem (VI)). Given $\mathcal{Z} \subseteq \mathbb{R}^n$ and mapping $G : \mathcal{Z} \to \mathbb{R}^n$, the variational inequality problem $\mathrm{VI}(\mathcal{Z}, G)$ is to find $z^* \in \mathcal{Z}$ such that

$$\langle G(z^*), z - z^* \rangle \ge 0 \quad \forall z \in \mathcal{Z}. \tag{5}$$

In particular, the optimality conditions (3-4) are equivalent to $\mathrm{VI}(\mathcal{Z}, G)$ where $G = F + \alpha \nabla g$, $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$, and $g : \mathcal{Z} \to \mathbb{R}$, $(x, y) \mapsto g_1(x) + g_2(y)$, with corresponding operators $F(z) = [\nabla_x f(x, y), -\nabla_y f(x, y)]^\top$ and $\nabla g = [\nabla_x g_1(x), \nabla_y g_2(y)]^\top$. For more details see Facchinei & Pang (2003) (Section 1.4.2). Note that VIs are more general than min-max problems; they also include fixed-point problems and Nash equilibria in n-player general-sum games (Facchinei & Pang, 2003). However, in the case of convex-concave zero-sum games and convex optimization, the problem admits efficient algorithms since the corresponding operator $G$ is monotone (Rockafellar, 1970).

Definition 2.2. $G$ is said to be strongly monotone if, for $\mu > 0$ and any $z, z'$ where $G$ is defined, $\langle G(z) - G(z'), z - z' \rangle \ge \mu \|z - z'\|^2$. $G$ is monotone if this is true for $\mu = 0$.

Definition 2.3. $G$ is said to be $L$-smooth with respect to $\|\cdot\|$ if, for any $z, z'$ where $G$ is defined, $\|G(z) - G(z')\|_* \le L \|z - z'\|$.

For EFGs, Ling et al. (2018) showed that the QRE is the solution of a min-max problem of the form equation 2 where $f$ is bilinear and each $g_i$ could be non-smooth. Therefore, we can write the problem as a VI with strongly monotone operator $G$ having composite structure: a smooth part coming from $f$ and a non-smooth part coming from the regularization $g_1, g_2$.

Proposition 2.4. Solving a normal-form reduced QRE in a two-player zero-sum EFG is equivalent to solving $\mathrm{VI}(\mathcal{Z}, F + \alpha \nabla \psi)$ where $\mathcal{Z}$ is the cross-product of the sequence form strategy spaces and $\psi$ is the sum of the dilated entropy functions for each player. The function $\psi$ is strongly convex with respect to $\|\cdot\|$. Furthermore, $F$ is monotone and $\max_{ij} |A_{ij}|$-smooth ($A$ being the sequence-form payoff matrix) with respect to $\|\cdot\|$, and $F + \alpha \nabla \psi$ is strongly monotone.
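The monotonicity and smoothness claims for the bilinear operator can be checked numerically on a small example. The sketch below is a minimal illustration under stated assumptions: the payoff matrix and test points are random, the function names are hypothetical, and the $\ell_1$ norm with its dual $\ell_\infty$ norm is chosen here only for concreteness, since the proposition states the norm generically.

```python
import random

def game_operator(A, x, y):
    """F(z) = [grad_x f, -grad_y f] for f(x, y) = x^T A y, i.e. [A y, -A^T x]."""
    n, m = len(A), len(A[0])
    grad_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    grad_y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
    return grad_x + [-g for g in grad_y]

def monotonicity_gap(A, z1, z2, n):
    """<F(z1) - F(z2), z1 - z2>, which is nonnegative for a monotone F."""
    f1 = game_operator(A, z1[:n], z1[n:])
    f2 = game_operator(A, z2[:n], z2[n:])
    return sum((a - b) * (u - v) for a, b, u, v in zip(f1, f2, z1, z2))

random.seed(0)
n, m = 3, 4
A = [[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n)]
z1 = [random.random() for _ in range(n + m)]
z2 = [random.random() for _ in range(n + m)]

# For a bilinear f the two cross terms cancel, so the gap is zero up to rounding.
gap = monotonicity_gap(A, z1, z2, n)

# Smoothness check under the assumed l1 / l-infinity norm pair.
f1 = game_operator(A, z1[:n], z1[n:])
f2 = game_operator(A, z2[:n], z2[n:])
dual_norm_change = max(abs(a - b) for a, b in zip(f1, f2))
bound = max(abs(a) for row in A for a in row) * sum(abs(u - v) for u, v in zip(z1, z2))
```

Because the gap is exactly zero for a bilinear game, $F$ alone is monotone but not strongly monotone; strong monotonicity of the full operator comes entirely from the regularization term.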

3. ALGORITHMS AND THEORY

In Proposition 2.4, we provided a new perspective on QRE problems that draws connections to VIs with special composite structure. Motivated by this connection, in Section 3.1, we consider an approach to solving such problems via a non-Euclidean proximal gradient method (Tseng, 2010; Beck, 2017) and prove a novel linear convergence result. Thereafter, in Section 3.2, we demonstrate how this general algorithm specializes to MMD and splits into two decentralized simultaneous updates in 2p0s games (one for each player). Finally, in Section 3.3, we discuss specific instances of MMD, give new algorithms for RL and QRE solving, and summarize our linear convergence result for QREs.

3.1. CONVERGENCE ANALYSIS

We now present our main algorithm, a non-Euclidean proximal gradient method to solve $\mathrm{VI}(\mathcal{Z}, F + \alpha \nabla g)$. Since $\nabla g$ is possibly not smooth, we incorporate $g$ as a proximal regularization.

Algorithm 3.1. Starting with $z_1 \in \operatorname{int} \operatorname{dom} \psi \cap \mathcal{Z}$, at each iteration $t$ do

$$z_{t+1} = \arg\min_{z \in \mathcal{Z}} \eta \left( \langle F(z_t), z \rangle + \alpha g(z) \right) + B_\psi(z; z_t).$$

To ensure that $z_{t+1}$ is well defined, we make the following assumption.

Assumption 3.2 (Well-defined). Assume $\psi$ is 1-strongly convex with respect to $\|\cdot\|$ over $\mathcal{Z}$ and, for any $\ell$, stepsize $\eta > 0$, $\alpha > 0$, $z_{t+1} = \arg\min_{z \in \mathcal{Z}} \eta \left( \langle \ell, z \rangle + \alpha g(z) \right) + B_\psi(z; z_t) \in \operatorname{int} \operatorname{dom} \psi$.

We also make some assumptions on $F$ and $g$.

Assumption 3.3. Let $F$ be monotone and $L$-smooth with respect to $\|\cdot\|$ and $g$ be 1-strongly convex relative to $\psi$ over $\mathcal{Z}$ with $g$ differentiable over $\operatorname{int} \operatorname{dom} \psi$.

These assumptions imply that $F + \alpha \nabla g$ is strongly monotone with unique solution $z^*$ (Bauschke et al., 2011). Our result shows that, if $z^* \in \operatorname{int} \operatorname{dom} \psi$, then Algorithm 3.1 converges linearly to $z^*$.

Theorem 3.4. Let Assumptions 3.2 and 3.3 hold and assume the unique solution $z^*$ to $\mathrm{VI}(\mathcal{Z}, F + \alpha \nabla g)$ satisfies $z^* \in \operatorname{int} \operatorname{dom} \psi$. Then Algorithm 3.1 converges if $\eta \le \frac{\alpha}{L^2}$ and guarantees

$$B_\psi(z^*; z_{t+1}) \le \left( \frac{1}{1 + \eta \alpha} \right)^t B_\psi(z^*; z_1).$$

Note that $\alpha > 0$ is necessary to converge to the solution. If $\alpha = 0$ in the context of solving equation 2, Algorithm 3.1 with $\psi(z) = \frac{1}{2} \|z\|^2$ becomes projected gradient descent ascent, which is known to diverge or cycle for any positive stepsize. However, choosing the strong convexity constants of $g$ and $\psi$ to be 1 is for convenience; the theorem still holds with arbitrary constants, in which case the stepsize condition becomes proportional to the relative strong convexity constant of $g$ (see Corollary D.6 for details). Due to the generality of VIs, we have the following convex optimization result.

Corollary 3.5. Consider the composite optimization problem $\min_{z \in \mathcal{Z}} f(z) + \alpha g(z)$. Then under the same assumptions as Theorem 3.4 with $F = \nabla f$, Algorithm 3.1 converges linearly to the solution.

Note that Corollary 3.5 guarantees linear convergence, which is faster than existing results (Tseng, 2010; Bauschke et al., 2017; Hanzely et al., 2021), due to the additional assumption that $g$ is relatively strongly convex.
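The rate in Theorem 3.4 can be observed on a toy problem. The sketch below is a minimal illustration under assumptions that are not from the text: it takes $f(x, y) = xy$ on the unconstrained domain $\mathbb{R}^2$ and $\psi = g = \frac{1}{2}\|z\|^2$, so that $L = 1$, the solution is $z^* = 0$, and the update of Algorithm 3.1 has a one-line closed form. All function names are hypothetical.

```python
def operator(z):
    """F(z) for f(x, y) = x * y: [df/dx, -df/dy] = [y, -x]; monotone with L = 1."""
    x, y = z
    return [y, -x]

def prox_grad_step(z, eta, alpha):
    """One step of Algorithm 3.1 with psi = g = 0.5 * ||z||^2 and Z = R^2.

    Setting the gradient of eta * (<F(z_t), z> + alpha * g(z)) + B_psi(z; z_t)
    to zero gives z_next = (z_t - eta * F(z_t)) / (1 + eta * alpha).
    """
    f = operator(z)
    return [(zi - eta * fi) / (1.0 + eta * alpha) for zi, fi in zip(z, f)]

def divergence_to_solution(z):
    """B_psi(z*; z) with z* = 0, which equals 0.5 * ||z||^2."""
    return 0.5 * sum(zi * zi for zi in z)

def run(z1, eta, alpha, steps):
    """Return B_psi(z*; z_t) for t = 1, ..., steps + 1."""
    z, history = list(z1), [divergence_to_solution(z1)]
    for _ in range(steps):
        z = prox_grad_step(z, eta, alpha)
        history.append(divergence_to_solution(z))
    return history

alpha, L = 0.5, 1.0
eta = 0.8 * alpha / L ** 2  # satisfies eta <= alpha / L^2
regularized = run([1.0, -2.0], eta, alpha, steps=50)
unregularized = run([1.0, -2.0], eta, 0.0, steps=50)  # alpha = 0: plain descent ascent
```

With the regularization switched off, each step multiplies the squared distance to the solution by $1 + \eta^2$, which matches the remark that $\alpha > 0$ is needed for convergence.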

3.2. APPLICATION OF MAGNETIC MIRROR DESCENT TO TWO-PLAYER ZERO-SUM GAMES

We define MMD to be Algorithm 3.1 with $g$ taken to be either $\psi$ or $B_\psi(\cdot; z')$ for some $z'$; in both cases the 1-relative strong convexity assumption is satisfied, and $z_{t+1}$ is attracted to either $\min_{z \in \mathcal{Z}} \psi(z)$ or $z'$, which we call the magnet.

Algorithm 3.6 (Magnetic Mirror Descent (MMD)).

$$z_{t+1} = \arg\min_{z \in \mathcal{Z}} \eta \left( \langle F(z_t), z \rangle + \alpha \psi(z) \right) + B_\psi(z; z_t) \tag{6}$$

or

$$z_{t+1} = \arg\min_{z \in \mathcal{Z}} \eta \left( \langle F(z_t), z \rangle + \alpha B_\psi(z; z') \right) + B_\psi(z; z_t). \tag{7}$$

Remark 3.7. MMD has the same computational cost as mirror descent since the updates can be equivalently written as $z_{t+1} = \arg\min_{z \in \mathcal{Z}} \langle \ell, z \rangle + \psi(z)$ (e.g., $\ell = (\eta F(z_t) - \nabla \psi(z_t)) / (1 + \eta \alpha)$ for equation 6). In fact, Proposition D.7 shows that MMD is equivalent to mirror descent on the regularized loss with a different stepsize.

MMD and, more generally, Algorithm 3.1 can be used to derive a descent-ascent method to solve the zero-sum game equation 2. If $g_1 = \psi_1$ and $g_2 = \psi_2$ are strongly convex over $\mathcal{X}$ and $\mathcal{Y}$, then we can let $\psi(z) = \psi_1(x) + \psi_2(y)$, which makes $\psi$ strongly convex over $\mathcal{Z}$. Then the MMD update rule equation 6 converges to the solution of equation 2 and splits into simultaneous descent-ascent updates:

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \eta \left( \langle \nabla_{x_t} f(x_t, y_t), x \rangle + \alpha \psi_1(x) \right) + B_{\psi_1}(x; x_t), \tag{8}$$

$$y_{t+1} = \arg\max_{y \in \mathcal{Y}} \eta \left( \langle \nabla_{y_t} f(x_t, y_t), y \rangle - \alpha \psi_2(y) \right) - B_{\psi_2}(y; y_t). \tag{9}$$
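To make the split updates concrete, the following worked derivation specializes update (8) to the probability simplex with $\psi_1$ taken to be negative entropy. The choice of domain and mirror map, the shorthand $c$, and the multiplier $\lambda$ are assumptions introduced here for illustration.

```latex
% Assumptions: \mathcal{X} = \Delta^n and \psi_1(x) = \sum_i x^i \log x^i.
% Shorthand: c = \nabla_{x_t} f(x_t, y_t); \lambda is the multiplier for \sum_i x^i = 1.
% The gradient of B_{\psi_1}(x; x_t) in coordinate i is \log x^i - \log x_t^i.
\begin{align*}
0 &= \eta \left( c^i + \alpha (\log x^i + 1) \right) + \log x^i - \log x_t^i + \lambda
  && \text{(stationarity of (8) in coordinate } i \text{)} \\
(1 + \eta\alpha) \log x^i &= \log x_t^i - \eta c^i - (\eta\alpha + \lambda)
  && \text{(collect the } \log x^i \text{ terms)} \\
x_{t+1}^i &\propto \left[ x_t^i \exp(-\eta c^i) \right]^{\frac{1}{1 + \eta\alpha}}
  && \text{(exponentiate and normalize)}
\end{align*}
```

The nonnegativity constraints never bind because the logarithm keeps the iterate in the interior of the simplex. The same steps applied to update (9) give $y_{t+1}^j \propto [y_t^j \exp(\eta d^j)]^{1/(1 + \eta\alpha)}$ with $d = \nabla_{y_t} f(x_t, y_t)$.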

3.3. MAGNETIC MIRROR DESCENT SUMMARY

MMD's update is parameterized by four objects: a stepsize $\eta$, a regularization temperature $\alpha$, a mirror map $\psi$, and a magnet, which we denote as either $\rho$ or $\zeta$ depending on $\psi$. The stepsize $\eta$ dictates the extent to which moving away from the current iterate is penalized; the regularization temperature $\alpha$ dictates the extent to which being far away from the magnet (i.e., $\rho$ or $\zeta$) is penalized; the mirror map $\psi$ determines how distance is measured. If we take $\psi$ to be negative entropy, then, in reinforcement learning language, MMD takes the form

$$\pi_{t+1} = \arg\max_{\pi} \mathbb{E}_{A \sim \pi}\, q_t(A) - \alpha \mathrm{KL}(\pi, \rho) - \frac{1}{\eta} \mathrm{KL}(\pi, \pi_t), \tag{10}$$

where $\pi_t$ is the current policy, $q_t$ is the Q-value vector for time $t$, and $\rho$ is a magnet policy. For parameterized problems, if $\psi = \frac{1}{2} \|\cdot\|_2^2$, MMD takes the form

$$\theta_{t+1} = \arg\min_{\theta} \langle \nabla_{\theta_t} L(\theta_t), \theta \rangle + \frac{\alpha}{2} \|\theta - \zeta\|_2^2 + \frac{1}{2\eta} \|\theta - \theta_t\|_2^2, \tag{11}$$

where $\theta_t$ is the current parameter vector, $L$ is the loss, and $\zeta$ is the magnet. In settings with discrete actions and unconstrained domains, respectively, these instances of MMD possess the closed forms

$$\pi_{t+1} \propto \left[ \pi_t\, \rho^{\alpha\eta}\, e^{\eta q_t} \right]^{\frac{1}{1 + \alpha\eta}}, \qquad \theta_{t+1} = \left[ \theta_t + \alpha\eta\zeta - \eta \nabla_{\theta_t} L(\theta_t) \right] \frac{1}{1 + \alpha\eta}.$$

Our main result, Theorem 3.4, and Proposition 2.4 imply that if both players simultaneously update their policies using equation (10) with a uniform magnet in 2p0s NFGs, then their joint policy converges to the $\alpha$-QRE exponentially fast. Similarly, in EFGs, if both players use sequence-form policies with $\psi$ taken to be dilated entropy, then their joint policy converges to the $\alpha$-QRE exponentially fast. Both of these results also hold more generally for equilibria induced by non-uniform magnet policies. MMD can also be considered as a behavioral-form algorithm in which update rule (10) or (11) is applied at each information state. If $\rho$ is uniform, a fixed point of this instantiation is an $\alpha$-AQRE; more generally, fixed points are regularized equilibria (i.e., fixed points of a regularized best response operator).
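The two closed forms translate directly into code. The sketch below is a minimal illustration, not the authors' implementation: the payoff matrix, starting policies, hyperparameters, and all function names are hypothetical. It runs simultaneous updates of form (10) with uniform magnets in a small normal-form game and measures how far the result is from the closed-form QRE conditions. The stepsize is chosen below $\alpha / L^2$ with $L$ taken to be the largest absolute payoff.

```python
import math

def mmd_policy_step(pi, q, rho, alpha, eta):
    """Closed form of update (10) for a discrete action set.

    pi_next is proportional to (pi * rho**(alpha*eta) * exp(eta*q)) ** (1/(1+alpha*eta)).
    Computed in log space; assumes pi and rho have full support.
    """
    scale = 1.0 / (1.0 + alpha * eta)
    logits = [scale * (math.log(p) + alpha * eta * math.log(r) + eta * v)
              for p, r, v in zip(pi, rho, q)]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def mmd_parameter_step(theta, grad, zeta, alpha, eta):
    """Closed form of update (11) on an unconstrained domain."""
    return [(t + alpha * eta * z - eta * g) / (1.0 + alpha * eta)
            for t, g, z in zip(theta, grad, zeta)]

def softmax(v):
    top = max(v)
    weights = [math.exp(vi - top) for vi in v]
    total = sum(weights)
    return [w / total for w in weights]

def self_play(A, alpha, eta, steps):
    """Both players apply update (10) simultaneously with uniform magnets.

    The row player x minimizes x^T A y, so its action values are -A y;
    the column player y maximizes, so its action values are A^T x.
    """
    n, m = len(A), len(A[0])
    x = [(i + 1) / (n * (n + 1) / 2) for i in range(n)]  # arbitrary interior start
    y = [(j + 1) / (m * (m + 1) / 2) for j in range(m)]
    rho_x, rho_y = [1.0 / n] * n, [1.0 / m] * m
    for _ in range(steps):
        q_x = [-sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        q_y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        x, y = (mmd_policy_step(x, q_x, rho_x, alpha, eta),
                mmd_policy_step(y, q_y, rho_y, alpha, eta))
    return x, y

def qre_gap(A, x, y, alpha):
    """Largest violation of x ~ exp(-A y / alpha), y ~ exp(A^T x / alpha)."""
    n, m = len(A), len(A[0])
    x_resp = softmax([-sum(A[i][j] * y[j] for j in range(m)) / alpha for i in range(n)])
    y_resp = softmax([sum(A[i][j] * x[i] for i in range(n)) / alpha for j in range(m)])
    return max(max(abs(a - b) for a, b in zip(x, x_resp)),
               max(abs(a - b) for a, b in zip(y, y_resp)))

payoff = [[1.0, -0.5, 0.2], [-0.3, 0.8, -0.6]]  # hypothetical 2x3 game
x_qre, y_qre = self_play(payoff, alpha=0.5, eta=0.1, steps=2000)
gap = qre_gap(payoff, x_qre, y_qre, alpha=0.5)
```

Working in log space avoids underflow when policies place very little mass on some actions, which tends to matter at low temperatures.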

4. EXPERIMENTS

Our main body focuses on highlighting the high level takeaways of our main experiments. Additional discussion of each experiment, as well as additional experiments, are included in the appendix. Code for the sequence form experiments is available at https://github.com/ryan-dorazio/ mmd-dilated. Code for some of the other experiments is available at https://github.com/ ssokota/mmd. Experimental Domains For tabular normal-form settings, we used stage games of a 2p0s Markov variant of the game Diplomacy (Paquette et al., 2019) . These games have payoff matrices of shape p50, 50q, p35, 43q, p50, 50q, and p4, 4q, respectively, and were constructed using an open-source value function (Bakhtin et al., 2021) . For tabular extensive-form settings, we used games implemented in OpenSpiel (Lanctot et al., 2019)  8 G H u / N M D M v S A T X x n W / n Z X V t f W N z c J W c X t n d 2 + / d H D Y 1 H G q G D Z Y L G L V D q h G w S U 2 D D c C 2 4 l C G g U C W 8 H o d u q 3 n l B p H s s H M 0 7 Q j + h A 8 p A z a q z U 8 t z H 7 N y b 9 E p l t + L O Q J a J l 5 M y 5 K j 3 S l / d f s z S C K V h g m r d 8 d z E + B l V h j O B k 2 I 3 1 Z h Q N q I D 7 F g q a Y T a z 2 b n T s i p V f o k j J U t a c h M / T 2 R 0 U j r c R T Y z o i a o V 7 0 p u J / X i c 1 4 b W f c Z m k B i W b L w p T Q U x M p r + T P l f I j B h b Q p n i 9 l b C h l R R Z m x C R R u C t / j y M m l e V L x q 5 f K + W q 7 d J I M m Z 1 d Z n q F s O Q j v H h Q x K v f 4 8 2 / c Z L s Q R M L G o q q b r q 7 g l g K g 6 7 7 7 e T W 1 j c 2 t / L b h Z 3 d v f 2 D 4 u F R 0 0 S J Z r z B I h n p d k A N l 0 L x B g q U v B 1 r T s N A 8 l Y w v p 3 5 r S e u j Y j U A 0 5 i 7 o d 0 q M R A M I p W a n n u Y 3 p R m f a K J b f s z k F W i Z e R E m S o 9 4 p f 3 X 7 E k p A r Z J I a 0 / H c G P 2 U a h R M 8 m m h m x g e U z a m Q 9 6 x V N G Q G z + d n z s l Z 1 b p k 0 G k b S k k c / X 3 R E p D Y y Z h Y D t D i i O z 7 M 3 E / 7 x O g o N r P x U q T p A r t l g 0 S C T B i M x + J 3 2 h O 
[Figure 2 legend residue: α = 0.2, α = 0.5]

[...] top row of Figure 2. A natural follow-up question to these experiments is whether MMD can be made into a Nash equilibrium solver by either annealing the amount of regularization over time or by having the magnet trail behind the current iterate.
We investigate this question in the bottom row of Figure 2 by comparing i) MMD with an annealed temperature, annealed stepsize, and constant magnet; ii) MMD with a constant temperature, constant stepsize, and moving magnet; iii) CFR (Zinkevich et al., 2007); and iv) CFR+ (Tammelin, 2014). While CFR+ yields the strongest performance, suggesting that it remains the best choice for solving games in tabular settings, we view the results as very positive. Indeed, not only do both variants of MMD exhibit last-iterate convergent behavior, they also perform competitively with (or better than) CFR. This is the first instance of a standard RL algorithm yielding results competitive with tabular CFR in classical 2p0s benchmark games. For further details, see Section H.3 for the annealing temperature experiments and Section H.5 for the moving magnet experiments.

Deep Multi-Agent Reinforcement Learning

The last experiments in the main body examine MMD as a deep multi-agent RL algorithm using self-play. We benchmarked against OpenSpiel's (Lanctot et al., 2019) implementation of NFSP (Heinrich & Silver, 2016) and RLlib's (Liang et al., 2018) implementation of PPO (Schulman et al., 2017). We implemented MMD as a modification of RLlib's PPO implementation by changing the adaptive forward KL regularization to a reverse KL regularization. For hyperparameters, we tuned $\{(\alpha_t, \eta_t)\}$ for MMD; otherwise, we used default hyperparameters for each algorithm. As the games are too large to easily compute exact exploitability, we approximate exploitability using a DQN best response trained for 10 million time steps. The results are shown in the top row of Figure 3. They include checkpoints after both 1 million and 10 million time steps, as well as bots that select the first legal action (Arbitrary) and that select actions uniformly at random (Random). As expected, both NFSP and MMD yield lower approximate exploitability after 10M steps than after 1M steps; PPO, on the other hand, does not reliably reduce approximate exploitability over time. In terms of raw value, MMD substantially outperforms the baselines in approximate exploitability. We also show results of head-to-head match-ups in the bottom row of Figure 3 for the 10M time step checkpoints. As may be expected given the approximate exploitability results, MMD outperforms our baselines in head-to-head match-ups. For further details, see Section J.
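The switch in KL direction described above matters because the KL divergence is not symmetric. The following is a minimal sketch, not the RLlib implementation; the two policies are hypothetical, and which direction a given codebase labels "forward" is a naming convention that the sketch does not settle.

```python
import math

def kl(p, q):
    """KL(p || q) for two categorical distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical action distributions at a single information state.
old_policy = [0.7, 0.2, 0.1]   # policy before the update
new_policy = [0.5, 0.3, 0.2]   # candidate policy after the update

kl_old_new = kl(old_policy, new_policy)  # KL(pi_old || pi_new)
kl_new_old = kl(new_policy, old_policy)  # KL(pi_new || pi_old)

print(kl_old_new, kl_new_old)  # the two penalties differ
```

Because the two quantities differ, swapping the arguments of the regularizer changes the gradient that the policy update receives, even when every other hyperparameter is held fixed.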

Subject Matter of Additional Experiments

In the appendix, we include 14 additional experiments: 1. 

5. RELATED WORK

We discuss the main related work below. Additional related work concerning average policy deep reinforcement learning for 2p0s games can be found in Section K.

Convex Optimization and Variational Inequalities. Like MMD and Algorithm 3.1, the extragradient method (Korpelevich, 1976; Gorbunov et al., 2022) and the optimistic method (Popov, 1980) have also been studied in the context of zero-sum games and variational inequalities more generally. However, in contrast to MMD, these methods require smoothness to guarantee convergence. Outside the context of variational inequalities, analogues of MMD and Algorithm 3.1 have been studied in convex optimization under the non-Euclidean proximal gradient method (Beck, 2017) originally proposed by Tseng (2010). But, in contrast to Theorem 3.4, existing convex optimization results (Beck, 2017; Tseng, 2010; Hanzely et al., 2021; Bauschke et al., 2017) are without linear rates because they do not assume the proximal regularization to be relatively strongly convex. In addition to convex optimization, the non-Euclidean proximal gradient algorithm has also been studied in online optimization under the name composite mirror descent (Duchi et al., 2010). Duchi et al. (2010) show an $O(\sqrt{t})$ regret bound without strong convexity assumptions on the proximal term. In the case where the proximal term is relatively strongly convex, Duchi et al. (2010) give an improved rate of $O(\log t)$, implying that MMD has average iterate convergence with a rate of $O(\log t / t)$ for bounded problems, like QRE solving.

Quantal Response Equilibria. Among QRE solvers for NFGs, the PU and OMWPU algorithms from Cen et al. (2021), which also possess linear convergence rates for NFGs, are most similar to MMD. However, both PU and OMWPU require two steps per iteration (because of their similarities to mirror-prox (Nemirovski, 2004) and optimistic mirror descent (Rakhlin & Sridharan, 2013)), and PU requires an extra gradient evaluation.
In contrast, our algorithm needs only one simple step per iteration (with the same computation cost as mirror descent) and our analysis applies to various choices of mirror map, meaning our algorithm can be used to compute a larger class of regularized equilibria, rather than only QREs. Among QRE solvers for EFGs, existing algorithms differ from MMD in that they either require second order information (Ling et al., 2018) or are first order methods with average iterate convergence (Farina et al., 2019; Ling et al., 2019). In contrast to these methods, MMD attains linear last-iterate convergence.

Single-Agent Reinforcement Learning. Considered as a reinforcement learning algorithm, MMD with a negative entropy mirror map and a MaxEnt RL objective coincides with the NE-TRPO algorithm studied by Shani et al. (2020). MMD with a negative entropy mirror map is also similar to the MD-MPI algorithm proposed by Vieillard et al. (2020) but differs in that MD-MPI includes the negative KL divergence between the current and previous iterate within its Q-values, whereas MMD does not. Considered as a deep reinforcement learning algorithm, MMD with a negative entropy mirror map bears relationships to both KL-PPO (a variant of PPO that served as motivation for the more widely adopted gradient clipping variant) (Schulman et al., 2017) and MDPO (Tomar et al., 2020; Hsu et al., 2020). In short, the negative entropy instantiation of MMD corresponds with KL-PPO with a flipped KL term and with MDPO when there is entropy regularization. We describe these relationships using symbolic expressions in Section L.

Regularized Follow-the-Regularized-Leader. Another line of work has combined follow-the-regularized-leader with additional regularization, under the names friction follow-the-regularized-leader (F-FoReL) (Pérolat et al., 2021) and piKL (Jacob et al., 2022), in an analogous fashion to how we combine mirror descent with additional regularization.
Similarly to our work, F-FoReL was designed for the purpose of achieving last iterate convergence in 2p0s games. In terms of convergence guarantees, we prove discrete-time linear convergence for NFGs, while Pérolat et al. (2021) give continuous-time linear convergence for EFGs using counterfactual values; neither possesses the desired discrete-time result for EFGs using action values. In terms of ease of use, MMD offers the advantage that it is decentralizable, whereas the version of F-FoReL that Pérolat et al. (2021) present is not. In terms of scalability, MMD offers the advantage that it only requires approximating bounded quantities; in contrast, F-FoReL requires estimating an arbitrarily accumulating sum. Lastly, in terms of empirical performance, the tabular results presented in this work for MMD are substantially better than those presented for F-FoReL. For example, F-FoReL's best result in Leduc is an exploitability of about 0.08 after 200,000 iterations, whereas MMD takes fewer than 1,000 iterations to achieve the same value. On the other hand, piKL was motivated by improving the prediction accuracy of imitation learning via decision-time planning. We believe the success of piKL in this context suggests that MMD may also perform well in such a setting. While Jacob et al. (2022)

6. CONCLUSION

In this work, we introduced MMD, an algorithm for reinforcement learning in single-agent settings and 2p0s games, and for regularized equilibrium solving. We presented a proof that MMD converges exponentially fast to QREs in EFGs; it is the first algorithm of its kind to do so. We showed empirically that MMD exhibits desirable properties as a tabular equilibrium solver, as a single-agent deep RL algorithm, and as a multi-agent deep RL algorithm. This is the first instance of an algorithm exhibiting such strong performance across all of these settings simultaneously. We hope that, due to its simplicity, MMD will help open the door to 2p0s games research for RL researchers without game-theoretic backgrounds. We provide directions for future work in Section M.

• $h_i \in \mathbb{H}_i = \bigcup_t (\mathbb{O}_i \times \mathbb{A}_i)^t \times \mathbb{O}_i$ to denote information states (i.e., decision points).

Appendices

We use
• $\mathcal{T} : \mathbb{S} \times \mathbb{A} \to \Delta(\mathbb{S} \cup \{\perp\})$ to notate the transition function, where $\perp$ notates termination,
• $\mathcal{R}_i : \mathbb{S} \times \mathbb{A} \to \mathbb{R}$ to notate a reward function,
• $\mathcal{O}_i : \mathbb{S} \times \mathbb{A} \to \mathbb{O}_i$ to notate an observation function,
• $\mathcal{A}_i : \mathbb{H}_i \to \mathbb{A}_i$ to notate a legal action function.

We are interested in 2p0s games, in which $i \in \{1, 2\}$ and $\forall s, a,\ \mathcal{R}_1(s, a) = -\mathcal{R}_2(s, a)$. For convenience, we use $-i$ to notate the player "not $i$". Single-agent settings are captured as a special case in which the second player has a trivial action set, $|\mathbb{A}_2| = 1$. Normal-form games are captured as a special case in which there is only one state $s$ and the transition function only supports termination: $\forall a,\ \mathrm{supp}(\mathcal{T}(s, a)) = \{\perp\}$. (Here, we use $\mathrm{supp}(\mathcal{X})$ to denote the support of a distribution $\mathcal{X}$, i.e., the subset of the domain of $\mathcal{X}$ that is mapped to a value greater than zero: $\{x : \mathcal{X}(x) > 0\}$.) Each agent's goal is to maximize its expected return

$$\mathbb{E}\left[\sum_t \mathcal{R}_i(S_t, A_t) \,\middle|\, \pi\right]$$

using its policy $\pi_i$, which dictates a distribution over actions for each information state, $\pi_i : \mathbb{H}_i \to \Delta(\mathbb{A}_i)$. In game theory literature, these policies are called behavioral form and assume perfect recall. We notate the expected value for an agent's action $a_i$ at an information state $h_i$ at time $t$ under joint policy $\pi$ as

$$q_\pi(h_i, a_i) = \mathbb{E}\left[\mathcal{R}_i(S, A_{-i}, a_i) + \sum_{t' > t} \mathcal{R}_i(S_{t'}, A_{t'}) \,\middle|\, \pi, h_i, a_i\right].$$

Here, the first expectation samples the current Markov state $S$ and the current opponent action $A_{-i}$ from the posterior induced by player $i$ reaching information state $h_i$, when each player uses its part of joint policy $\pi$ to determine its actions. The second expectation is over trajectories under the same conditions, with the additional condition that $a_i$ is the agent's action at the current time step.

A.1 REDUCTION TO NORMAL FORM

Given any game of the above form, we can reduce the game to normal form as follows.
Let $\Pi_i$ denote the set of deterministic policies, i.e., the set of policies that support exactly one action at a time: $\Pi_i = \{\pi_i : \forall h_i,\ |\mathrm{supp}(\pi_i(h_i))| = 1\}$. The action space of the normal-form game is the space of deterministic policies: $\tilde{\mathbb{A}}_i = \Pi_i$. The reward function of the normal-form game is dictated by the expected return of the deterministic joint policy.

Remark A.1. Any policy $\pi_i$ can be expressed as a finite mixture over policies in $\Pi_i$ in a fashion that induces the same distribution over trajectories (against arbitrary, but fixed, opponents). Conversely, any finite mixture over policies in $\Pi_i$ can be expressed as a policy $\pi_i$ that induces the same distribution over trajectories (against arbitrary, but fixed, opponents).

By the remark above, joint policies in the original game possess counterparts in the normal-form game (and vice versa) achieving identical expected returns. It is in this sense that the normal-form game is equivalent to the original game. A more detailed exposition on this equivalence can be found in Shoham & Leyton-Brown (2008).
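As an illustration of the reduction, the sketch below enumerates the deterministic policies of a toy player and computes one mixture over them that corresponds to a given behavioral policy (the product of per-state action probabilities). The information states, actions, and probabilities are hypothetical, and the product mixture is only one of possibly many equivalent mixtures.

```python
from itertools import product

# Hypothetical toy player: two information states and their legal actions.
legal_actions = {"h0": ["a", "b"], "h1": ["c", "d", "e"]}

def deterministic_policies(legal_actions):
    """Yield every map from information states to a single legal action."""
    states = sorted(legal_actions)
    for choice in product(*(legal_actions[h] for h in states)):
        yield dict(zip(states, choice))

def mixture_weight(behavioral, pure):
    """Weight of a pure policy in the product mixture of a behavioral policy."""
    weight = 1.0
    for h, a in pure.items():
        weight *= behavioral[h][a]
    return weight

pure_policies = list(deterministic_policies(legal_actions))

# Hypothetical behavioral policy for the same player.
behavioral = {"h0": {"a": 0.25, "b": 0.75},
              "h1": {"c": 0.5, "d": 0.3, "e": 0.2}}
weights = [mixture_weight(behavioral, p) for p in pure_policies]

print(len(pure_policies))  # number of normal-form actions
print(sum(weights))        # mixture weights sum to one (up to rounding)
```

The number of normal-form actions is the product of the per-state action counts, which is why the reduction is exponential in the number of information states and is mainly of conceptual use.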

B SOLUTION CONCEPTS

Nash equilibria are perhaps the most commonly sought-after solution concept in 2p0s games. A joint policy $(\pi_1, \pi_2)$ is a Nash equilibrium if neither player can improve its expected return by changing its policy (assuming the other player does not change its policy):

$$\forall i,\quad \pi_i \in \arg\max_{\pi_i'} \mathbb{E}\left[\sum_t \mathcal{R}_i(S_t, A_t) \,\middle|\, \pi_i', \pi_{-i}\right].$$

Note that, in single-agent settings, this corresponds with the notion of an optimal policy in reinforcement learning. Another solution concept is a logit quantal response equilibrium (McKelvey & Palfrey, 1995; 1998). As we only deal with logit quantal response equilibria, we generally drop logit and refer to them simply as quantal response equilibria. In normal-form games, there are multiple equivalent ways to define a quantal response equilibrium. One way is using entropy regularization. We say a joint policy is an $\alpha$-QRE in a normal-form game if each player maximizes a weighted combination of expected return and policy entropy:

$$\forall i,\quad \pi_i \in \arg\max_{\pi_i'} \mathbb{E}\left[\mathcal{R}_i(A) + \alpha\mathcal{H}(\pi_i') \,\middle|\, \pi_i', \pi_{-i}\right].$$

In a temporally extended game, we say a joint policy is an $\alpha$-QRE if the equivalent mixture over deterministic joint policies is an $\alpha$-QRE of the equivalent normal-form game. An alternative way to extend QREs to temporally extended settings is to ask that they satisfy the normal-form QRE condition at each information state:

$$\forall i, \forall h_i,\quad \pi_i(h_i) \in \arg\max_{\pi_i'(h_i)} \mathbb{E}_{A \sim \pi_i'(h_i)}\left[q_\pi^i(h_i, A) + \alpha\mathcal{H}(\pi_i'(h_i))\right]. \tag{13}$$

When a joint policy satisfies this condition, it is called an agent QRE (as it is as if there is a separate agent playing a part of a normal-form QRE at each information state). In single-agent settings, $\alpha$-AQREs correspond with the fixed point of the instantiation of expected SARSA (Sutton & Barto, 2018) in which the policy is a softmax distribution over Q-values with temperature $\alpha$. The last solution concept that we investigate is called the MiniMaxEnt equilibrium.
A joint policy is an $\alpha$-MiniMaxEnt equilibrium if it satisfies condition (13) for MiniMaxEnt Q-values

$$q_\pi(h_i, a_i) = \mathbb{E}\left[\mathcal{R}_i(S, A_{-i}, a_i) - \alpha\mathcal{H}(\pi(H^t_{-i})) + \sum_{t' > t} \mathcal{R}_i(S_{t'}, A_{t'}) + \alpha\mathcal{H}(\pi(H^{t'}_i)) - \alpha\mathcal{H}(\pi(H^{t'}_{-i})) \,\middle|\, \pi, h_i, a_i\right].$$

Alternatively, $\alpha$-MiniMaxEnt equilibria can be defined as the saddle point of the $\alpha$-MiniMaxEnt objective

$$\max_{\pi_1} \min_{\pi_2} \mathbb{E}\left[\sum_t \mathcal{R}_1(H_t, A_t) + \alpha\mathcal{H}(\pi_1(H^t_1)) - \alpha\mathcal{H}(\pi_2(H^t_2))\right].$$

While the name MiniMaxEnt is novel to this work, the concept has been studied in recent existing work (Pérolat et al., 2021; Cen et al., 2021).
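In a normal-form game, the entropy-regularized maximization in the $\alpha$-QRE definition has a softmax solution, so a joint policy is an $\alpha$-QRE exactly when each policy equals the softmax of its own action values at temperature $\alpha$. The sketch below checks this fixed-point condition; the function names and the matching-pennies payoff matrix are illustrative assumptions.

```python
import math

def softmax(values, temperature):
    """Maximizer of <values, p> + temperature * entropy(p) over the simplex."""
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def qre_residual(payoff, pi1, pi2, alpha):
    """Largest gap between each policy and its entropy-regularized best response.

    payoff[i][j] is player 1's reward; player 2 receives its negation.
    """
    n, m = len(pi1), len(pi2)
    q1 = [sum(payoff[i][j] * pi2[j] for j in range(m)) for i in range(n)]
    q2 = [-sum(payoff[i][j] * pi1[i] for i in range(n)) for j in range(m)]
    br1, br2 = softmax(q1, alpha), softmax(q2, alpha)
    gap1 = max(abs(a - b) for a, b in zip(pi1, br1))
    gap2 = max(abs(a - b) for a, b in zip(pi2, br2))
    return max(gap1, gap2)

# Hypothetical example: matching pennies, where uniform play is a QRE for any alpha.
matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
residual_uniform = qre_residual(matching_pennies, [0.5, 0.5], [0.5, 0.5], alpha=0.2)
residual_skewed = qre_residual(matching_pennies, [0.9, 0.1], [0.5, 0.5], alpha=0.2)

print(residual_uniform, residual_skewed)
```

A residual of zero certifies the QRE condition; a positive residual means at least one player would change its policy under the regularized objective.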

C REDUCED NORMAL-FORM LOGIT-QRES AND MMD

C.1 SEQUENCE-FORM BACKGROUND

A Nash equilibrium in a 2p0s extensive-form game can be formulated as a bilinear saddle point problem over the sequence form (Nisan et al., 2007),

$$\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} x^\top A y,$$

where $\mathcal{X}$ and $\mathcal{Y}$ are the sequence-form polytopes, which equivalently can be viewed as treeplexes (Hoda et al., 2010; Kroer et al., 2020). We provide some background on the sequence form in the context of the min player (player 1); the max player follows similarly. Recall that all the decision points for player 1 are denoted as $\mathbb{H}_1$ (also known as information states) and the actions available at decision point $h \in \mathbb{H}_1$ are $\mathcal{A}_1(h)$. Recall that a policy (a.k.a. behavioral-form strategy) is denoted as $\pi_1$, with $\pi_1(h) \in \Delta(\mathcal{A}_1(h))$ being the policy at decision point $h$. For convenience, let $\pi_1(h, a)$ denote the probability of taking action $a \in \mathcal{A}_1(h)$ at decision point $h$. Next, we denote by $p(h)$ the parent sequence to reach decision point $h$, that is, the unique previous decision point and action taken by the player before reaching $h$. Note that this parent is unique due to perfect recall and that it is possible for many decision points to share the same parent. Then we can construct the sequence form from the top down, where the $(h, a)$ sequence of $x \in \mathcal{X}$ is given by

$$x_{(h,a)} = x_{p(h)}\, \pi(h, a).$$

For convenience, the root sequence $\emptyset$ is defined to be the parent of all initial decision points of the game and is set to the constant $x_\emptyset = 1$. We denote by $x_h$ the slice of $x = (x_{(h,a)})_{h \in \mathbb{H}_1, a \in \mathcal{A}(h)}$ corresponding to decision point $h$. Note that we have the relationship $\pi(h) = x_h / x_{p(h)}$. Because $x_{(h,a)}$ corresponds to the probability of player 1 choosing all actions along the sequence until reaching $(h, a)$, we get that $x^\top A y$ is the expected payoff for player 2 given a pair of sequence-form strategies $x, y$. Thus the sequence form allows us to get a bilinear objective. Given the bilinear structure of the sequence-form problem, we convert the problem into a VI using first-order optimality conditions.
In order to apply MMD (or other first-order methods), we need a good choice of a mirror map for $\mathcal{X}$ and $\mathcal{Y}$. One such choice is the class of dilated distance-generating functions (Hoda et al., 2010):

$$\psi(x) = \sum_{h \in \mathbb{H}_1} \beta_h\, x_{p(h)}\, \psi_h\!\left(\frac{x_h}{x_{p(h)}}\right) = \sum_{h \in \mathbb{H}_1} \beta_h\, x_{p(h)}\, \psi_h(\pi(h)), \tag{14}$$

where $(\beta_h)_{h \in \mathbb{H}_1} > 0$ are per-decision-point weights and $\psi_h$ is a distance-generating function for the simplex associated to $h$. If $\psi_h$ is taken to be the negative entropy, then we say $\psi$ is the dilated entropy function. In the normal-form setting, the dilated entropy is simply the standard negative entropy. Recently it was shown that an $\alpha$-QRE (for the reduced normal form) is the solution to the following saddle point problem over the sequence form (Ling et al., 2018),

$$\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} \alpha\psi_1(x) + x^\top A y - \alpha\psi_2(y), \tag{16}$$

where $\psi_1, \psi_2$ are dilated entropy functions with weights $\beta_h = 1$. Note that we have the normal-form $\alpha$-QRE as a special case of equation 16.
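The top-down construction $x_{(h,a)} = x_{p(h)}\,\pi(h,a)$ and the dilated entropy can be sketched on a toy treeplex as follows. The two decision points, their actions, the probabilities, and the helper names are hypothetical; the weights are taken as $\beta_h = 1$ by default.

```python
import math

# Hypothetical treeplex: parent[h] is the parent sequence (h', a'), or None at the root.
parent = {"h0": None, "h1": ("h0", "a")}
policy = {"h0": {"a": 0.6, "b": 0.4},
          "h1": {"c": 0.5, "d": 0.5}}

def parent_mass(h, policy, parent):
    """x_{p(h)}: product of the player's own action probabilities leading to h."""
    if parent[h] is None:
        return 1.0  # root sequence, x_emptyset = 1
    prev_h, prev_a = parent[h]
    return parent_mass(prev_h, policy, parent) * policy[prev_h][prev_a]

def sequence_form(policy, parent):
    """Map each sequence (h, a) to x_{(h,a)} = x_{p(h)} * pi(h, a)."""
    return {(h, a): parent_mass(h, policy, parent) * prob
            for h in policy for a, prob in policy[h].items()}

def dilated_entropy(policy, parent, beta=1.0):
    """psi(x) = sum_h beta * x_{p(h)} * sum_a pi(h, a) * log pi(h, a)."""
    total = 0.0
    for h in policy:
        neg_entropy = sum(p * math.log(p) for p in policy[h].values() if p > 0.0)
        total += beta * parent_mass(h, policy, parent) * neg_entropy
    return total

x = sequence_form(policy, parent)
psi = dilated_entropy(policy, parent)
print(x)
print(psi)
```

Each slice $x_h$ sums to its parent mass $x_{p(h)}$, which is the defining linear constraint of the sequence-form polytope.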

C.2 PROOF FOR PROPOSITION 2.4

Proposition 2.4. Solving a normal-form reduced QRE in a two-player zero-sum EFG is equivalent to solving $\mathrm{VI}(\mathcal{Z}, F + \alpha\nabla\psi)$, where $\mathcal{Z}$ is the cross-product of the sequence-form strategy spaces and $\psi$ is the sum of the dilated entropy functions for each player. The function $\psi$ is strongly convex with respect to $\|\cdot\|$. Furthermore, $F$ is monotone and $\max_{ij}|A_{ij}|$-smooth ($A$ being the sequence-form payoff matrix) with respect to $\|\cdot\|$, and $F + \alpha\nabla\psi$ is strongly monotone.

Proof. The problem of finding a reduced normal-form logit QRE is equivalent to solving the saddle point problem stated in equation 16 (Ling et al., 2018). Therefore, due to the convexity of $\psi_1$ and $\psi_2$ and the discussion from Section 2.3, the solution to equation 16 is equivalent to the solution of $\mathrm{VI}(\mathcal{Z}, F + \alpha\nabla\psi)$, where

$$F(z) = \begin{bmatrix} Ay \\ -A^\top x \end{bmatrix}, \qquad \nabla\psi(z) = \begin{bmatrix} \nabla_x \psi_1(x) \\ \nabla_y \psi_2(y) \end{bmatrix}. \tag{17}$$

From Hoda et al. (2010) we know there exist constants $\mu_1, \mu_2$ such that $\psi_1$ is $\mu_1$-strongly convex over $\mathcal{X}$ with respect to $\|\cdot\|_1$ and $\psi_2$ is $\mu_2$-strongly convex over $\mathcal{Y}$ with respect to $\|\cdot\|_1$ (Hoda et al. (2010) do not show bounds on these constants, but we only need them to exist). Therefore, $\psi$ is also strongly convex over $\mathcal{Z}$ with constant $\min\{\mu_1, \mu_2\}$ with respect to $\|z\| = \sqrt{\|x\|_1^2 + \|y\|_1^2}$ since, for $z = (x, y)$ and $z' = (x', y')$, we have

$$\begin{aligned}
\langle \nabla\psi(z) - \nabla\psi(z'), z - z' \rangle
&= \langle \nabla\psi_1(x) - \nabla\psi_1(x'), x - x' \rangle + \langle \nabla\psi_2(y) - \nabla\psi_2(y'), y - y' \rangle \\
&\ge \mu_1 \|x - x'\|_1^2 + \mu_2 \|y - y'\|_1^2 \\
&\ge \min\{\mu_1, \mu_2\} \left(\|x - x'\|_1^2 + \|y - y'\|_1^2\right) \\
&= \min\{\mu_1, \mu_2\} \|z - z'\|^2.
\end{aligned}$$

Following Theorem 3.4, it is useful to characterize the smoothness of $F$ under the same norm for which $\psi$ is strongly convex. First, notice that for any matrix $A$ we have $\|Ax - Ay\|_\infty \le \max_{ij}|A_{ij}|\, \|x - y\|_1$ (see, for example, Bubeck et al. (2015) [Section 5.2.4]).
Therefore altogether we have

$$\begin{aligned}
\|F(z) - F(z')\|_*^2
&= \|Ay - Ay'\|_\infty^2 + \|A^\top x - A^\top x'\|_\infty^2 \\
&\le \max_{ij}|A_{ij}|^2 \left(\|y - y'\|_1^2 + \|x - x'\|_1^2\right) \\
&= \max_{ij}|A_{ij}|^2\, \|z - z'\|^2,
\end{aligned}$$

showing that $F$ is $L = \max_{ij}|A_{ij}|$-smooth with respect to $\|\cdot\|$. The strong monotonicity of $F + \alpha\nabla\psi$ follows since $F$ is monotone and, for $\alpha > 0$, $\alpha\nabla\psi$ is strongly monotone because $\psi$ is strongly convex. Note that in general the Hessian of $\psi$ can have unbounded entries (Kroer et al., 2020), meaning that $\nabla\psi$ cannot be $L$-smooth (Beck, 2017). Our MMD algorithm, which handles $\psi$ in closed form, allows us to sidestep this issue. We also have that the dual norm of $\|\cdot\|$ is simply $\|z\|_* = \sqrt{\|x\|_\infty^2 + \|y\|_\infty^2}$ (Bubeck et al., 2015; Nemirovski, 2004).
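The matrix inequality $\|Ax - Ay\|_\infty \le \max_{ij}|A_{ij}|\,\|x - y\|_1$ used above can be sanity-checked numerically. The sketch below does so on a hypothetical random matrix and random vectors; it is a spot check, not a proof.

```python
import random

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def norm_inf(v):
    return max(abs(t) for t in v)

def norm_1(v):
    return sum(abs(t) for t in v)

random.seed(0)
# Hypothetical 5x4 matrix with entries in [-2, 2].
A = [[random.uniform(-2.0, 2.0) for _ in range(4)] for _ in range(5)]
max_entry = max(abs(a) for row in A for a in row)

checks = []
for _ in range(100):
    x = [random.uniform(-1.0, 1.0) for _ in range(4)]
    y = [random.uniform(-1.0, 1.0) for _ in range(4)]
    diff = [a - b for a, b in zip(x, y)]
    lhs = norm_inf(matvec(A, diff))   # ||Ax - Ay||_inf
    rhs = max_entry * norm_1(diff)    # max|A_ij| * ||x - y||_1
    # Each row satisfies |sum_j A_ij d_j| <= sum_j |A_ij||d_j| <= max|A_ij| * ||d||_1.
    checks.append(lhs <= rhs + 1e-12)

print(all(checks))
```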

C.3 MMD FOR FINDING REDUCED NORMAL-FORM QRES OVER THE SEQUENCE-FORM

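From Proposition 2.4 and Corollary D.6, the MMD descent-ascent updates (8-9), with $\psi_1, \psi_2$ taken to be dilated entropy and $\eta \le \alpha / \max_{ij}|A_{ij}|^2$, converge linearly to the solution of equation 16. The updates, as mentioned in Remark 3.7, can be computed in closed form as a one-line change to mirror descent with dilated entropy (Kroer et al., 2020). Indeed, setting $g_t = Ay_t$ (the gradient for the min player), the update for the min player can be written as

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \sum_{h \in \mathbb{H}_1} x_{p(h)} \left( \left\langle \frac{\eta g_t^h - \nabla\psi(x_t)_h}{1 + \eta\alpha},\ \pi(h) \right\rangle + \psi_h(\pi(h)) \right).$$

Updates can be computed in closed form starting from decision points $h$ without any children and progressing upwards in the game tree.

[...]

The following result is also known as the non-Euclidean prox theorem (Beck, 2017) [Theorem 9.12] or the three-point property (Tseng, 2008).

Proposition D.2. Assume $\mathcal{Z}$ is closed and convex, and both $f$ and $\psi$ are differentiable at $\bar{z}$ (defined below). Then the following statements are equivalent:
1. $\bar{z} = \arg\min_{z \in \mathcal{Z}} \eta\langle g, z \rangle + f(z) + B_\psi(z; y)$
2. $\forall z \in \mathcal{Z},\ \langle \eta g + \nabla f(\bar{z}), \bar{z} - z \rangle \le B_\psi(z; y) - B_\psi(z; \bar{z}) - B_\psi(\bar{z}; y)$

Proof.

$$\begin{aligned}
& \bar{z} = \arg\min_{z \in \mathcal{Z}} \eta\langle g, z \rangle + f(z) + B_\psi(z; y) \\
\Leftrightarrow\ & \langle \nabla\psi(\bar{z}) + \eta g + \nabla f(\bar{z}) - \nabla\psi(y), z - \bar{z} \rangle \ge 0 \quad \forall z \in \mathcal{Z} \\
\Leftrightarrow\ & \langle \nabla\psi(y) - \nabla\psi(\bar{z}) - \eta g - \nabla f(\bar{z}), z - \bar{z} \rangle \le 0 \quad \forall z \in \mathcal{Z} \\
\Leftrightarrow\ & \langle \eta g + \nabla f(\bar{z}), \bar{z} - z \rangle \le \langle \nabla\psi(\bar{z}) - \nabla\psi(y), z - \bar{z} \rangle \quad \forall z \in \mathcal{Z} \\
\Leftrightarrow\ & \langle \eta g + \nabla f(\bar{z}), \bar{z} - z \rangle \le B_\psi(z; y) - B_\psi(z; \bar{z}) - B_\psi(\bar{z}; y) \quad \forall z \in \mathcal{Z}.
\end{aligned}$$

The first equivalence follows by the first-order optimality condition and the last one by Proposition D.1.

Lemma D.3. One step of Algorithm 3.1, under the assumptions of Theorem 3.4, guarantees that, for all $z \in \mathcal{Z}$,

$$B_\psi(z; z_{t+1}) \le B_\psi(z; z_t) - B_\psi(z_{t+1}; z_t) + \langle \eta F(z_t) + \eta\alpha\nabla g(z_{t+1}), z - z_{t+1} \rangle. \tag{18}$$

[...]

$$B_\psi(z_*; z_{t+1}) \le \left(\frac{1}{1 + \eta\alpha}\right)^t B_\psi(z_*; z_1).$$

Proof.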
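$$\begin{aligned}
B_\psi(z_*; z_{t+1})
&\le B_\psi(z_*; z_t) - B_\psi(z_{t+1}; z_t) + \langle \eta F(z_t) + \eta\alpha\nabla g(z_{t+1}), z_* - z_{t+1} \rangle \\
&= B_\psi(z_*; z_t) - B_\psi(z_{t+1}; z_t) + \langle \eta F(z_t) - \eta F(z_{t+1}), z_* - z_{t+1} \rangle + \langle \eta F(z_{t+1}) + \eta\alpha\nabla g(z_{t+1}), z_* - z_{t+1} \rangle \\
&\le B_\psi(z_*; z_t) - B_\psi(z_{t+1}; z_t) + \langle \eta F(z_t) - \eta F(z_{t+1}), z_* - z_{t+1} \rangle - \eta\alpha\left(B_\psi(z_{t+1}; z_*) + B_\psi(z_*; z_{t+1})\right) \\
&\le B_\psi(z_*; z_t) - B_\psi(z_{t+1}; z_t) + \eta L \|z_t - z_{t+1}\|\, \|z_* - z_{t+1}\| - \eta\alpha\left(B_\psi(z_{t+1}; z_*) + B_\psi(z_*; z_{t+1})\right) \\
&\le B_\psi(z_*; z_t) - B_\psi(z_{t+1}; z_t) + \tfrac{1}{2}\|z_t - z_{t+1}\|^2 + \tfrac{\eta^2 L^2}{2}\|z_* - z_{t+1}\|^2 - \eta\alpha\left(B_\psi(z_{t+1}; z_*) + B_\psi(z_*; z_{t+1})\right) \\
&\le B_\psi(z_*; z_t) + \eta^2 L^2 B_\psi(z_{t+1}; z_*) - \eta\alpha\left(B_\psi(z_{t+1}; z_*) + B_\psi(z_*; z_{t+1})\right) \\
&\le B_\psi(z_*; z_t) - \eta\alpha B_\psi(z_*; z_{t+1}) \qquad (\text{since } \eta^2 L^2 \le \eta\alpha).
\end{aligned}$$

The first inequality follows from Lemma D.3 and the second inequality from Lemma D.4; the third inequality by the generalized Cauchy-Schwarz inequality and the smoothness of $F$; the fourth inequality by the elementary inequality $ab \le \frac{\rho a^2}{2} + \frac{b^2}{2\rho}$ for all $\rho > 0$; and the fifth inequality by the strong convexity of $\psi$, since $\frac{1}{2}\|x - y\|^2 \le B_\psi(x; y)$. Therefore altogether we have

$$B_\psi(z_*; z_{t+1}) \le \frac{B_\psi(z_*; z_t)}{1 + \eta\alpha}.$$

Iterating the inequality yields the result.

Corollary D.6. Under the same assumptions as Theorem 3.4, if $g$ is $\mu$-strongly convex relative to $\psi$ and $\psi$ is $\mu_\psi$-strongly convex, then if $\eta \le \frac{\alpha\mu}{L^2}$, Algorithm 3.1 guarantees

$$B_\psi(z_*; z_{t+1}) \le \left(\frac{1}{1 + \eta\mu\alpha}\right)^t B_\psi(z_*; z_1).$$

Proof. Observe that $\bar{\psi} = \frac{\psi}{\mu_\psi}$ is 1-strongly convex and $\bar{g} = \frac{g}{\mu\mu_\psi}$ is 1-strongly convex relative to $\bar{\psi}$:

$$\begin{aligned}
\langle \nabla\bar{g}(x) - \nabla\bar{g}(y), x - y \rangle
&= \frac{1}{\mu\mu_\psi} \langle \nabla g(x) - \nabla g(y), x - y \rangle \\
&\ge \frac{1}{\mu_\psi} \langle \nabla\psi(x) - \nabla\psi(y), x - y \rangle \\
&= \langle \nabla\bar{\psi}(x) - \nabla\bar{\psi}(y), x - y \rangle.
\end{aligned} \tag{20}$$

Rewriting the update of Algorithm 3.1 in terms of $\bar{g}$ and $\bar{\psi}$ gives

$$\begin{aligned}
& \arg\min_{z \in \mathcal{Z}} \eta\left(\langle F(z_t), z \rangle + \alpha g(z)\right) + B_\psi(z; z_t) \\
\Leftrightarrow\ & \arg\min_{z \in \mathcal{Z}} \eta\left(\langle F(z_t), z \rangle + \alpha \frac{\mu\mu_\psi}{\mu\mu_\psi} g(z)\right) + \frac{\mu_\psi}{\mu_\psi} B_\psi(z; z_t) \\
\Leftrightarrow\ & \arg\min_{z \in \mathcal{Z}} \frac{\eta}{\mu_\psi}\left(\langle F(z_t), z \rangle + \alpha \frac{\mu\mu_\psi}{\mu\mu_\psi} g(z)\right) + \frac{1}{\mu_\psi} B_\psi(z; z_t) \\
\Leftrightarrow\ & \arg\min_{z \in \mathcal{Z}} \bar{\eta}\left(\langle F(z_t), z \rangle + \alpha\mu\mu_\psi\, \bar{g}(z)\right) + B_{\bar{\psi}}(z; z_t) \\
\Leftrightarrow\ & \arg\min_{z \in \mathcal{Z}} \bar{\eta}\left(\langle F(z_t), z \rangle + \bar{\alpha}\bar{g}(z)\right) + B_{\bar{\psi}}(z; z_t).
\end{aligned}$$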
The result follows from Theorem 3.4 with stepsize $\bar{\eta} = \frac{\eta}{\mu_\psi}$ and $\bar{\alpha} = \mu\mu_\psi\alpha$.
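The per-step contraction $B_\psi(z_*; z_{t+1}) \le B_\psi(z_*; z_t) / (1 + \eta\alpha)$ can be checked numerically in the normal-form case, where $\psi$ is the negative entropy, $B_\psi$ is the KL divergence, and the update has the closed form $z_{t+1} \propto (z_t\, e^{-\eta F(z_t)})^{1/(1+\eta\alpha)}$. The sketch below is a minimal illustration under stated assumptions: the payoff matrix is hypothetical, and $z_*$ is approximated by running the same iteration for many steps rather than computed exactly.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def kl(p, q):
    """KL(p || q), the Bregman divergence of negative entropy on the simplex."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mmd_step(A, x, y, eta, alpha):
    """One simultaneous MMD step for min_x max_y x^T A y with entropy regularization."""
    n, m = len(x), len(y)
    grad_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]    # (A y)_i
    grad_y = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]   # (-A^T x)_j
    power = 1.0 / (1.0 + eta * alpha)
    new_x = normalize([(x[i] * math.exp(-eta * grad_x[i])) ** power for i in range(n)])
    new_y = normalize([(y[j] * math.exp(-eta * grad_y[j])) ** power for j in range(m)])
    return new_x, new_y

# Hypothetical 3x3 payoff matrix with max |A_ij| = 1, so L = 1.
A = [[0.0, 1.0, -0.5],
     [-1.0, 0.0, 0.25],
     [0.5, -0.25, 0.0]]
alpha, eta = 0.5, 0.25   # satisfies eta <= alpha / L^2

# Approximate the QRE z_* by running MMD for many iterations.
x_star, y_star = [1 / 3] * 3, [1 / 3] * 3
for _ in range(2000):
    x_star, y_star = mmd_step(A, x_star, y_star, eta, alpha)

# Track B_psi(z_*; z_t) along a fresh run from the uniform policy.
x, y = [1 / 3] * 3, [1 / 3] * 3
divergences = []
for _ in range(21):
    divergences.append(kl(x_star, x) + kl(y_star, y))
    x, y = mmd_step(A, x, y, eta, alpha)

contraction = 1.0 / (1.0 + eta * alpha)
bound_holds = all(divergences[t + 1] <= contraction * divergences[t] + 1e-12
                  for t in range(20))
print(bound_holds)
```

The small additive tolerance accounts for floating-point error and for $z_*$ being an approximation of the true fixed point.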

D.3 EQUIVALENCE BETWEEN MMD AND MD

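In this section we show that MMD is equivalent to mirror descent (MD) with a different stepsize when an extra regularized loss is added. In the game context this implies that MMD can be implemented as mirror descent ascent on the regularized game with a particular stepsize.

Proposition D.7. Magnetic mirror descent updates (6, 7) are equivalent to the following updates respectively (where the second update uses a magnet $z'$):

$$z_{t+1} = \arg\min_{z \in \mathcal{Z}} \bar{\eta}\langle F(z_t) + \alpha\nabla\psi(z_t), z \rangle + B_\psi(z; z_t), \tag{23}$$

$$z_{t+1} = \arg\min_{z \in \mathcal{Z}} \bar{\eta}\langle F(z_t) + \alpha\nabla_{z_t} B_\psi(z_t; z'), z \rangle + B_\psi(z; z_t), \tag{24}$$

with stepsize $\bar{\eta} = \frac{\eta}{1 + \eta\alpha}$.

Proof. We begin by proving the first equivalence, between equations (6) and (23):

$$\begin{aligned}
& z_{t+1} = \arg\min_{z \in \mathcal{Z}} \eta\left(\langle F(z_t), z \rangle + \alpha\psi(z)\right) + B_\psi(z; z_t) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \langle \eta F(z_t) - \nabla\psi(z_t), z \rangle + (1 + \eta\alpha)\psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \left\langle \frac{\eta F(z_t) - \nabla\psi(z_t)}{1 + \eta\alpha}, z \right\rangle + \psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \left\langle \frac{\eta F(z_t) + \eta\alpha\nabla\psi(z_t)}{1 + \eta\alpha} - \nabla\psi(z_t), z \right\rangle + \psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \langle \bar{\eta}\left(F(z_t) + \alpha\nabla\psi(z_t)\right) - \nabla\psi(z_t), z \rangle + \psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \bar{\eta}\langle F(z_t) + \alpha\nabla\psi(z_t), z \rangle + B_\psi(z; z_t).
\end{aligned}$$

The second equivalence follows from similar steps:

$$\begin{aligned}
& z_{t+1} = \arg\min_{z \in \mathcal{Z}} \eta\left(\langle F(z_t), z \rangle + \alpha B_\psi(z; z')\right) + B_\psi(z; z_t) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \langle \eta F(z_t) - \eta\alpha\nabla\psi(z') - \nabla\psi(z_t), z \rangle + (1 + \eta\alpha)\psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \left\langle \frac{\eta F(z_t) - \eta\alpha\nabla\psi(z') - \nabla\psi(z_t)}{1 + \eta\alpha}, z \right\rangle + \psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \left\langle \frac{\eta F(z_t) - \eta\alpha\nabla\psi(z') + \eta\alpha\nabla\psi(z_t)}{1 + \eta\alpha} - \nabla\psi(z_t), z \right\rangle + \psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \langle \bar{\eta}\left(F(z_t) + \alpha\nabla\psi(z_t) - \alpha\nabla\psi(z')\right) - \nabla\psi(z_t), z \rangle + \psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \langle \bar{\eta}\left(F(z_t) + \alpha\nabla_{z_t} B_\psi(z_t; z')\right) - \nabla\psi(z_t), z \rangle + \psi(z) \\
\Leftrightarrow\ & z_{t+1} = \arg\min_{z \in \mathcal{Z}} \bar{\eta}\langle F(z_t) + \alpha\nabla_{z_t} B_\psi(z_t; z'), z \rangle + B_\psi(z; z_t).
\end{aligned}$$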
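For the negative entropy mirror map on a single simplex, the equivalence in Proposition D.7 can be verified numerically: one MMD step with a magnet should coincide with one entropic mirror descent step on the regularized gradient with stepsize $\bar{\eta} = \eta / (1 + \eta\alpha)$. The sketch below assumes the closed forms obtained from the first-order conditions; the iterate, magnet, gradient, and function names are hypothetical.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def mmd_magnet_step(z, grad, magnet, eta, alpha):
    """Closed form of argmin_z' eta*(<grad, z'> + alpha*KL(z'||magnet)) + KL(z'||z)."""
    power = 1.0 / (1.0 + eta * alpha)
    return normalize([(zi * mi ** (eta * alpha) * math.exp(-eta * gi)) ** power
                      for zi, mi, gi in zip(z, magnet, grad)])

def md_step(z, grad, stepsize):
    """Entropic mirror descent: z' proportional to z * exp(-stepsize * grad)."""
    return normalize([zi * math.exp(-stepsize * gi) for zi, gi in zip(z, grad)])

# Hypothetical iterate, magnet, loss gradient, and hyperparameters.
z = [0.2, 0.5, 0.3]
magnet = [0.6, 0.3, 0.1]
grad = [1.0, -0.5, 0.25]
eta, alpha = 0.3, 0.7

# Regularized gradient F + alpha * grad_z KL(z || magnet) = F + alpha * (log z - log magnet).
regularized_grad = [gi + alpha * (math.log(zi) - math.log(mi))
                    for gi, zi, mi in zip(grad, z, magnet)]
eta_bar = eta / (1.0 + eta * alpha)

via_mmd = mmd_magnet_step(z, grad, magnet, eta, alpha)
via_md = md_step(z, regularized_grad, eta_bar)
max_gap = max(abs(a - b) for a, b in zip(via_mmd, via_md))
print(max_gap)  # agreement up to floating-point error
```

The constant term of $\nabla\psi$ for negative entropy cancels in $\nabla_{z_t} B_\psi(z_t; z')$ and in the normalization, which is why only the log-ratio appears in the regularized gradient.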

D.4 NEGATIVE ENTROPY MMD EXAMPLE

An example showing the simplex trajectories of MMD applied to a small NFG is shown in Figure 4.
h Y D t D a o Z 6 0 Z u K / 3 n t x P S v / J T L O D E o 2 X x R P x H E R G T 6 P O l x h c y I s S W U K W 5 v J W x I F W X G R p S 3 I X i L L y + T x k X J K 5 c q d + V i 9 T q L I w f H c A J n 4 M E l V O E W a l A H B g K e 4 R X e n E f n x X l 3 P u a t K 0 4 2 c w R / 4 H z + A L U S j x w = < /

D.5 EUCLIDEAN MMD EXAMPLE

We discuss the Euclidean case for update equation 6 (update 7 is similar). In the Euclidean case, where $\psi = \frac{1}{2}\|\cdot\|_2^2$, update equation 6 reduces to

$$z_{t+1} = \Pi_Z\left(\frac{z_t - \eta F(z_t)}{1 + \eta\alpha}\right),$$

where $\Pi_Z$ denotes the Euclidean projection onto $Z$. In the context of solving min-max problems where $\psi = \psi_1 + \psi_2$, with each $\psi_i = \frac{1}{2}\|\cdot\|_2^2$, the descent-ascent updates (8, 9) become

$$x_{t+1} = \Pi_X\left(\frac{x_t - \eta\nabla_{x_t} f(x_t, y_t)}{1 + \eta\alpha}\right), \qquad y_{t+1} = \Pi_Y\left(\frac{y_t + \eta\nabla_{y_t} f(x_t, y_t)}{1 + \eta\alpha}\right).$$

Note that our results don't require bounded constraints; in the unconstrained setting there would be no projection step. By Theorem 3.4, the above iterations converge linearly to the solution of

$$\min_{x \in X}\max_{y \in Y}\ \frac{\alpha}{2}\|x\|_2^2 + f(x, y) - \frac{\alpha}{2}\|y\|_2^2,$$

provided $f$ is smooth in the sense that $F(x, y) = [\nabla_x f(x, y), -\nabla_y f(x, y)]^\top$ is smooth. For example, in the 1-D case, the following unconstrained saddle point problem can be solved with Euclidean MMD:

$$\min_{x \in \mathbb{R}}\max_{y \in \mathbb{R}}\ \frac{\alpha}{2}x^2 + (x - a)(y - b) - \frac{\alpha}{2}y^2,$$

where $a, b$ are constants. In this case $F(x, y) = [y - b, -(x - a)]^\top$, which is 1-smooth. Therefore, with stepsize $\eta = \alpha$, the following update rule converges linearly to the solution:

$$x_{t+1} = \frac{x_t - \alpha(y_t - b)}{1 + \alpha^2}, \qquad y_{t+1} = \frac{y_t + \alpha(x_t - a)}{1 + \alpha^2}.$$

See Figure 5 for a visualization with $a = b = 1$.

Proposition D.8. Suppose the assumptions of Theorem 3.4 hold. Moreover, assume that $g$ is twice continuously differentiable over $\operatorname{int}\operatorname{dom}\psi$ and $Z$ is bounded. In that case, there exists a constant $C$ and a time step $t'$ such that, for any $t \ge t'$,

$$\theta_{\text{gap}}(z_t) = \sup_{z \in Z}\langle F(z_t) + \alpha\nabla g(z_t), z_t - z\rangle \le C\left(\sqrt{\frac{1}{1 + \eta\alpha}}\right)^{t - t'}\sqrt{B_\psi(z_*; z_{t'})}.$$

Proof. By Theorem 3.4 we know $z_t \to z_*$, where $\{z_t\}_{t \ge 1} \cup \{z_*\} \subseteq \operatorname{int}\operatorname{dom}\psi$. Therefore, $\{z_t\}_{t \ge 1} \cup \{z_*\}$ is eventually within a closed ball centered at $z_*$. That is, there exist $t'$ and a closed ball $B$ such that $\{z_t\}_{t \ge t'} \cup \{z_*\} \subseteq B \subseteq \operatorname{int}\operatorname{dom}\psi$. Since $B$ is compact and $\nabla^2 g$ is continuous over $B$, $\nabla^2 g(z)$ is bounded on $B$. Therefore, there exists $L_B$ such that $\|\nabla g(z') - \nabla g(z)\|_* \le L_B\|z - z'\|$ for any $z, z' \in B$. Setting $G = F + \alpha\nabla g$, we have that, for any $z, z' \in B$, $\|G(z) - G(z')\|_* \le \bar{L}\|z - z'\|$ for $\bar{L} = L + \alpha L_B$, where $L$ is the smoothness constant of $F$. Then for any $z \in Z$ and $t \ge t'$, denoting $z_*$ as the solution to $\mathrm{VI}(Z, G)$, we have

$$\begin{aligned}
\langle G(z_t), z_t - z\rangle &= \langle G(z_*), z_t - z\rangle + \langle G(z_t) - G(z_*), z_t - z\rangle \\
&= \underbrace{\langle G(z_*), z_* - z\rangle}_{\le 0} + \langle G(z_*), z_t - z_*\rangle + \langle G(z_t) - G(z_*), z_t - z\rangle \\
&\le \|G(z_*)\|_*\|z_t - z_*\| + \bar{L}\|z_t - z_*\|\,\|z_t - z\| \\
&\le \left(\|G(z_*)\|_* + \bar{L}D\right)\|z_t - z_*\| \\
&\le C\sqrt{B_\psi(z_*; z_t)} \\
&\le C\left(\sqrt{\frac{1}{1 + \eta\alpha}}\right)^{t - t'}\sqrt{B_\psi(z_*; z_{t'})},
\end{aligned}$$

where $D$ is such that $\max_{z, z' \in Z}\|z - z'\| \le D$ and $C = \sqrt{2}\left(\|G(z_*)\|_* + \bar{L}D\right)$. The first inequality is by the generalized Cauchy-Schwarz inequality and the Lipschitz property of $G$, together with $\langle G(z_*), z_* - z\rangle \le 0$, which holds because $z_*$ solves $\mathrm{VI}(Z, G)$. The second inequality is by boundedness of $Z$. The third inequality is by the fact that $B_\psi(z_*; z_t) \ge \frac{1}{2}\|z_* - z_t\|^2$. The fourth inequality is by applying Theorem 3.4 inductively. Taking the supremum over $z \in Z$ gives the claim.

Note that we have the following well-known inequality between the saddle-point gap $\xi(x, y)$ and $\theta_{\text{gap}}$:

$$\begin{aligned}
\xi(x, y) &= \alpha g_1(x) + f(x, y') - \alpha g_2(y') - \left(\alpha g_1(x) + f(x, y) - \alpha g_2(y)\right) \\
&\quad + \left(\alpha g_1(x) + f(x, y) - \alpha g_2(y)\right) - \left(\alpha g_1(x') + f(x', y) - \alpha g_2(y)\right) \quad \text{for some pair } (x', y') \in X \times Y \\
&\le \langle -\nabla_y f(x, y) + \alpha\nabla g_2(y), y - y'\rangle + \langle \nabla_x f(x, y) + \alpha\nabla g_1(x), x - x'\rangle \\
&= \langle F(z) + \alpha\nabla g(z), z - z'\rangle \quad \text{for } z = (x, y) \text{ and } z' = (x', y') \\
&\le \theta_{\text{gap}}(z).
\end{aligned}$$

Therefore Proposition D.6 gives a guarantee on the saddle-point gap $\xi(x, y)$.
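As an illustration of the 1-D Euclidean example in this section, the following is a minimal Python sketch of the unconstrained update with stepsize $\eta = \alpha$. The function names, the starting point, and the number of steps are our own choices for illustration and are not taken from the paper's code. The closed-form solution is obtained from the first-order conditions $\alpha x + (y - b) = 0$ and $(x - a) - \alpha y = 0$.

```python
import math


def euclidean_mmd_1d(a, b, alpha, x0=0.0, y0=0.0, num_steps=300):
    """Run the unconstrained 1-D Euclidean MMD update with stepsize eta = alpha."""
    x, y = x0, y0
    iterates = [(x, y)]
    for _ in range(num_steps):
        # Simultaneous update: the right-hand side uses the old (x, y).
        x, y = (
            (x - alpha * (y - b)) / (1 + alpha**2),
            (y + alpha * (x - a)) / (1 + alpha**2),
        )
        iterates.append((x, y))
    return iterates


def saddle_point(a, b, alpha):
    """Solve alpha*x + (y - b) = 0 and (x - a) - alpha*y = 0 in closed form."""
    denom = 1 + alpha**2
    return (a + alpha * b) / denom, (b - alpha * a) / denom


def distance(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


iterates = euclidean_mmd_1d(a=1.0, b=1.0, alpha=0.5)
solution = saddle_point(a=1.0, b=1.0, alpha=0.5)
print("final iterate:", iterates[-1], "solution:", solution)
```

In this particular example the error vector is multiplied by a scaled rotation at every step, so the Euclidean distance to the solution shrinks by exactly $1/\sqrt{1 + \alpha^2}$ per iteration, which matches the $\sqrt{1/(1 + \eta\alpha)}$ rate with $\eta = \alpha$.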

E MMD FOR LOGIT-AQRES AND MINIMAXENT EQUILIBRIA

By Proposition D.2, the MMD update equation 6, restated below, has fixed points corresponding to the solutions of $\mathrm{VI}(Z, F + \alpha\nabla\psi)$:

$$z_{t+1} = \arg\min_{z \in Z}\ \eta\left(\langle F(z_t), z\rangle + \alpha\psi(z)\right) + B_\psi(z; z_t).$$

If $Z$ is the cross-product of policy spaces for both players (the cross product of sets of behavioral-form policies), $\psi$ is the sum of negative entropy over all decision points (information states), and $F$ includes the negative q-values for both players, then the iteration above reduces to

$$\pi_{t+1}(h_i) \propto \left[\pi_t(h_i)\, e^{\eta q_{\pi_t}(h_i)}\right]^{1/(1 + \eta\alpha)}$$

with fixed points corresponding to

$$\forall i,\ \forall h_i,\quad \pi_i(h_i) \in \arg\max_{\pi_i'(h_i)}\ \mathbb{E}_{A \sim \pi_i'(h_i)}\left[q_\pi(h_i, A) + \alpha\mathcal{H}(\pi_i'(h_i))\right]$$

or, equivalently, the solution to $\mathrm{VI}(Z, F + \alpha\nabla\psi)$, which corresponds to a logit-AQRE. If $F$ includes the negative MiniMaxEnt Q-values for both players, the fixed point instead corresponds to a MiniMaxEnt equilibrium.
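The closed-form update at a single decision point can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the q-values are held fixed (in a game they would change with both players' policies), and the policy is assumed to have full support so that its logarithm is defined. With fixed q-values the fixed point is proportional to $e^{q/\alpha}$, which follows from setting $\pi_{t+1} = \pi_t$ in the update, so the sketch compares the iterate against a softmax with temperature $\alpha$.

```python
import math


def mmd_policy_update(policy, q_values, eta, alpha):
    """One MMD step at a single decision point with negative-entropy regularization.

    Computes pi_next(a) proportional to [pi(a) * exp(eta * q(a))]^(1 / (1 + eta * alpha)).
    Assumes every entry of `policy` is strictly positive.
    """
    exponent = 1.0 / (1.0 + eta * alpha)
    # Work in log space for numerical stability.
    logits = [exponent * (math.log(p) + eta * q) for p, q in zip(policy, q_values)]
    max_logit = max(logits)
    unnormalized = [math.exp(l - max_logit) for l in logits]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def softmax(values, temperature):
    """Softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical fixed q-values, chosen only for illustration.
q = [1.0, 0.5, 0.0]
alpha, eta = 0.5, 0.5
policy = [1 / 3, 1 / 3, 1 / 3]
for _ in range(200):
    policy = mmd_policy_update(policy, q, eta, alpha)
print("iterate:", policy)
print("softmax(q / alpha):", softmax(q, alpha))
```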

F EXPERIMENTAL DOMAINS

For our experiments with normal-form games, we used No-Press Diplomacy stage games. No-Press Diplomacy is a seven-player Markov game in which players compete to conquer Europe. Because the game is a Markov game (which means that the game is fully observable but that the players move simultaneously), each turn of the game resembles a normal-form game. We constructed the normal-form games that we used for our experiments by querying an open source value function (Bakhtin et al., 2021) in different circumstances for a two-player variant of the game, similarly to Zhang et al. (2022). These games have payoff matrices of shape (50, 50) (game A), (35, 43) (game B), (50, 50) (game C), and (4, 4) (game D). We normalized the payoffs of each game to [0, 1].

For our extensive-form games, we used the implementations of Kuhn Poker, 2x2 (and also 3x3) Abrupt Dark Hex, 4-Sided Liar's Dice, and Leduc Poker provided by OpenSpiel (Lanctot et al., 2019). Kuhn Poker (Kuhn, 1951) is a simplified poker game with three cards (J, Q, K). It has 54 non-terminal histories (not counting chance nodes). Abrupt Dark Hex is a variant of the classical board game Hex (Bakst & Gardner, 1962). In Hex, two players take turns placing stones onto a board. One player's goal is to create a path of its stones connecting the east end of the board with the west end, while the other player's goal is to do the same with the north end and south end. Dark Hex is a variant in which players cannot see where their opponents are placing stones. Abrupt Dark Hex is a variant of Dark Hex in which placing a stone in an occupied position results in a loss of turn. The prefix nxn describes the size of the board. 2x2 Abrupt Dark Hex has 471 non-terminal histories. 3x3 Abrupt Dark Hex has too many non-terminal histories to enumerate on our hardware. Liar's Dice (Ferguson & Ferguson, 1991) is a dice game in which players privately roll dice and place bids based on the observed outcomes, similarly to poker games. The prefix n-sided means that the players play with n-sided dice. 4-Sided Liar's Dice has 8176 non-terminal histories (not counting chance nodes). Leduc Poker (Southey et al., 2005) is a small poker game with three card values (J, Q, K), each of which has two instances in the deck. It has 9300 non-terminal histories (not counting chance nodes).

For our single-agent deep RL experiments, we use three Atari games (Bellemare et al., 2013) and three MuJoCo games (Todorov et al., 2012). We selected these games because Huang et al. (2022) used them to benchmark an open source implementation of PPO.

G QRE EXPERIMENTS

G.1 FULL FEEDBACK QRE CONVERGENCE DIPLOMACY

We perform various QRE experiments under full feedback for Diplomacy stage games. Full feedback means that each player outputs a fully specified policy and receives its exact Q-values (given both players' policies) as feedback. Both players then perform the update

$$\pi_{t+1}(h_i) \propto \left[\pi_t(h_i)\, e^{\eta q_{\pi_t}(h_i)}\right]^{1/(1 + \eta\alpha)}.$$

[Figure 6: KL divergence versus iterations for MMD, PU, and OMWU on games A, B, C, and D, with $\alpha \in \{0.05, 0.1, 0.2, 0.5\}$.]

For our experiments, we set $\eta = \alpha$ (for each $\alpha$) for MMD, which is the maximal value that retains a linear convergence guarantee for normal-form games with a max payoff magnitude of one. For PU and OMWU (Cen et al., 2021), we also used the maximal values that guarantee linear convergence. We solved for the QRE for each game using Ling et al.'s Newton's method approach. We show iterations on the x-axis and KL(solution, iterate) on the y-axis. We count each query to the oracle as an iterate, meaning that OMWU uses two iterates for every update (in contrast to MMD and PU, which use only one). The results of the experiment, found in Figure 6, show that all three algorithms converge linearly, with faster rates for larger values of $\alpha$, as is guaranteed by theory. We find that, for our Diplomacy games, MMD converges faster than PU and OMWU. However, we found that all three algorithms also exhibited faster convergence with larger than theoretically allowed stepsizes.
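To make the full feedback protocol concrete, here is a hedged Python sketch of self-play with the update above on a tiny matrix game. It does not use the Diplomacy stage games or the Newton solver from the experiments. Instead, it assumes a 2x2 matching-pennies-style game with payoffs in [0, 1], for which the uniform policies satisfy the QRE fixed-point condition for every $\alpha$, so KL(solution, iterate) can be computed without a solver. All function and variable names are illustrative.

```python
import math


def mmd_step(policy, q_values, eta, alpha):
    """pi_next(a) proportional to [pi(a) * exp(eta * q(a))]^(1 / (1 + eta * alpha))."""
    exponent = 1.0 / (1.0 + eta * alpha)
    logits = [exponent * (math.log(p) + eta * q) for p, q in zip(policy, q_values)]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """KL divergence KL(p, q) for discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def full_feedback_self_play(payoffs, alpha, num_steps, x0, y0):
    """Row player maximizes x^T A y; the column player receives the negated payoff."""
    eta = alpha  # stepsize choice described in the text for payoffs in [0, 1]
    x, y = list(x0), list(y0)
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    history = [(x, y)]
    for _ in range(num_steps):
        # Exact Q-values given both players' current policies (full feedback).
        q_row = [sum(payoffs[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        q_col = [-sum(payoffs[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        x, y = mmd_step(x, q_row, eta, alpha), mmd_step(y, q_col, eta, alpha)
        history.append((x, y))
    return history


# Assumed toy game: the row player wants to match, the column player to mismatch.
matching_pennies = [[1.0, 0.0], [0.0, 1.0]]
solution = ([0.5, 0.5], [0.5, 0.5])  # uniform policies satisfy the QRE condition here
history = full_feedback_self_play(
    matching_pennies, alpha=0.5, num_steps=300, x0=[0.9, 0.1], y0=[0.2, 0.8]
)
kl_values = [kl(solution[0], x) + kl(solution[1], y) for x, y in history]
for t in (0, 10, 50, 100):
    print(t, kl_values[t])
```

Plotting `kl_values` on a logarithmic y-axis would give a toy analogue of the curves described for the Diplomacy games.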

G.2 BLACK BOX QRE CONVERGENCE DIPLOMACY

Our second set of experiments examines convergence to QREs for our Diplomacy stage games with black box feedback. In this context, black box feedback means that each player $i$ outputs an action $A_i$ sampled from its current policy and that player $i$ receives $R(\cdot, A_i, A_{-i})$ (but not $A_{-i}$) as feedback. One way to approach such a setting is to construct an unbiased estimate of the exact Q-values. Letting $r$ be the observed reward,

$$\hat{q}_t(a_i) = \begin{cases} r / \pi_t(a_i) & \text{if } A_i = a_i \\ 0 & \text{otherwise} \end{cases}$$

is such an estimate. To see that this is true, observe

$$\begin{aligned}
\mathbb{E}[\hat{q}_t(a_i) \mid \pi_t] &= \mathbb{E}_{A_{-i} \sim \pi_t}\left[\pi_t(a_i) \cdot \frac{R(\cdot, a_i, A_{-i})}{\pi_t(a_i)} + \sum_{a_i' \ne a_i}\pi_t(a_i') \cdot 0\right] \\
&= \mathbb{E}_{A_{-i} \sim \pi_t}\left[\pi_t(a_i) \cdot \frac{R(\cdot, a_i, A_{-i})}{\pi_t(a_i)}\right] \\
&= \mathbb{E}_{A_{-i} \sim \pi_t}\, R(\cdot, a_i, A_{-i}) \\
&= q_t(a_i).
\end{aligned}$$

In Figure 7, we show results for each of MMD, PU, and OMWU, with the exact Q-values $q_t$ replaced by the unbiased estimates $\hat{q}_t$. For each algorithm, the stepsize at iteration $t$ was set equal to the maximal stepsize for which there exists an exponential convergence guarantee divided by $10\sqrt{t}$. In other words, $\eta_t = \frac{\eta}{10\sqrt{t}}$. Each line is an average over 30 runs. The bands depict estimates of 95% confidence intervals computed using bootstrapping. Although none of the algorithms possess existing black box convergence guarantees, we observe that they all exhibit convergent behavior empirically. In terms of convergence speed, we observe that MMD compares favorably to PU and OMWU for $\alpha \in \{0.05, 0.1, 0.2\}$; however, for $\alpha = 0.5$, OMWU performed the best, with the exception of game D. It is likely that all algorithms could achieve better performance, as we did not perform much hyperparameter tuning.
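The unbiasedness argument above can be checked numerically on a small example. The sketch below is illustrative only: the reward table, policies, and function names are assumptions, and the state argument of $R$ is dropped so that rewards are indexed as `reward[a_i][a_opp]`. Rather than sampling, it enumerates every joint action to compute the exact expectation of the importance-weighted estimate, and it includes the decaying stepsize schedule $\eta_t = \eta / (10\sqrt{t})$.

```python
def exact_q_values(reward, opponent_policy):
    """q(a_i) = E_{A_opp ~ opponent_policy}[R(a_i, A_opp)]."""
    return [
        sum(p * r for p, r in zip(opponent_policy, row))
        for row in reward
    ]


def importance_sampled_estimate(sampled_action, observed_reward, policy):
    """Estimate equal to r / pi(a) at the sampled action and 0 elsewhere."""
    return [
        observed_reward / policy[a] if a == sampled_action else 0.0
        for a in range(len(policy))
    ]


def expected_estimate(reward, policy, opponent_policy):
    """Exact expectation of the estimate over both players' sampled actions."""
    expectation = [0.0] * len(policy)
    for a_i, p_i in enumerate(policy):
        for a_opp, p_opp in enumerate(opponent_policy):
            estimate = importance_sampled_estimate(a_i, reward[a_i][a_opp], policy)
            for a, value in enumerate(estimate):
                expectation[a] += p_i * p_opp * value
    return expectation


def black_box_stepsize(eta_max, t):
    """Decaying schedule eta_t = eta_max / (10 * sqrt(t)) for t >= 1."""
    return eta_max / (10.0 * t**0.5)


# Hypothetical 2x2 reward table and policies, chosen only for illustration.
reward = [[1.0, 0.0], [0.25, 0.75]]
policy = [0.3, 0.7]
opponent_policy = [0.6, 0.4]
print("exact q-values:     ", exact_q_values(reward, opponent_policy))
print("expected estimates: ", expected_estimate(reward, policy, opponent_policy))
```

The two printed vectors agree up to floating-point error, mirroring the derivation: the factor $\pi_t(a_i)$ from sampling cancels the $1/\pi_t(a_i)$ importance weight.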
↵ = 0.05 < l a t e x i t s h a 1 _ b a s e 6 4 = " h 5 G W B U 8 x d b I o u E j 0 X / 2 r S 5 y 5 0 h A = " > A A A B 9 H i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 k V i 1 6 E o h e P F e w H t E u Z T b N t a D a 7 J t l C K f 0 d X j w o 4 t U f 4 8 1 / Y 9 r u Q V s f D P N 4 b 4 Z M X p A I r o 3 r f j u 5 t f W N z a 3 8 d m F n d 2 / / o H h 4 1 N B x q i i r 0 1 j E q h W g Z o J L V j f c C N Z K F M M o E K w Z D O 9 m f n P E l O a x f D T j h P k R 9 i U P O U V j J b + D I h k g u S F u 2 a 1 0 i y X b 5 i C r x M t I C T L U u s W v T i + m a c S k o Q K 1 b n t u Y v w J K s O p Y N N C J 9 U s Q T r E P m t b K j F i 2 p / M j 5 6 S M 6 v 0 S B g r W 9 K Q u f p 7 Y 4 K R 1 u M o s J M R m o F e 9 m b i f 1 4 7 N e G 1 P + E y S Q 2 T d P F Q m A p i Y j J L g P S 4 Y t S I s S V I F b e 3 E j p A h d T Y n A o 2 B G / 5 y 6 u k c V H 2 L s u V h 8 t S 9 T a L I w 8 n c A r n 4 M E V V O E e a l A H C k / w D K / w 5 o y c F + f d + V i M 5 p x s 5 x j + w P n 8 A Y o t k K Y = < / l a t e x i t > ↵ = 0.1 < l a t e x i t s h a 1 _ b a s e 6 4 = " F k J 7 u u 6 h z 6 z v j I p J e m K Y / 7 t G C q 0 = " > A A A B 8 3 i c b V B N S 8 N A E N 3 U r 1 q / q h 6 9 L B b B U 0 i k o h e h 6 M V j B f s B T S i T 7 a Z d u t k s u x u h h P 4 N L x 4 U 8 e q f 8 e a / c d v m o K 0 P B h 7 v z T A z L 5 K c a e N 5 3 0 5 p b X 1 j c 6 u 8 X d n Z 3 d s / q B 4 e t X W a K U J b J O W p 6 k a g K W e C t g w z n H a l o p B E n H a i 8 d 3 M 7 z x R p V k q H s 1 E 0 j C B o W A x I 2 C s F A T A 5 Q j w D f Z c v 1 + t e a 4 3 B 1 4 l f k F q q E C z X / 0 K B i n J E i o M 4 a B 1 z / e k C X N Q h h F O p 5 U g 0 1 Q C G c O Q 9 i w V k F A d 5 v O b p / j M K g M c p 8 q W M H i u / p 7 I I d F 6 k k S 2 M w E z 0 s v e T P z P 6 2 U m v g 5 z J m R m q C C L R X H G s U n x L A A 8 Y I o S w y e W A F H M 3 o r J C B Q Q Y 2 O q 2 B D 8 5 Z d X S f v C 9 e v u 5 U O 9 1 r g t 4 i i j E 3 S K z p G P r l A D 3 a M m a i G C J H p G r + j 
N y Z w X 5 9 3 5 W L S W n G L m G P 2 B 8 / k D E v a Q a A = = < / l a t e x i t > ↵ = 0.2 < l a t e x i t s h a 1 _ b a s e 6 4 = " X G s a T e p s G G l a u S t O K t u L Y c K w W v 8 = " > A A A B 8 3 i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 h K R S 9 C 0 Y v H C r Y W m l A 2 2 0 2 7 d L N Z d j d C C f 0 b X j w o 4 t U / 4 8 1 / 4 7 b N Q V s f D D z e m 2 F m X i Q 5 0 8 b z v p 3 S 2 v r G 5 l Z 5 u 7 K z u 7 d / U D 0 8 6 u g 0 U 4 S 2 S c p T 1 Y 2 w p p w J 2 j b M c N q V i u I k 4 v Q x G t / O / M c n q j R L x Y O Z S B o m e C h Y z A g 2 V g o C z O U I o 2 v k u f V + t e a 5 3 h x o l f g F q U G B V r / 6 F Q x S k i V U G M K x 1 j 3 f k y b M s T K M c D q t B J m m E p M x H t K e p Q I n V I f 5 / O Y p O r P K A M W p s i U M m q u / J 3 K c a D 1 J I t u Z Y D P S y 9 5 M / M / r Z S a + C n M m Z G a o I I t F c c a R S d E s A D R g i h L D J 5 Z g o p i 9 F Z E R V p g Y G 1 P F h u A v v 7 x K O n X X b 7 g X 9 4 1 a 8 6 a I o w w n c A r n 4 M M l N O E O W t A G A h K e 4 R X e n M x 5 c d 6 d j 0 V r y S l m j u E P n M 8 f F H q Q a Q = = < / l a t e x i t > ↵ = 0.5 < l a t e x i t s h a 1 _ b a s e 6 4 = " 2 7 p u a d G 0 H s 7 o U 6 p a 3 z W H W v r R U X s = " > A A A B 8 3 i c b V B N S 8 N A E N 3 U r 1 q / q h 6 9 L B b B U 0 j E o h e h 6 M V j B f s B T S i T 7 a Z d u t k s u x u h h P 4 N L x 4 U 8 e q f 8 e a / c d v m o K 0 P B h 7 v z T A z L 5 K c a e N 5 3 0 5 p b X 1 j c 6 u 8 X d n Z 3 d s / q B 4 e t X W a K U J b J O W p 6 k a g K W e C t g w z n H a l o p B E n H a i 8 d 3 M 7 z x R p V k q H s 1 E 0 j C B o W A x I 2 C s F A T A 5 Q j w D f b c e r 9 a 8 1 x v D r x K / I L U U I F m v / o V D F K S J V Q Y w k H r n u 9 J E + a g D C O c T i t B p q k E M o Y h 7 V k q I K E 6 z O c 3 T / G Z V Q Y 4 T p U t Y f B c / T 2 R Q 6 L 1 J I l s Z w J m p J e 9 m f i f 1 8 t M f B 3 m T M j M U E E W i + K M Y 5 P i W Q B 4 w B Q l h k 8 s A a K Y v R W T E S g g x s Z U s S H 4 y y + v k v a F 6 
1 + 6 9 Y f L W u O 2 i K O M T t A p O k c + u k I N d I + a q I U I k u g Z v a I 3 J 3 N e n H f n Y 9 F a c o q Z Y / Q H z u c P G Q a Q b A = = < / l a t e x i t > KL Divergence MMD PU OMWU Game A Game B Game C Game D Iterations 10 0 < l a t e x i t s h a 1 _ b a s e 6 4 = " 0 V X X d b Y g Y I S B a 9 T 5 0 L d F h R w / L 0 o = " > A A A B 6 3 i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 m k o s e i F 4 8 V 7 A e 0 s W y 2 m 3 b p 7 i b s b o Q S + h e 8 e F D E q 3 / I m / / G T Z q D t j 4 Y e L w 3 w 8 y 8 I O Z M G 9 f 9 d k p r 6 x u b W + X t y s 7 u 3 v 5 B 9 f C o o 6 N E E d o m E Y 9 U L 8 C a c i Z p 2 z D D a S 9 W F I u A 0 2 4 w v c 3 8 7 h N V m k X y w c x i 6 g s 8 l i x k B J t M 8 t x H d 1 i t u X U 3 B 1 o l X k F q U K A 1 r H 4 N R h F J B J W G c K x 1 3 3 N j 4 6 d Y G U Y 4 n V c G i a Y x J l M 8 p n 1 L J R Z U + 2 l + 6 x y d W W W E w k j Z k g b l 6 u + J F A u t Z y K w n Q K b i V 7 2 M v E / r 5 + Y 8 N p P m Y w T Q y V Z L A o T j k y E s s f R i C l K D J 9 Z g o l i 9 l Z E J l h h Y m w 8 F R u C t / z y K u l c 1 L 1 G / f K + U W v e F H G U 4 Q R O 4 R w 8 u I I m 3 E E L 2 k B g A s / w C m + O c F 6 c d + d j 0 V p y i p l j + A P n 8 w c N T o 2 Z < / l a t e x i t > 8 G H u / N M D M v S A T X x n W / n Z X V t f W N z c J W c X t n d 2 + / d H D Y 1 H G q G D Z Y L G L V D q h G w S U 2 D D c C 2 4 l C G g U C W 8 H o d u q 3 n l B p H s s H M 0 7 Q j + h A 8 p A z a q z U 8 t z H 7 N y b 9 E p l t + L O Q J a J l 5 M y 5 K j 3 S l / d f s z S C K V h g m r d 8 d z E + B l V h j O B k 2 I 3 1 Z h Q N q I D 7 F g q a Y T a z 2 b n T s i p V f o k j J U t a c h M / T 2 R 0 U j r c R T Y z o i a o V 7 0 p u J / X i c 1 4 b W f c Z m k B i W b L w p T Q U x M p r + T P l f I j B h b Q p n i 9 l b C h l R R Z m x C R R u C t / j y M m l e V L x q 5 f K + W q 7 d 5 H E U 4 B h O 4 A w 8 u I I a 3 E E d G s B g B M / w C m 9 O 4 r w 4 7 8 7 H v H X F y W e O 4 A + c z x 8 + A Y 7 d < / l a t e x i t > 10 1 
< l a t e x i t s h a 1 _ b a s e 6 4 = " S J U N D X P j 9 V 3 b 2 3 L x 4 Q 4 / 4 2 O I z r E = " > A A A B 7 n i c b V B N S 8 N A E J 3 4 W e t X 1 a O X x S J 4 s S R S 0 W P R i 8 c K 9 g P a W D b b S b t 0 s w m 7 G 6 G E / g g v H h T x 6 u / x 5 r 9 x 2 + a g r Q 8 G H u / N M D M v S A T X x n W / n Z X V t f W N z c J W c X t n d 2 + / d H D Y 1 H G q G D Z Y L G L V D q h G w S U 2 D D c C 2 4 l C G g U C W 8 H o d u q 3 n l B p H s s H M 0 7 Q j + h A 8 p A z a q z U 8 t z H 7 N y b 9 E p l t + L O Q J a J l 5 M y 5 K j 3 S l / d f s z S C K V h g m r d 8 d z E + B l V h j O B k 2 I 3 1 Z h Q N q I D 7 F g q a Y T a z 2 b n T s i p V f o k j J U t a c h M / T 2 R 0 U j r c R T Y z o i a o V 7 0 p u J / X i c 1 4 b W f c Z m k B i W b L w p T Q U x M p r + T P l f I j B h b Q p n i 9 l b C h l R R Z m x C R R u C t / j y M m l e V L x q 5 f K + W q 7 d 5 H E U 4 B h O 4 A w 8 u I I a 3 E E d G s B g B M / w C m 9 O 4 r w 4 7 8 7 H v H X F y W e O 4 A + c z x 8 + A Y 7 d < / l a t e x i t > J I M m Z 1 d Z n q F s O Q j v H h Q x K v f 4 8 2 / c Z L s Q R M L G o q q b r q 7 g l g K g 6 7 7 7 e T W 1 j c 2 t / L b h Z 3 d v f 2 D 4 u F R 0 0 S J Z r z B I h n p d k A N l 0 L x B g q U v B 1 r T s N A 8 l Y w v p 3 5 r S e u j Y j U A 0 5 < l a t e x i t s h a 1 _ b a s e 6 4 = " S J U N D X P j 9 We also investigate the performance of other methods for estimating Q-values for the black box setting. One such method uses an unbiased baseline to reduce variance (Schmid et al., 2019; Davis et al., 2020) . The premise of this approach is the idea that any quantity that is zero in expectation can be subtracted from an unbiased Q-value estimate without introducing bias. As a result, if the quantity is correlated with the estimator, subtracting it from the estimate can reduce variance "for free". We call this quantity a baseline. 
For our baseline, we used

$$b_t(a_i) = \begin{cases} \tilde{q}_t(a_i)/\pi_t(a_i) - \tilde{q}_t(a_i) & \text{if } a_i = A_i, \\ -\tilde{q}_t(a_i) & \text{otherwise.} \end{cases}$$

By a similar argument as above, this quantity is zero in expectation:

$$\begin{aligned} \mathbb{E}[b_t(a_i) \mid \pi_t] &= \pi_t(a_i) \cdot \left(\tilde{q}_t(a_i)/\pi_t(a_i) - \tilde{q}_t(a_i)\right) - \sum_{a_i' \neq a_i} \pi_t(a_i') \cdot \tilde{q}_t(a_i) \\ &= \tilde{q}_t(a_i) - \pi_t(a_i)\tilde{q}_t(a_i) - (1 - \pi_t(a_i)) \cdot \tilde{q}_t(a_i) \\ &= \tilde{q}_t(a_i) - \tilde{q}_t(a_i) = 0. \end{aligned}$$

Also, if $\tilde{q}$ is close to $q$, our baseline will be correlated with $\hat{q}$. Thus, it satisfies our desired criteria. For $\tilde{q}$, we used a running estimate of the reward observed after selecting action $a_i$. Specifically, every time action $a_i$ was selected, we updated $\tilde{q}_t(a_i) = (1 - \eta)\tilde{q}_t(a_i) + \eta r$.

We perform several experiments for solving reduced normal-form logit QREs by using MMD over the sequence form with dilated entropy.
We use the descent-ascent updates

$$x_{t+1} = \arg\min_{x \in \mathcal{X}} \eta\left(\langle \nabla_{x_t} f(x_t, y_t), x \rangle + \alpha \psi_1(x)\right) + B_{\psi_1}(x; x_t),$$

$$y_{t+1} = \arg\max_{y \in \mathcal{Y}} \eta\left(\langle \nabla_{y_t} f(x_t, y_t), y \rangle - \alpha \psi_2(y)\right) - B_{\psi_2}(y; y_t).$$

The method is full feedback since $\nabla_{x_t} f(x_t, y_t) = A y_t$ and $\nabla_{y_t} f(x_t, y_t) = A^\top x_t$, where $A$ is the sequence-form payoff matrix. Note that in the normal-form setting, $-A y_t$ and $A^\top x_t$ are the Q-values for both players and the algorithm is the same as described in Section G.1. We set the stepsize to $\eta = \alpha / (\max_{ij} |A_{ij}|)^2$, the largest allowed by Theorem 3.4. For more details on the sequence-form algorithm, see Section C.3. For Kuhn Poker and 2x2 Abrupt Dark Hex, we used Gambit (McKelvey et al., 2016; Turocy, 2005) to compute the reduced normal-form QRE. We check the convergence of MMD by plotting the sum of Bregman divergences with respect to dilated entropy, $B_\psi(z^*; z_t) = B_{\psi_1}(x^*; x_t) + B_{\psi_2}(y^*; y_t)$, measured against the solution $z^* = (x^*, y^*)$. As predicted by Theorem 3.4, we observe linear convergence, with faster convergence for larger values of $\alpha$.
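For intuition, the descent-ascent updates can be sketched in the normal-form special case, where both strategy sets are simplices. The sketch assumes negative entropy as the regularizer, for which each update has the closed form "multiply by the exponentiated gradient, then raise to the power 1/(1 + ηα) and renormalize". The payoff matrix, the uniform initialization, and all function names are hypothetical; the sequence-form version with dilated entropy is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mmd_normal_form(A, alpha, num_iters):
    """Simultaneous MMD updates for min_x max_y x^T A y with entropy regularization."""
    n, m = len(A), len(A[0])
    eta = alpha / max(abs(a) for row in A for a in row) ** 2  # stepsize from the text
    power = 1.0 / (1.0 + eta * alpha)
    x, y = [1.0 / n] * n, [1.0 / m] * m  # uniform start (an assumption)
    for _ in range(num_iters):
        grad_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]  # A y
        grad_y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]  # A^T x
        # x_{t+1} is proportional to [x_t * exp(-eta * A y_t)]^power;
        # the maximizing player uses +eta * A^T x_t instead.
        x_next = softmax([power * (math.log(x[i]) - eta * grad_x[i]) for i in range(n)])
        y_next = softmax([power * (math.log(y[j]) + eta * grad_y[j]) for j in range(m)])
        x, y = x_next, y_next
    return x, y


def qre_residual(A, alpha, x, y):
    """Largest violation of the logit-QRE conditions
    x = softmax(-A y / alpha) and y = softmax(A^T x / alpha)."""
    n, m = len(A), len(A[0])
    target_x = softmax([-sum(A[i][j] * y[j] for j in range(m)) / alpha for i in range(n)])
    target_y = softmax([sum(A[i][j] * x[i] for i in range(n)) / alpha for j in range(m)])
    return max(abs(a - b) for a, b in zip(x + y, target_x + target_y))


# Hypothetical biased rock-paper-scissors payoffs for the minimizing player.
A = [[0.0, -1.0, 2.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
x, y = mmd_normal_form(A, alpha=0.5, num_iters=2000)
print(qre_residual(A, 0.5, x, y))
```

The residual check uses the fixed-point characterization of the logit QRE rather than an external solver, so it only verifies self-consistency of the final iterate.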

G.4 FULL FEEDBACK AQRE CONVERGENCE EFGS

Next, we investigate whether MMD can be made to converge to AQREs in extensive-form games. For these experiments, we applied MMD in behavioral form, as described in Section E. Specifically, we computed $q_{\pi_t}(h_i)$ for each player $i$ and each information state $h_i$. Then, we applied the update rule

$$\pi_{t+1}(h_i) \propto \left[\pi_t(h_i)\, e^{\eta q_{\pi_t}(h_i)}\right]^{1/(1 + \eta\alpha)}$$

for each player $i$ and information state $h_i$. For each setting, we used $\eta = \alpha/10$.
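A single application of this update at one information state can be sketched as follows. The action names and Q-values are hypothetical, and the Q-values are held fixed purely for illustration; in the experiments they are recomputed from the current joint policy at every iteration.

```python
import math


def mmd_behavioral_update(policy, q_values, eta, alpha):
    """One MMD step at a single information state.

    policy and q_values map each action to its probability and action value.
    Returns pi_{t+1} proportional to [pi_t * exp(eta * q)]^(1 / (1 + eta * alpha)).
    """
    power = 1.0 / (1.0 + eta * alpha)
    logits = {a: power * (math.log(policy[a]) + eta * q_values[a]) for a in policy}
    top = max(logits.values())  # subtract the max for numerical stability
    weights = {a: math.exp(v - top) for a, v in logits.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


alpha = 0.5
eta = alpha / 10  # stepsize choice from the text
policy = {"fold": 0.5, "call": 0.5}
q_values = {"fold": 0.0, "call": 1.0}  # fixed, hypothetical action values
for _ in range(2000):
    policy = mmd_behavioral_update(policy, q_values, eta, alpha)
print(policy)
```

With fixed Q-values the iterates approach the softmax of `q / alpha`, which is the single-state analogue of a logit response at temperature `alpha`.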

We show the results in Figure 11. We measure convergence against solutions computed using Gambit (McKelvey et al., 2016; Turocy, 2010). Despite a lack of proven convergence guarantees, we observe that MMD converges to the AQRE in each game, for each temperature. While the convergence is not monotonic, it is roughly linear over large time scales.

[Figure: y-axis Mean KL Divergence; legend α = 0.1, α = 0.2, α = 0.5.]

Published as a conference paper at ICLR 2023

H EXPLOITABILITY EXPERIMENTS

Next, we investigate the convergence of MMD as a Nash equilibrium solver. To induce convergence, in most of our experiments, we anneal the temperature of the regularization over time.

H.1 FULL FEEDBACK NASH CONVERGENCE DIPLOMACY

In our full feedback Nash convergence Diplomacy experiments, we used $\eta = 1/10$ and $\alpha_t = 1/(5\sqrt{t})$. We show the results of the experiment in Figure 12. Over short iteration horizons, we observe that CFR tends to outperform MMD. However, for longer horizons, we find that MMD tends to catch up with CFR. In Game D, the qualitatively different behavior is likely due to the fact that the Nash equilibrium is a pure strategy, unlike the Nash equilibria of the first three games, which are mixed.
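The annealing schedule and a measure of distance to Nash can be sketched as below. This is a minimal illustration under stated assumptions: exploitability is taken here as the sum of both players' best-response gains in a matrix game where x minimizes and y maximizes x^T A y (the paper may normalize differently), and the rock-paper-scissors matrix and function names are hypothetical.

```python
import math


def temperature(t):
    """Annealed regularization temperature alpha_t = 1 / (5 * sqrt(t)), for t >= 1."""
    return 1.0 / (5.0 * math.sqrt(t))


def exploitability(A, x, y):
    """Sum of best-response gains when x minimizes and y maximizes x^T A y."""
    n, m = len(A), len(A[0])
    best_for_y = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))  # max_j (A^T x)_j
    best_for_x = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))  # min_i (A y)_i
    return best_for_y - best_for_x


rps = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]]  # hypothetical payoffs
uniform = [1.0 / 3] * 3
rock = [1.0, 0.0, 0.0]
print([temperature(t) for t in (1, 4, 25, 100)])
print(exploitability(rps, uniform, uniform), exploitability(rps, rock, rock))
```

The temperature decays slowly, so early iterations are strongly regularized and later iterations target something closer to an unregularized equilibrium.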

[Figure panels: Game A, Game B, Game C, Game D; x-axis: Iterations; legend: MMD, CFR.]

Figure 12: MMD and CFR applied to Diplomacy stage games for computing Nash equilibria.

H.2 BLACK BOX NASH CONVERGENCE DIPLOMACY

For our black box Nash convergence experiments, we compare against the "opponent on-policy" variant of Monte Carlo CFR (Lanctot et al., 2009). In this variant, the two players alternate between the roles of updating player and on-policy player. The updating player plays off-policy according to a policy that gives sufficiently large support to each action (in our Diplomacy experiments we used a uniform policy). The advantage of this setup is that it guarantees that the updating player receives bounded gradients, which is necessary for Monte Carlo CFR's convergence proof. In contrast, we show results for an on-policy Monte Carlo variant of MMD, despite the fact that this causes unbounded gradients. This is not a fair comparison, in the sense that the same "opponent on-policy" setup is equally applicable to MMD and would keep the gradients bounded, whereas the "on-policy" version of Monte Carlo CFR does not converge. We made this decision because the on-policy Monte Carlo variant of MMD is simpler and more elegant. Nevertheless, we believe that the "opponent on-policy" version of MMD remains an interesting direction for future work, and would very possibly yield faster convergence. We again investigated three ways of estimating Q-values. For our unbiased estimator with no baseline we used 15 , with averages across 30 runs and estimates of 95% confidence intervals computed from bootstrapping.
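The three kinds of Q-value estimate mentioned above can be illustrated with a minimal sketch for a single decision point with sampled (bandit) feedback. The function names, the moving-average form of the biased estimator, and the `step_size` parameter are assumptions made for illustration; the text does not specify how the estimators were implemented.

```python
import numpy as np


def unbiased_q_estimate(sampled_action, reward, sampling_policy, baseline=None):
    """Importance-sampled estimate of the action-value vector from one sample.

    With baseline=None this is r * 1[a == sampled] / mu(a).
    With a baseline vector b it is b(a) + (r - b(a)) * 1[a == sampled] / mu(a),
    which has the same expectation for any b that does not depend on the sample.
    """
    num_actions = len(sampling_policy)
    b = np.zeros(num_actions) if baseline is None else np.asarray(baseline, dtype=float)
    estimate = b.copy()
    estimate[sampled_action] += (reward - b[sampled_action]) / sampling_policy[sampled_action]
    return estimate


def biased_q_estimate(running_q, sampled_action, reward, step_size=0.1):
    """Hypothetical biased alternative: moving average of the rewards seen per action."""
    updated = np.array(running_q, dtype=float)
    updated[sampled_action] += step_size * (reward - updated[sampled_action])
    return updated


def expected_estimate(true_q, sampling_policy, baseline=None):
    """Exact expectation of the unbiased estimator over the sampled action.

    Rewards are taken to be deterministic given the action, for illustration only.
    """
    return sum(
        sampling_policy[a] * unbiased_q_estimate(a, true_q[a], sampling_policy, baseline)
        for a in range(len(true_q))
    )
```

The importance weight `1 / sampling_policy[a]` is what drives the boundedness discussion: under a uniform sampling policy it is at most the number of actions, whereas under on-policy sampling it grows without bound as the probability of an action approaches zero.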
[Figure: Exploitability versus Iterations.]
For MMD, we observe that biased Q-value estimates generally perform best, followed by an unbiased estimate with baseline, followed by an unbiased estimate without baseline, except on game D, where the unbiased estimate with baseline performs similarly to biased Q-value estimates. We also find that CFR tends to follow this trend, though the difference between biased Q-value estimates and an unbiased estimate with baseline is less pronounced, except on game D, where the unbiased estimate with baseline performs poorly. Between MMD and CFR, CFR tends to perform better on an estimator-to-estimator basis in games A, B, and C, though MMD is relatively competitive with CFR under biased Q-value estimates. For game D, this comparison is more favorable for MMD than in the other games.

H.3 FULL FEEDBACK NASH CONVERGENCE EFGS

For our full feedback Nash convergence EFG experiments, we examined two variants of MMD. The first, which we call unweighted MMD, corresponds to the version tested in the AQRE experiments: $\pi_{t+1}(h_i) \propto \left(\pi_t(h_i)\, e^{\eta_t q_{\pi_t}(h_i)}\right)^{1/(1+\eta_t \alpha_t)}$. The second, which we call weighted MMD, uses $\pi_{t+1}(h_i) \propto \left(\pi_t(h_i)\, e^{P^{\pi_t}(h_i)\, \eta_t q_{\pi_t}(h_i)}\right)^{1/(1+P^{\pi_t}(h_i)\, \eta_t \alpha_t)}$. In other words, it weights the stepsize of the update by the probability of reaching that information state under the current policy. We test this variant because it corresponds to a "determinized" version of black box sampling for temporally extended settings. For unweighted MMD, we used [...]. Noting again that the caveats about comparing on-policy MMD to opponent on-policy Monte Carlo CFR also apply here, we present the results in Figure 14. Results are averaged across 30 runs and shown with 95% confidence intervals estimated from bootstrapping. As in the normal-form experiments, we find that Monte Carlo CFR generally outperforms MMD for unbiased gradient estimates with no baseline.
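The two updates above differ only in whether the stepsize is scaled by the reach probability, so both can be sketched with one function. This is a minimal illustration for a single information state, assuming a full-support policy; the function name `mmd_update` and the `reach_prob` argument are hypothetical and not taken from the authors' code.

```python
import numpy as np


def mmd_update(policy, q_values, eta, alpha, reach_prob=1.0):
    """One MMD step at a single information state.

    Computes pi_new proportional to (pi * exp(w * eta * q)) ** (1 / (1 + w * eta * alpha)),
    where w = reach_prob. Using w = 1 gives the unweighted variant; passing the
    probability of reaching the information state gives the weighted variant.
    Assumes every entry of `policy` is strictly positive.
    """
    policy = np.asarray(policy, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    eta_eff = reach_prob * eta  # stepsize scaled by the reach probability
    # Work in log space: the power 1 / (1 + eta_eff * alpha) becomes a division.
    log_new = (np.log(policy) + eta_eff * q_values) / (1.0 + eta_eff * alpha)
    log_new -= log_new.max()  # shift for numerical stability before exponentiating
    new_policy = np.exp(log_new)
    return new_policy / new_policy.sum()
```

Two sanity checks follow directly from the formula: setting `reach_prob` to 0 leaves the policy unchanged, and for a fixed action-value vector the softmax of `q_values / alpha` is a fixed point of both variants.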
N y x a 2 6 c 6 B V 4 u W k A j m a g / J X f x i R R F B p C M d a 9 z w 3 N n 6 K l W G E 0 1 m p n 2 g a Y z L B I 9 q z V G J B t Z / O b 5 2 h M 6 s M U R g p W 9 K g u f p 7 I s V C 6 6 k I b K f A Z q y X v U z 8 z + s l J r z 2 U y b j x F B J F o v C h C M T o e x x N G S K E s O n l m C i m L 0 V k T F W m B g b T 8 m G 4 C 2 / v E r a F 1 W v V q 3 f 1 y q N m z y O I p z A K Z y D B 1 f Q g D t o Q g s I j O E Z X u H N E c 6 L 8 + 5 8 L F o L T j 5 z D H / g f P 4 A F m a N n w = = < / l a t e x i t > Exploitability 10 1 < l a t e x i t s h a 1 _ b a s e 6 4 = " S J U N D X P j 9 V 3 b 2 3 L x 4 Q 4 / 4 2 O I z r E = " > A A A B 7 n i c b V B N S 8 N A E J 3 4 W e t X 1 a O X x S J 4 s S R S 0 W P R i 8 c K 9 g P a W D b b S b t 0 s w m 7 G 6 G E / g g v H h T x 6 u / x 5 r 9 x 2 + a g r Q 8 G H u / N M D M v S A T X x n W / n Z X V t f W N z c J W c X t n d 2 + / d H D Y 1 H G q G D Z Y L G L V D q h G w S U 2 D D c C 2 4 l C G g U C W 8 H o d u q 3 n l B p H s s H M 0 7 Q j + h A 8 p A z a q z U 8 t z H 7 N y b 9 E p l t + L O Q J a J l 5 M y 5 K j 3 S l / d f s z S C K V h g m r d 8 d z E + B l V h j O B k 2 I 3 1 Z h Q N q I D 7 F g q a Y T a z 2 b n T s i p V f o k j J U t a c h M / T 2 R 0 U j r c R T Y z o i a o V 7 0 p u J / X i c 1 4 b W f c Z m k B i W b L w p T Q U x M p r + T P l f I j B h b Q p n i 9 l b C h l R R Z m x C R R u C t / j y M m l e V L x q 5 f K + W q 7 d 5 H E U 4 B h O 4 A w 8 u I I a 3 E E d G s B g B M / w C m 9 O 4 r w 4 7 8 7 H v H X F y W e O 4 A + c z x 8 + A Y 7 d < / l a t e x i t > 10 2 < l a t e x i t s h a 1 _ b a s e 6 4 = " T I 4 h e S d J Q D L / q o R b i Z K L Q 8 9 9 a K 8 = " > A A A B 7 n i c b V D L S g N B E O y N r x h f U Y 9 e B o P g x b A b I n o M e v E Y w T w g W c P s Z J I M m Z 1 d Z n q F s O Q j v H h Q x K v f 4 8 2 / c Z L s Q R M L G o q q b r q 7 g l g K g 6 7 7 7 e T W 1 j c 2 t / L b h Z 3 d v f 2 D 4 u F R 0 0 S J Z r z B I h n p d k A N l 0 L x B g q U v B 1 r T s N A 8 l Y w v p 3 
5 r S e u j Y j U A 0 5 i 7 o d 0 q M R A M I p W a n n u Y 3 p R m f a K J b f s z k F W i Z e R E m S o 9 4 p f 3 X 7 E k p A r Z J I a 0 / H c G P 2 U a h R M 8 m m h m x g e U z a m Q 9 6 x V N G Q G z + d n z s l Z 1 b p k 0 G k b S k k c / X 3 R E p D Y y Z h Y D t D i i O z 7 M 3 E / 7 x O g o N r P x U q T p A r t l g 0 S C T B i M x + J 3 2 h O U M 5 s Y Q y L e y t h I 2 o p g x t Q g U b g r f 8 8 i p p V s p e t X l i x k B J t M 8 t x H b 1 i t u X U 3 B 1 o l X k F q U K A 1 r H 4 N R h F J B J W G c K x 1 3 3 N j 4 6 d Y G U Y 4 n V c G i a Y x J l M 8 p n 1 L J R Z U + 2 l + 6 x y d W W W E w k j Z k g b l 6 u + J F A u t Z y K w n Q K b i V 7 2 M v E / r 5 + Y 8 N p P m Y w T Q y V Z L A o T j k y E s s f R i C l K D J 9 Z g o l i 9 l Z E J l h h Y m w 8 F R u C t / z y K u l c 1 L 1 G / f K + U W v e F H G U 4 Q R O 4 R w 8 u I I m 3 E E L 2 k B g A s / w C m + O c F 6 c d + d j 0 V p y i p l j + A P n 8 w c O 0 o 2 a < / l a t e x i t > 10 2 < l a t e x i t s h a 1 _ b a s e 6 4 = " x 6 k f V C W 1 L 8 k S i T 5 + r Z J 5 1 m 9 d p s w = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 k t L X o s e v F Y w X 5 A u 5 Z s m m 1 D k + y S Z I W y 9 C 9 4 8 a C I V / + Q N / + N 2 X Y P 2 v p g 4 P H e D D P z g p g z b V z 3 2 y l s b G 5 t 7 x R 3 S 3 v 7 B 4 d H 5 e O T j o 4 S R W i b R D x S v Q B r y p m k b c M M p 7 1 Y U S w C T r v B 9 D b z u 0 9 U a R b J B z O L q S / w W L K Q E W w y y X M f a 8 N y x a 2 6 C 6 B 1 4 u W k A j l a w / L X Y B S R R F B p C M d a 9 z 0 3 N n 6 K l W G E 0 3 l p k G g a Y z L F Y 9 q 3 V G J B t Z 8 u b p 2 j C 6 u M U B g p W 9 K g h f p 7 I s V C 6 5 k I b K f A Z q J X v U z 8 z + s n J r z 2 U y b j x F B J l o v C h C M T o e x x N G K K E s N n l m C i m L 0 V k Q l W m B g b T 8 m G 4 K 2 + v E 4 6 t a p X r z b u 6 5 X m T R 5 H E c 7 g H C 7 B g y t o w h 2 0 o A 0 E J v A M r / D m C O f F e X c + l q 0 F J 5 8 5 h T 9 w P n 8 A E F a N m w = = < / l a t e x i t > 10 3 < 
l a t e x i t s h a 1 _ b a s e 6 4 = " e D A L R G L K 8 N K e 0 9 H e N W T 1 5 V Y X l a s = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 n V F j 0 W v X i s Y D + g X U s 2 z b a h S X Z J s k J Z + h e 8 e F D E q 3 / I m / / G b L s H b X 0 w 8 H h v h p l 5 Q c y Z N q 7 7 7 R T W 1 j c 2 t 4 r b p Z 3 d v f 2 D 8 u F R W 0 e J I r R F I h 6 p b o A 1 5 U z S l m G G 0 2 6 s K B Y B p 5 1 g c p v 5 n S e q N I v k g 5 n G 1 B d 4 J F n I C D a Z 5 L m P l 4 N y x a 2 6 c 6 B V 4 u W k A j m a g / J X f x i R R F B p C M d a 9 z w 3 N n 6 K l W G E 0 1 m p n 2 g a Y z L B I 9 q z V G J B t Z / O b 5 2 h M 6 s M U R g p W 9 K g u f p 7 I s V C 6 6 k I b K f A Z q y X v U z 8 z + s l J r z 2 U y b j x F B J F o v C h C M T o e x x N G S K E s O n l m C i m L 0 V k T F W m B g b T 8 m G 4 C 2 / v E r a F 1 W v V q 3 f 1 y q N m z y O I p z A K Z y D B 1 f Q g D t o Q g s I j O E Z X u H N E c 6 L 8 + 5 8 L F o L T j 5 z D H / g f P 4 A E d q N n A = = < / l a t e x i t > l i x k B J t M 8 t z H x r B a c + t u D r R K v I L U o E B r W P 0 a j C K S C C o N 4 V j r v u f G x k + x M o x w O q 8 M E k 1 j T K Z 4 T P u W S i y o 9 t P 8 1 j k 6 s 8 o I h Z G y J Q 3 K 1 d 8 T K R Z a z 0 R g O w U 2 E 7 3 s Z e J / X j 8 x 4 b W f M h k n h k q y W B Q m H J k I Z Y + j E V O U G D 6 z B B P F 7 K 2 I T L D C x N h 4 K j Y E b / n l V d K 5 q H u N + u V d r W Q H Z b e A = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 m V l n o s e v F Y w X 5 A u 5 Z s m m 1 D k + y S Z I W y 9 C 9 4 8 a C I V / + Q N / + N 2 X Y P 2 v p g 4 P H e D D P z g p g z b V z 3 2 y l s b G 5 t 7 x R 3 S 3 v 7 B 4 d H 5 e O T j o 4 S R W i b R D x S v Q B r y p m k b c M M p 7 1 Y U S w C T r v B 9 D b z u 0 9 U a R b J B z O L q S / w W L K Q E W w y y X M f 6 8 N y x a 2 6 C 6 B 1 4 u W k A j l a w / L X Y B S R R F B p C M d a 9 z 0 3 N n 6 K l W G E 0 3 l p k G g a Y z L F Y 9 q 3 V G J B t Z 8 u b p 2 j C 6 u M U B g p W 9 K g h 
f p 7 I s V C 6 5 k I b K f A Z q J X v U z 8 z + s n J r z 2 U y b j x F B J l o v C h C M T o e x x N G K K E s N n l m C i m L 0 V k Q l W m B g b T 8 m G 4 K 2 + v E 4 6 V 1 W v V q 3 f 1 y r N m z y O I p z B O V y C B w 1 o w h 2 0 o A 0 E J v A M r / D m C O f F e X c + l q 0 F J 5 8 5 h T 9 w P n 8 A F O K N n g = = < / l a t e x i t > Iterations 10 6 < l a t e x i t s h a 1 _ b a s e 6 4 = " 7 v H r L 5 3 9 B u m l l X T v c F V 4 g h b J m 1 E = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 m V V j 0 W v X i s Y D + g X U s 2 z b a h S X Z J s k J Z + h e 8 e F D E q 3 / I m / / G b L s H b X 0 w 8 H h v h p l 5 Q c y Z N q 7 7 7 R T W 1 j c 2 t 4 r b p Z 3 d v f 2 D 8 u F R W 0 e J I r R F I h 6 p b o A 1 5 U z S l m G G 0 2 6 s K B Y B p 5 1 g c p v 5 n S e q N I v k g 5 n G 1 B d 4 J F n I C D a Z 5 L m P l 4 N y x a 2 6 c 6 B V 4 u W k A j m a g / J X f x i R R F B p C M d a 9 z w 3 N n 6 K l W G E 0 1 m p n 2 g a Y z L B I 9 q z V G J B t Z / O b 5 2 h M 6 s M U R g p W 9 K g u f p 7 I s V C 6 6 k I b K f A Z q y X v U z 8 z + s l J r z 2 U y b j x F B J F o v C h C M T o e x x N G S K E s O n l m C i m L 0 V k T F W m B g b T 8 m G 4 C 2 / v E r a F 1 W v V q 3 f 1 y q N m z y O I p z A K Z y D B 1 f Q g D t o Q g s I j O E Z X u H N E c 6 L 8 + 5 8 L F o L T j 5 z D H / g f P 4 A F m a N n w = = < / l a t e x i t > MMD OS-MCCFR 10 1 < l a t e x i t s h a 1 _ b a s e 6 4 = " S J U N D X P j 9 V 3 b 2 3 L x 4 Q 4 / 4 2 O I z r E = " > A A A B 7 n i c b V B N S 8 N A E J 3 4 W e t X 1 a O X x S J 4 s S R S 0 W P R i 8 c K 9 g P a W D b b S b t 0 s w m 7 G 6 G E / g g v H h T x 6 u / x 5 r 9 x 2 + a g r Q 8 G H u / N M D M v S A T X x n W / n Z X V t f W N z c J W c X t n d 2 + / d H D Y 1 H G q G D Z Y L G L V D q h G w S U 2 D D c C 2 4 l C G g U C W 8 H o d u q 3 n l B p H s s H M 0 7 Q j + h A 8 p A z a q z U 8 t z H 7 N y b 9 E p l t + L O Q J a J l 5 M y 5 K j 3 S l / d f s z S C K V h g m r d 8 d z E + B l V h j O B k 2 
I 3 1 Z h Q N q I D 7 F g q a Y T a z 2 b n T s i p V f o k j J U t a c h M / T 2 R 0 U j r c R T Y z o i a o V 7 0 p u J / X i c 1 4 b W f c Z m k B i W b L w p T Q U x M p r + T P l f I j B h b Q p n i 9 l b C h l R R Z m x C R R u C t / j y M m l e V L x q 5 f K + W q 7 d 5 H E U 4 B h O 4 A w 8 u I I a 3 E E d G s B g B M / w C m 9 O 4 r w 4 7 8 7 H v H X F y W e O 4 A + c z x 8 + A Y 7 d < / l a t e x i t > 10 2 < l a t e x i t s h a 1 _ b a s e 6 4 = " T I 4 h e S d J Q D L / q o R b i Z K L Q 8 9 9 a K 8 = " > A A A B 7 n i c b V D L S g N B E O y N r x h f U Y 9 e B o P g x b A b I n o M e v E Y w T w g W c P s Z J I M m Z 1 d Z n q F s O Q j v H h Q x K v f 4 8 2 / c Z L s Q R M L G o q q b r q 7 g l g K g 6 7 7 7 e T W 1 j c 2 t / L b h Z 3 d v f 2 D 4 u F R 0 0 S J Z r z B I h n p d k A N l 0 L x B g q U v B 1 r T s N A 8 l Y w v p 3 5 r S e u j Y j U A 0 5 i 7 o d 0 q M R A M I p W a n n u Y 3 p R m f a K J b f s z k F W i Z e R E m S o 9 4 p f 3 X 7 E k p A r Z J I a 0 / H c G P 2 U a h R M 8 m m h m x g e U z a m Q 9 6 x V N G Q G z + d n z s l Z 1 b p k 0 G k b S k k c / X 3 R E p D Y y Z h Y D t D i i O z 7 M 3 E / 7 x O g o N r P x U q T p A r t l g 0 S C T B i M x + J 3 2 h O U M 5 s Y Q y L e y t h I 2 o p g x t Q g U b g r f 8 8 i p p V s p e t X l i x k B J t M 8 t x H b 1 i t u X U 3 B 1 o l X k F q U K A 1 r H 4 N R h F J B J W G c K x 1 3 3 N j 4 6 d Y G U Y 4 n V c G i a Y x J l M 8 p n 1 L J R Z U + 2 l + 6 x y d W W W E w k j Z k g b l 6 u + J F A u t Z y K w n Q K b i V 7 2 M v E / r 5 + Y 8 N p P m Y w T Q y V Z L A o T j k y E s s f R i C l K D J 9 Z g o l i 9 l Z E J l h h Y m w 8 F R u C t / z y K u l c 1 L 1 G / f K + U W v e F H G U 4 Q R O 4 R w 8 u I I m 3 E E L 2 k B g A s / w C m + O c F 6 c d + d j 0 V p y i p l j + A P n 8 w c O 0 o 2 a < / l a t e x i t > 10 2 < l a t e x i t s h a 1 _ b a s e 6 4 = " x 6 k f V C W 1 L 8 k S i T 5 + r Z J 5 1 m 9 d p s w = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 
9 k t L X o s e v F Y w X 5 A u 5 Z s m m 1 D k + y S Z I W y 9 C 9 4 8 a C I V / + Q N / + N 2 X Y P 2 v p g 4 P H e D D P z g p g z b V z 3 2 y l s b G 5 t 7 x R 3 S 3 v 7 B 4 d H 5 e O T j o 4 S R W i b R D x S v Q B r y p m k b c M M p 7 1 Y U S w C T r v B 9 D b z u 0 9 U a R b J B z O L q S / w W L K Q E W w y y X M f a 8 N y x a 2 6 C 6 B 1 4 u W k A j l a w / L X Y B S R R F B p C M d a 9 z 0 3 N n 6 K l W G E 0 3 l p k G g a Y z L F Y 9 q 3 V G J B t Z 8 u b p 2 j C 6 u M U B g p W 9 K g h f p 7 I s V C 6 5 k I b K f A Z q J X v U z 8 z + s n J r z 2 U y b j x F B J l o v C h C M T o e x x N G K K E s N n l m C i m L 0 V k Q l W m B g b T 8 m G 4 K 2 + v E 4 6 t a p X r z b u 6 5 X m T R 5 H E c 7 g H C 7 B g y t o w h 2 0 o A 0 E J v A M r / D m C O f F e X c + l q 0 F J 5 8 5 h T 9 w P n 8 A E F a N m w = = < / l a t e x i t > 10 3 < l a t e x i t s h a 1 _ b a s e 6 4 = " e D A L R G L K 8 N K e 0 9 H e N W T 1 5 V Y X l a s = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 n V F j 0 W v X i s Y D + g X U s 2 z b a h S X Z J s k J Z + h e 8 e F D E q 3 / I m / / G b L s H b X 0 w 8 H h v h p l 5 Q c y Z N q 7 7 7 R T W 1 j c 2 t 4 r b p Z 3 d v f 2 D 8 u F R W 0 e J I r R F I h 6 p b o A 1 5 U z S l m G G 0 2 6 s K B Y B p 5 1 g c p v 5 n S e q N I v k g 5 n G 1 B d 4 J F n I C D a Z 5 L m P l 4 N y x a 2 6 c 6 B V 4 u W k A j m a g / J X f x i R R F B p C M d a 9 z w 3 N n 6 K l W G E 0 1 m p n 2 g a Y z L B I 9 q z V G J B t Z / O b 5 2 h M 6 s M U R g p W 9 K g u f p 7 I s V C 6 6 k I b K f A Z q y X v U z 8 z + s l J r z 2 U y b j x F B J F o v C h C M T o e x x N G S K E s O n l m C i m L 0 V k T F W m B g b T 8 m G 4 C 2 / v E r a F 1 W v V q 3 f 1 y q N m z y O I p z A K Z y D B 1 f Q g D t o Q g s I j O E Z X u H N E c 6 L 8 + 5 8 L F o L T j 5 z D H / g f P 4 A E d q N n A = = < / l a t e x i t > l i x k B J t M 8 t z H x r B a c + t u D r R K v I L U o E B r W P 0 a j C K S C C o N 4 V j r v u f G x k + x M o x w O q 8 
M E k 1 j T K Z 4 T P u W S i y o 9 t P 8 1 j k 6 s 8 o I h Z G y J Q 3 K 1 d 8 T K R Z a z 0 R g O w U 2 E 7 3 s Z e J / X j 8 x 4 b W f M h k n h k q y W B Q m H J k I Z Y + j E V O U G D 6 z B B P F 7 K 2 I T L D C x N h 4 K j Y E b / n l V d K 5 q H u N + u V d r W Q H Z b e A = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 m V l n o s e v F Y w X 5 A u 5 Z s m m 1 D k + y S Z I W y 9 C 9 4 8 a C I V / + Q N / + N 2 X Y P 2 v p g 4 P H e D D P z g p g z b V z 3 2 y l s b G 5 t 7 x R 3 S 3 v 7 B 4 d H 5 e O T j o 4 S R W i b R D x S v Q B r y p m k b c M M p 7 1 Y U S w C T r v B 9 D b z u 0 9 U a R b J B z O L q S / w W L K Q E W w y y X M f 6 8 N y x a 2 6 C 6 B 1 4 u W k A j l a w / L X Y B S R R F B p C M d a 9 z 0 3 N n 6 K l W G E 0 3 l p k G g a Y z L F Y 9 q 3 V G J B t Z 8 u b p 2 j C 6 u M U B g p W 9 K g h f p 7 I s V C 6 5 k I b K f A Z q J X v U z 8 z + s n J r z 2 U y b j x F B J l o v C h C M T o e x x N G K K E s N n l m C i m L 0 V k Q l W m B g b T 8 m G 4 K 2 + v E 4 6 V 1 W v V q 3 f 1 y r N m z y O I p z B O V y C B w 1 o w h 2 0 o A 0 E J v A M r / D m C O f F e X c + l q 0 F J 5 8 5 h T 9 w P n 8 A F O K N n g = = < / l a t e x i t > 10 6 < l a t e x i t s h a 1 _ b a s e 6 4 = " 7 v H r L 5 3 H.5 MOVING MAGNET Next, we investigate using a moving magnet, rather than an annealing temperature, to induce convergence to a Nash equilibrium. In the moving magnet setup, updates take the form π t`1 ph i q9rπ t ph i qρ t ph i q ηα e ηqπ t phiq s 1{p1`ηαq , where ρ t slowly trails behind π t . In our experiment, we used ρ t`1 ph i q9ρ t ph i q 1´η π t`1 ph i q η . 
9 B u m l l X T v c F V 4 g h b J m 1 E = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 m V V j 0 W v X i s Y D + g X U s 2 z b a h S X Z J s k J Z + h e 8 e F D E q 3 / I m / / G b L s H b X 0 w 8 H h v h p l 5 Q c y Z N q 7 7 7 R T W 1 j c 2 t 4 r b p Z 3 d v f 2 D 8 u F R W 0 e J I r R F I h 6 p b o A 1 5 U z S l m G G 0 2 6 s K B Y B p 5 1 g c p v 5 n S e q N I v k g 5 n G 1 B d 4 J F n I C D a Z 5 L m P l 4 N y x a 2 6 c 6 B V 4 u W k A j m a g / J X f x i R R F B p C M d a 9 z w 3 N n 6 K l W G E 0 1 m p n 2 g a Y z L B I 9 q z V G J B t Z / O b 5 2 h M 6 s M U R g p W 9 K g u f p 7 I s V C 6 6 k I b K f A Z q y X v U z 8 z + s l J r z 2 U y b j x F B J F o v C h C M T o e x x N G S K E s O n l m C i m L 0 V k T F W m B g b T 8 m G 4 C 2 / v E r a F 1 W v V q 3 f 1 y q N m z y O I p z A K Z y D B 1 f Q g D t o Q g s I j O E Z X u H N E c 6 L 8 + 5 8 L F o L T j 5 z D H / g f P 4 A F m a N n w = = < / l a t e x i t > 10 1 < l a t e x i t s h a 1 _ b a s e 6 4 = " S J U N D X P j 9 V 3 b 2 3 L x 4 Q 4 / 4 2 O I z r E = " > A A A B 7 n i c b V B N S 8 N A E J 3 4 W e t X 1 a O X x S J 4 s S R S 0 W P R i 8 c K 9 g P a W D b b S b t 0 s w m 7 G 6 G E / g g v H h T x 6 u / x 5 r 9 x 2 + a g r Q 8 G H u / N M D M v S A T X x n W / n Z X V t f W N z c J W c X t n d 2 + / d H D Y 1 H G q G D Z Y L G L V D q h G w S U 2 D D c C 2 4 l C G g U C W 8 H o d u q 3 n l B p H s s H M 0 7 Q j + h A 8 p A z a q z U 8 t z H 7 N y b 9 E p l t + L O Q J a J l 5 M y 5 K j 3 S l / d f s z S C K V h g m r d 8 d z E + B l V h j O B k 2 I 3 1 Z h Q N q I D 7 F g q a Y T a z 2 b n T s i p V f o k j J U t a c h M / T 2 R 0 U j r c R T Y z o i a o V 7 0 p u J / X i c 1 4 b W f c Z m k B i W b L w p T Q U x M p r + T P l f I j B h b Q p n i 9 l b C h l R R Z m x C R R u C t / j y M m l e V L x q 5 f K + W q 7 d 5 H E U 4 B h O 4 A w 8 u I I a 3 E E d G s B g B M / w C m 9 O 4 r w 4 7 8 7 H v H X F y W e O 4 A + c z x 8 + A Y 7 d < / l a t e x i t > MMD OS-MCCFR 
4-Sided Liar's Dice 10 1 < l a t e x i t s h a 1 _ b a s e 6 4 = " 1 R Z G l + d u R B 0 A c k s L p W n W Y d k S m B Y = " > A A A B 6 3 i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B b B U 0 m k o s e i F 4 8 V 7 A e 0 s W y 2 m 3 b p 7 i b s b o Q S + h e 8 e F D E q 3 / I m / / G T Z q D t j 4 Y e L w 3 w 8 y 8 I O Z M G 9 f 9 d k p r 6 x u b W + X t y s 7 u 3 v 5 B 9 f C o o 6 N E E d o m E Y 9 U L 8 C a c i Z p 2 z D D a S 9 W F I u A 0 2 4 w v c 3 8 7 h N V m k X y w c x i 6 g s 8 l i x k B J t M 8 t x H b 1 i t u X U 3 B 1 o l X k F q U K A 1 r H 4 N R h F J B J W G c K x 1 3 3 N j 4 6 d Y G U Y 4 n V c G i a Y x J l M 8 p n 1 L J R Z U + 2 l + 6 x y d W W W E w k j Z k g b l 6 u + J F A u t Z y K w n Q K b i V 7 2 M v E / r 5 + Y 8 N p P m Y w T Q y V Z L A o T j k y E s s f R i C l K D J 9 Z g o l i 9 l Z E J l h h Y m w 8 F R u C t / z y K u l c 1 L 1 G / f K + U W v e F H G U 4 Q R O 4 R w 8 u I I m 3 E E L 2 k B g A s / w C m + O c F 6 c d + d j 0 V p y i p l j + A P n 8 w c O 0 o 2 a < / l a t e x i t > 10 2 < l a t e x i t s h a 1 _ b a s e 6 4 = " x 6 k f V C W 1 L 8 k S i T 5 + r Z J 5 1 m 9 d p s w = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C R b B U 9 k t L X o s e v F Y For each game, we used α " 1, η " 0.1, η " 0.05. We show the results in Figure 16 , compared against CFR and MMD with an annealing temperature (with the same hyperparameters as before). Encouragingly, we find that that moving the magnet behind the current iterate also appears to induce convergence. Furthermore, convergence may occur at a much faster rate than that which is induced by annealing the temperature.
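The moving-magnet update described above can be sketched for a single information state as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the fixed action values `q_values`, and the parameter name `magnet_lr` (standing for the magnet's mixing rate, 0.05 in the text) are assumptions made for the example, and all probabilities are assumed to be strictly positive.

```python
import math


def _normalize_from_logits(logits):
    """Exponentiate and normalize, subtracting the max for numerical stability."""
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def mmd_step(policy, magnet, q_values, eta, alpha):
    """One update: pi_{t+1} ∝ [pi_t * rho_t^(eta*alpha) * exp(eta*q)]^(1/(1+eta*alpha))."""
    power = 1.0 / (1.0 + eta * alpha)
    logits = [
        power * (math.log(p) + eta * alpha * math.log(r) + eta * q)
        for p, r, q in zip(policy, magnet, q_values)
    ]
    return _normalize_from_logits(logits)


def magnet_step(magnet, new_policy, magnet_lr):
    """Magnet update: rho_{t+1} ∝ rho_t^(1 - magnet_lr) * pi_{t+1}^magnet_lr."""
    logits = [
        (1.0 - magnet_lr) * math.log(r) + magnet_lr * math.log(p)
        for r, p in zip(magnet, new_policy)
    ]
    return _normalize_from_logits(logits)


# Hypothetical single-state example with fixed action values (an assumption;
# in a game the values would change as the opponent's policy changes).
policy = [1 / 3, 1 / 3, 1 / 3]
magnet = [1 / 3, 1 / 3, 1 / 3]
q_values = [1.0, 0.0, -1.0]
for _ in range(100):
    policy = mmd_step(policy, magnet, q_values, eta=0.1, alpha=1.0)
    magnet = magnet_step(magnet, policy, magnet_lr=0.05)
```

With the magnet held fixed, the fixed point of `mmd_step` satisfies $\pi \propto \rho\, e^{q/\alpha}$; letting the magnet trail the policy removes that anchor gradually, which is the mechanism the text credits for convergence toward an unregularized solution.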

H.6 MAXENT AND MINIMAXENT OBJECTIVES

Next, we examine the convergence properties of other related objectives. We consider two different types. One involves an information state entropy bonus, wherein $\alpha H(\pi_t(h_i))$ is added to the reward for player $i$ for reaching information state $h_i$. This corresponds with a maximum entropy objective in reinforcement learning (Ziebart et al., 2008); we call this objective MaxEnt. The second involves simultaneously giving an information state entropy bonus (like MaxEnt), while also penalizing the

For Mujoco, we added a reverse KL penalty with a coefficient $1/\eta = 0.1$; we added an entropy bonus (Huang et al. (2022) do not use an entropy bonus) with a value of $\alpha = 0.0001$. Otherwise, for both Atari and Mujoco, the hyperparameters were set to those selected by Huang et al. (2022). We show the results again in Table 1 for convenience. The baseline results for PPO are copied directly from Huang et al. (2022). The exact numbers should be interpreted cautiously as they are averaged over only three runs, leaving high levels of uncertainty. That said, the results in Table 1 provide evidence that MMD can perform comparably to PPO. But, even without looking at empirical results, the idea that a deep form of MMD can perform comparably to PPO should not be surprising, as MMD can be implemented in a way that resembles PPO in many aspects.

J DEEP MULTI-AGENT REINFORCEMENT LEARNING EXPERIMENTS

For our deep multi-agent reinforcement learning experiments, we implemented MMD as a modification to PPO, as implemented by RLlib (Liang et al., 2018). This involved changing the adaptive forward KL regularization to a constant reverse KL regularization and setting $\eta_t$, $\alpha_t$ according to the following schedule
$$\eta_t = 0.05\sqrt{\frac{10 \text{ million}}{t}}, \qquad \alpha_t = 0.05\sqrt{\frac{10 \text{ million}}{t}},$$
where $t$ is the number of time steps, not the number of episodes. Otherwise, we used the default hyperparameters. We also show results for PPO, using RLlib's default hyperparameters. We ran these implementations in self-play using RLlib's OpenSpiel environment wrapper, modified to work with information states, rather than observations. For NFSP (Heinrich & Silver, 2016), we used the same hyperparameters as those found in the NFSP Leduc example in the OpenSpiel codebase. For the best response, we used OpenSpiel's DQN best response code, without modifying any hyperparameters. We ran the best response for 10 million time steps and evaluated all match-ups over 2000 games (with each agent being the first-moving player in 1000).

There are two caveats to consider in interpreting these experiments. First, it is likely that RLlib's default PPO hyperparameters are generally stronger than the default hyperparameters for NFSP in OpenSpiel. In this respect, the results we present may be unfair to NFSP. Second, RLlib's OpenSpiel wrapper does not endow agents with knowledge about which actions are legal; instead, if an illegal action is selected, the agent is given a small penalty and a random legal action is executed. In contrast, OpenSpiel's implementation of NFSP uses information about the legal actions to perform masking. In other words, MMD and PPO face a harder version of the game than NFSP faces. In this respect, the results we present are unfair to MMD and PPO. We ran five seeds of each algorithm and checkpointed the parameters after both 1 million time steps and 10 million time steps.
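As an illustration of the schedule above, the following sketch computes $\eta_t$ and $\alpha_t$ at a given time step. It assumes the schedule reads as $0.05\sqrt{10\text{ million}/t}$ for both quantities; the function name and its keyword arguments are hypothetical and are not part of RLlib.

```python
import math


def mmd_schedule(t, base=0.05, horizon=10_000_000):
    """Return (eta_t, alpha_t) at time step t, assuming base * sqrt(horizon / t).

    The names `base` and `horizon` are illustrative, not taken from RLlib.
    """
    if t <= 0:
        raise ValueError("t must be a positive number of time steps")
    value = base * math.sqrt(horizon / t)
    return value, value


# Both quantities decay toward 0.05 as t approaches 10 million time steps.
for step in (1_000_000, 2_500_000, 10_000_000):
    eta_t, alpha_t = mmd_schedule(step)
    print(step, eta_t, alpha_t)
```

Under this reading, the stepsize and the regularization temperature shrink together at a rate of $1/\sqrt{t}$ and both equal 0.05 at the final checkpoint.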
Because these games are too large to compute exact exploitability, we show results for DQN (Mnih et al., 2015) instances trained to best respond to each agent. For our DQN implementation, we used OpenSpiel's rl_response implementation. We did not modify any hyperparameters and ran DQN for 10 million time steps in all cases. We show results for the final DQN agent evaluated over 2000 games (1000 with DQN moving first and 1000 with DQN moving second) rounded to two decimal places. We also include results for a bot that selects actions uniformly at random (Random) and a bot that deterministically selects the first legal action (Arbitrary). The results of this experiment are presented in Figure 22. In the games, the return for a win is 1 and the return for a loss is -1; in Phantom Tic-Tac-Toe, it is also possible to tie, in which case the return is 0. Thus, an approximate exploitability of 1 would mean that DQN defeats the agent 100% of the time, whereas an approximate exploitability of 0 would mean that DQN ties in expected value against the agent. As might be expected, we observe that playing an arbitrary deterministic policy (purple line) is perfectly exploitable by DQN in both games. In contrast, while playing uniformly at random (red line) is also highly exploitable, it is less so because of the randomization. Among the three learning agents, one trend is that PPO with RLlib's default hyperparameters does not appear to decrease exploitability over time. This is not necessarily surprising, as RL algorithms do not generally converge in two-player zero-sum games. On the other hand, both NFSP with OpenSpiel hyperparameters and MMD exhibit clear downward trends over time. Again, this is not necessarily surprising, as both MMD and NFSP are designed with exploitability in mind. Among the learning agents, in terms of raw value, MMD exhibits substantially stronger performance than the baselines.
Indeed, even after 1 million time steps, every seed of MMD is less exploitable than any seed of the baselines after either 1 million or 10 million time steps. In contrast, the learning agent baselines do not substantially outperform uniform random play in Phantom Tic-Tac-Toe and only NFSP after 10 million time steps substantially outperforms uniform random play in 3x3 Abrupt Dark Hex. We also show results for head-to-head matchups between the agents in Figure 23. For all learning agents, we use the 5 seeds that were trained for 10 million time steps. For matchups between learning agents, we ran each seed of each agent against each other (for a total of 25 matchups) for 2000 games (1000 with each agent moving first) and rounded to two decimal places. For matchups between learning agents and bots, we ran each seed of the learning agent against the bot (for a total of 5 matchups) for 2000 games (1000 with each agent moving first) and rounded to two decimal places. Each learning algorithm's results are denoted by an x-axis label; the hue denotes the opponent, not the agent being evaluated. Because the games are zero-sum, matchup results are negations of each other. For example, in the 3x3 Abrupt Dark Hex column, the orange boxplot (i.e., opponent=NFSP) above the MMD label is the negation of the blue boxplot (i.e., opponent=MMD) above the NFSP label. One observation from the results is that MMD outperforms the baselines and the bots uniformly across seeds. This is encouraging in the sense that having low approximate exploitability appears to lead to strong performance in head-to-head matchups. Like MMD, the NFSP seeds win head-to-head matchups against the bots. On the other hand, PPO exhibits much higher variance; it tends to lose against the bot selecting the first legal action in Phantom Tic-Tac-Toe, but tends to defeat it by the largest margin in 3x3 Abrupt Dark Hex.
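The evaluation arithmetic described above (returns of +1, 0, and -1 averaged over 2000 games, split evenly between seat assignments) can be sketched as follows. The outcome counts are hypothetical and chosen only for illustration; the function name `mean_return` is likewise an assumption.

```python
def mean_return(returns_moving_first, returns_moving_second):
    """Average per-game return over both seat assignments, rounded to two decimals."""
    returns = list(returns_moving_first) + list(returns_moving_second)
    return round(sum(returns) / len(returns), 2)


# Hypothetical outcomes from the best responder's perspective:
# +1 for a win, 0 for a tie, -1 for a loss, with 1000 games per seat.
as_first = [1] * 700 + [0] * 100 + [-1] * 200
as_second = [1] * 600 + [0] * 100 + [-1] * 300
approx_exploitability = mean_return(as_first, as_second)

# In a zero-sum game, the evaluated agent's returns are the negation of the
# best responder's returns, so its mean return has the opposite sign.
agent_value = mean_return([-r for r in as_first], [-r for r in as_second])
```

The same computation applies to head-to-head matchups, which is why each boxplot has a mirrored counterpart under the opponent's label.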
(Heinrich & Silver, 2016) and PSRO (Lanctot et al., 2017), scales oracle-based approaches (Brown, 1951; McMahan et al., 2003) by using single-agent reinforcement learning as a subroutine to compute approximate best responses. While this class of methods is very scalable, it can require computing many best responses (McAleer et al., 2021), making it very slow in some cases. MMD differs from this class in that it does not use a best response subroutine and in that it does not use averages over historical policies. Another class of methods, which includes deep CFR (Brown et al., 2019) and double neural CFR (Li et al., 2020), is motivated by scaling CFR (Zinkevich et al., 2007), the dominant paradigm in tabular settings, to function approximation. Unfortunately, the sampling variant of CFR (Lanctot et al., 2009) requires importance sampling across trajectories, making straightforward extensions to stochasticity (Steinberger et al., 2020) difficult to apply to games with long trajectories, though more recent extensions may make progress toward resolving this issue (Gruslys et al., 2021). MMD differs from this class in that it requires neither policy averaging nor importance sampling over trajectories.

L RELATIONSHIP TO KL-PPO AND MDPO

On the single-agent deep reinforcement learning side, MMD most closely resembles KL-PPO (Schulman et al., 2017) and MDPO (Tomar et al., 2020; Hsu et al., 2020). KL-PPO uses the policy loss function
$$\mathbb{E}_t\left[\frac{\pi(a_t \mid s_t)}{\pi_{\text{old}}(a_t \mid s_t)} \hat{A}_t + \alpha H(\pi(s_t)) - \beta\, \mathrm{KL}(\pi_{\text{old}}(s_t), \pi(s_t))\right],$$
where $\hat{A}_t$ is an advantage function (a learned estimate of $q_{\pi_t}(s_t, a_t) - v_{\pi_t}(s_t)$). In expectation, the first term acts as $\langle \pi_t(s_t), q_{\pi_t}(s_t) \rangle$, which is the first term of MMD's loss function. The second term is the same entropy bonus as exists in MMD's loss function, using a uniform magnet with a negative entropy mirror map. However, unlike MMD, KL-PPO's KL regularization goes forward, $\mathrm{KL}(\pi_{\text{old}}(s_t), \pi(s_t))$. In contrast, MMD's KL regularization goes backward, $\mathrm{KL}(\pi(s_t), \pi_{\text{old}}(s_t))$. Hsu et al. (2020) investigated modifying KL-PPO to use reverse KL regularization instead of forward KL in Mujoco and found that the two yielded similar performance.

MDPO uses the policy loss function
$$\mathbb{E}_t\left[\frac{\pi(a_t \mid s_t)}{\pi_{\text{old}}(a_t \mid s_t)} \hat{A}_t - \beta\, \mathrm{KL}(\pi(s_t), \pi_{\text{old}}(s_t))\right],$$
where $\hat{A}_t$ is the approximate advantage function (a learned estimate of $q_{\pi_t}(s_t, a_t) - v_{\pi_t}(s_t)$). In the context of a negative entropy mirror map and a uniform magnet, MDPO differs from MMD in that it does not necessarily include an entropy regularization term $\alpha H(\pi(s_t))$; in the case that it does include such an entropy term, MDPO and MMD coincide. MMD with a negative entropy mirror map and a uniform magnet takes the form
$$\mathbb{E}_t\left[\frac{\pi(a_t \mid s_t)}{\pi_{\text{old}}(a_t \mid s_t)} \hat{A}_t + \alpha H(\pi(s_t)) - \beta\, \mathrm{KL}(\pi(s_t), \pi_{\text{old}}(s_t))\right],$$
where $\beta$ acts as an inverse stepsize.
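To make the comparison concrete, the three expressions above can be evaluated exactly for a single state with a small discrete action set. This is a minimal sketch under stated assumptions: the function names are hypothetical, the expectation over actions sampled from the old policy is computed in closed form (the importance ratio times the advantage averages to $\sum_a \pi(a) \hat{A}(a)$), and each expression is treated as a quantity to be maximized because the advantage term enters with a positive sign.

```python
import math


def entropy(p):
    """Shannon entropy H(p) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q):
    """KL(p, q) = sum_a p(a) * log(p(a) / q(a)); assumes q(a) > 0 wherever p(a) > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def surrogate(pi, advantages):
    """E_{a ~ pi_old}[pi(a) / pi_old(a) * A(a)], which equals sum_a pi(a) * A(a)."""
    return sum(p * a for p, a in zip(pi, advantages))


def kl_ppo_objective(pi, pi_old, advantages, alpha, beta):
    # Forward KL: the old policy is the first argument.
    return surrogate(pi, advantages) + alpha * entropy(pi) - beta * kl(pi_old, pi)


def mdpo_objective(pi, pi_old, advantages, beta):
    # Reverse KL and no entropy bonus.
    return surrogate(pi, advantages) - beta * kl(pi, pi_old)


def mmd_objective(pi, pi_old, advantages, alpha, beta):
    # Reverse KL plus the entropy bonus (uniform magnet, negative entropy mirror map).
    return surrogate(pi, advantages) + alpha * entropy(pi) - beta * kl(pi, pi_old)


# Hypothetical single-state example.
pi = [0.7, 0.2, 0.1]
pi_old = [0.5, 0.3, 0.2]
advantages = [0.5, -0.1, -0.4]
values = (
    kl_ppo_objective(pi, pi_old, advantages, alpha=0.01, beta=1.0),
    mdpo_objective(pi, pi_old, advantages, beta=1.0),
    mmd_objective(pi, pi_old, advantages, alpha=0.01, beta=1.0),
)
```

The only difference between `kl_ppo_objective` and `mmd_objective` is the argument order of the KL term, and the only difference between `mdpo_objective` and `mmd_objective` is the entropy bonus, mirroring the relationships stated in the text.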



This follows because Assumptions (3.2-3.3) imply $g$ is strongly convex and hence $\nabla g$ is strongly monotone.

This assumption is guaranteed in the QRE setting where $g$ is the sum of dilated entropy.

Note that this would also be possible with sequence-form MMD.

Although the actions $\tilde{A}_i$ give an equivalent normal-form representation, many of the actions are redundant because actions taken at certain decision points may make other decision points unreachable. The reduced normal-form (a.k.a. reduced strategic form) removes duplicate actions by identifying redundant choices at future decision points that are unreachable (Nisan et al., 2007). Hereinafter we consider the reduced normal-form.

Hsu et al. (2020) investigate MDPO under the name PPO reverse KL.



Figure 1: Solving for (A)QREs in various settings.

Figure 2: (top) Behavioral-form MMD with constant hyperparameters for various temperatures; (bottom) instances of behavioral-form MMD as Nash equilibria solvers, compared to CFR and CFR+.

Figure 4: A visualization of convergence in perturbed RPS.

Figure 5: Convergence of Euclidean MMD for the saddle point problem $\min_{x \in \mathbb{R}} \max_{y \in \mathbb{R}} \frac{\alpha}{2} x^2 + (x - 1)(y - 1) - \frac{\alpha}{2} y^2$.
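
The Euclidean case can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. It assumes the objective reads (alpha/2) x^2 + (x - 1)(y - 1) - (alpha/2) y^2, that the mirror map is psi(z) = ||z||^2 / 2 with g = psi, and that the operator F comes from the bilinear term only, so F(x, y) = (y - 1, 1 - x). Under those assumptions the proximal step has the closed form z <- (z - eta F(z)) / (1 + eta alpha). The function names are hypothetical.

```python
def euclidean_mmd(alpha, eta, steps, x=0.0, y=0.0):
    """Run the assumed Euclidean MMD update on the 2D saddle point problem.

    x is the minimizing variable and y the maximizing variable.
    F(x, y) = (y - 1, 1 - x) is the operator of the bilinear term (x - 1)(y - 1);
    the quadratic regularizers are handled by the 1 / (1 + eta * alpha) shrinkage.
    """
    shrink = 1.0 / (1.0 + eta * alpha)
    for _ in range(steps):
        fx, fy = y - 1.0, 1.0 - x  # evaluate F at the current iterate
        x, y = shrink * (x - eta * fx), shrink * (y - eta * fy)  # simultaneous update
    return x, y


def saddle_point(alpha):
    """Closed-form solution of alpha*x + y - 1 = 0 and x - 1 - alpha*y = 0."""
    denom = 1.0 + alpha ** 2
    return (1.0 + alpha) / denom, (1.0 - alpha) / denom


if __name__ == "__main__":
    alpha = 0.5
    # F is 1-Lipschitz here, so eta = alpha is an assumed admissible stepsize.
    x, y = euclidean_mmd(alpha, eta=alpha, steps=400)
    print("iterate :", (x, y))
    print("solution:", saddle_point(alpha))
```

Because the update is linear, the distance between the last iterate and the solution shrinks by a constant factor per step, which is the behavior the figure visualizes.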

Figure 6: Solving for normal-form QREs in Diplomacy stage games with full feedback.

Figure 7: MMD, PU, and OMWU applied to Diplomacy stage games for QRE finding with black box sampling.

Figure 11: Solving for AQREs in EFGs.

Figure 13: MMD and OS-MCCFR applied to Diplomacy stage games for computing Nash equilibria with black box sampling.

Figure 15: MMD compared against OS-MCCFR across standard OpenSpiel games with black box sampling.

Figure 16: Comparing a moving magnet to an annealing temperature.

Figure 21: Convergence with fixed hyperparameter values under a MiniMaxEnt objective.

Figure 22: Approximate exploitability experiments.

Figure 23: Head-to-head experiments.

We show the results in Figure 1. The top row shows NFG results, compared against the algorithms introduced by Cen et al. (2021), with each algorithm using the largest stepsize allowed by theory. All three algorithms converge exponentially fast, as guaranteed by theory. The middle row shows results for QREs on EFG benchmarks. For Kuhn Poker and 2x2 Abrupt Dark Hex, we observe that MMD's divergence converges exponentially fast, as is also guaranteed by theory. For 4-Sided Liar's Dice and Leduc Poker, we found that Gambit had difficulty approximating the QREs due to the size of the games. We therefore report the saddle point gap (the sum of best response values in the regularized game) instead, for which we observe linear convergence, as guaranteed by Proposition D.6. The bottom row shows results for AQREs using behavioral-form MMD (with $\eta = \alpha/10$) on the same benchmarks, where we also observe convergence (despite a lack of guarantees). For further details, see Section G.3 for the QRE experiments and Section G.4 for the AQRE experiments.
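
To make the normal-form setting concrete, the sketch below runs a negative-entropy MMD update with a uniform magnet on a small zero-sum matrix game and checks how close the last iterate is to a logit QRE. It is a hedged illustration, not the experimental code: the payoff matrix, the helper names, and the stepsize choice eta = alpha / max|A_ij|^2 are assumptions made for this example. The assumed closed-form update is pi_{t+1} proportional to (pi_t * exp(eta * q_t))^(1 / (1 + eta * alpha)), and the gap function is a normal-form analogue of the saddle point gap (sum of regularized best response values), not the sequence-form quantity used for the EFG benchmarks.

```python
import math


def softmax(values, temperature):
    """Numerically stable softmax of values / temperature."""
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def mmd_step(policy, q, eta, alpha):
    """Assumed negative-entropy MMD step with a uniform magnet (it cancels on normalizing)."""
    power = 1.0 / (1.0 + eta * alpha)
    logits = [power * (math.log(p) + eta * qa) for p, qa in zip(policy, q)]
    return softmax(logits, 1.0)


def action_values(A, x, y):
    """Action values for the row player (maximizer) and column player (minimizer)."""
    qx = [sum(a * yj for a, yj in zip(row, y)) for row in A]
    qy = [-sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]
    return qx, qy


def solve_qre(A, alpha, eta, steps):
    """Run simultaneous MMD updates from uniform policies and return the last iterate."""
    x = [1.0 / len(A)] * len(A)
    y = [1.0 / len(A[0])] * len(A[0])
    for _ in range(steps):
        qx, qy = action_values(A, x, y)
        x, y = mmd_step(x, qx, eta, alpha), mmd_step(y, qy, eta, alpha)
    return x, y


def qre_residual(A, x, y, alpha):
    """Largest deviation from the logit QRE fixed point x = softmax(qx / alpha), same for y."""
    qx, qy = action_values(A, x, y)
    target = softmax(qx, alpha) + softmax(qy, alpha)
    return max(abs(p - t) for p, t in zip(x + y, target))


def regularized_gap(A, x, y, alpha):
    """Sum of both players' entropy-regularized best response values (zero at the QRE)."""
    def lse(values):
        top = max(values)
        return top + math.log(sum(math.exp(v - top) for v in values))

    def entropy(p):
        return -sum(pi * math.log(pi) for pi in p)

    qx, qy = action_values(A, x, y)
    best = alpha * lse([v / alpha for v in qx]) + alpha * lse([v / alpha for v in qy])
    return best - alpha * entropy(x) - alpha * entropy(y)


if __name__ == "__main__":
    A = [[0.0, -1.0, 3.0], [1.0, 0.0, -1.0], [-3.0, 1.0, 0.0]]  # hypothetical biased RPS
    alpha = 0.5
    eta = alpha / 9.0  # assumed stepsize alpha / L^2 with L = max |A_ij| = 3
    x, y = solve_qre(A, alpha, eta, steps=3000)
    print("residual:", qre_residual(A, x, y, alpha))
    print("gap     :", regularized_gap(A, x, y, alpha))
```

Plotting the residual or the gap against the iteration count on a log scale should give a roughly straight line, which is what exponentially fast (linear) last-iterate convergence looks like in such plots.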

also attains convergence to KL-regularized equilibria in NFGs, our results differ in two ways: First, our results only handle the full feedback case, whereas Jacob et al. (2022)'s results allow for stochasticity. Second, our results give linear last-iterate convergence, whereas Jacob et al. (2022) only show $O(\log t / t)$ average-iterate convergence.

Supporting Lemmas and Propositions
D.2 Proof of Theorem 3.4
D.3 Equivalence between MMD and MD
D.4 Negative Entropy MMD Example
D.5 Euclidean MMD Example
D.6 Bounding the Gap

Note that $\nabla g(z)$, $\nabla g(z^*)$, $\nabla \psi(z)$, $\nabla \psi(z^*)$ are all well-defined because $z^* \in \operatorname{int} \operatorname{dom} \psi$ and). The first inequality follows since $F$ is monotone and $z^* \in \operatorname{sol} \mathrm{VI}(Z, F + \alpha \nabla g)$. The second inequality follows since $g$ is 1-strongly convex relative to $\psi$, and the last equality follows by Proposition D.1.

Theorem 3.4. Let Assumptions 3.2 and 3.3 hold and assume the unique solution $z^*$ to $\mathrm{VI}(Z, F + \alpha \nabla g)$ satisfies $z^* \in \operatorname{int} \operatorname{dom} \psi$. Then Algorithm 3.1 converges if $\eta \le \alpha / L^2$ and guarantees

For 4-Sided Liar's Dice and Leduc Poker, the games were too large for Gambit (McKelvey et al., 2016; Turocy, 2005) to compute the reduced normal-form QRE on our hardware. Therefore, we check the convergence of MMD by plotting the saddle point gap $\xi(x_t, y_t)$ of the min-max problem given by Ling et al. (2018), ξ(x

Atari and Mujoco results averaged over 3 runs, with standard errors.

Approximate exploitability and standard error in units of $10^{-2}$ over 5 seeds.

Head-to-head expected return in 3x3 Abrupt Dark Hex for column player and standard error in units of $10^{-2}$.

Head-to-head expected return in Phantom Tic-Tac-Toe for column player and standard error in units of $10^{-2}$.

Average Policy Deep Reinforcement Learning for Two-Player Zero-Sum Games. One class of deep reinforcement learning methods for two-player zero-sum games, which includes NFSP

7. ACKNOWLEDGEMENTS

We thank Jeremy Cohen, Chun Kai Ling, Brandon Amos, Paul Muller, Gauthier Gidel, Kilian Fatras, Julien Perolat, Swaminathan Gurumurthy, Gabriele Farina, and Michal Šustr for helpful discussions and feedback. This research was supported by the Bosch Center for Artificial Intelligence, NSERC Discovery grant RGPIN-2019-06512, Samsung, a Canada CIFAR AI Chair, and the Office of Naval Research Young Investigator Program grant N00014-22-1-2530.


We used $\eta = 1/2$, inspired by Schmid et al. (2019). We also investigated the use of biased Q-value estimates, as this is the setting that corresponds with function approximation. For this approach, we plugged in $\hat{q}$, as computed above, instead of the exact Q-values $q$.

[Figure: panel label "Unbiased baseline, $\alpha = 0.05$"; logarithmic axis ticks.]

We show the results of the experiment in Figure 8. Each column corresponds to a temperature for the QRE. The y-axis shows the KL divergence to the corresponding logit-QRE. The x-axis shows the number of iterations. For each algorithm, the step size at iteration $t$ was set equal to the maximal step size for which there exists an exponential convergence guarantee, divided by $10\sqrt{t}$. Each line is an average over 30 runs. The bands depict estimates of 95% confidence intervals computed using bootstrapping. Overall, we find that using unbiased baselines and using biased Q-value estimates both appear to improve convergence speed.
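The step size schedule and the confidence bands described above can be sketched as follows. This is a minimal illustration, assuming a percentile bootstrap over per-run values at a fixed iteration; the text only states that bootstrapping was used, so the resampling scheme, the number of resamples, and the names `step_size` and `bootstrap_ci` are assumptions.

```python
import math
import random


def step_size(eta_max, t):
    """Decayed step size eta_max / (10 * sqrt(t)) for iteration t = 1, 2, ..."""
    return eta_max / (10.0 * math.sqrt(t))


def bootstrap_ci(samples, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`.

    Resamples with replacement n_boot times and returns the empirical
    (1 - level) / 2 and 1 - (1 - level) / 2 quantiles of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(samples)
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_boot)]
    hi = means[int((1.0 - tail) * n_boot) - 1]
    return lo, hi
```

For a plot like the one described, `bootstrap_ci` would be applied at each iteration to the 30 per-run KL divergences, with the interval drawn as a band around the mean.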

G.3 FULL FEEDBACK QRE CONVERGENCE EFGS

[Figure: axis labels "Saddle Point Gap" and "Exploitability"; logarithmic axis ticks.]

We show the results of our experiments in Figure 14. We find that both weighted MMD and unweighted MMD exhibit convergent behavior. Furthermore, they converge at rates comparable with CFR on average across the games.

H.4 BLACK BOX NASH CONVERGENCE EFGS

For our black box Nash convergence EFG experiments, we used the Monte Carlo CFR implementation in OpenSpiel (Lanctot et al., 2019), which uses an update policy with a 0.4 weight on the current policy and a 0.6 weight on the uniform policy. For MMD, we used the sampling version of weighted MMD, meaning that the information states touched during the trajectory are updated with the full stepsize, while information states not touched during the trajectory are not updated.

player with the opponent's information state entropy. This second approach can be viewed as a modification of the first approach that makes the game zero-sum. It is the objective that was examined in Pérolat et al. (2021). We call this objective MiniMaxEnt.

For each algorithm, we used

We show the results in Figure 17. We find that MMD exhibits convergent behavior with each of the objectives.
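The two sampling ingredients of the black box setup in Section H.4 can be sketched in a few lines: mixing the current policy with the uniform policy (0.4 and 0.6 weights, as described for the Monte Carlo CFR update policy) and updating only the information states touched by a sampled trajectory (as described for sampled weighted MMD). The helper names and the dictionary representation of policies are hypothetical and do not correspond to OpenSpiel's API.

```python
def mix_with_uniform(policy, weight_current=0.4):
    """Return weight_current * policy + (1 - weight_current) * uniform."""
    n = len(policy)
    return [weight_current * p + (1.0 - weight_current) / n for p in policy]


def update_touched(policies, touched, step):
    """Apply `step` only at information states visited on the trajectory.

    policies: dict mapping an information state key to its action distribution.
    touched:  set of information state keys visited by the sampled trajectory.
    step:     hypothetical callable (state, policy) -> new policy, standing in
              for a full-stepsize policy update at that information state.
    """
    return {
        state: step(state, pol) if state in touched else pol
        for state, pol in policies.items()
    }
```

Keeping the per-state update behind a callable leaves the sketch agnostic about how action values are estimated from the trajectory.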

H.7 EUCLIDEAN MIRROR MAP OVER LOGITS

Next, we examine an instantiation of MMD that optimizes the logits using a Euclidean mirror map ($\psi = \frac{1}{2}\|\cdot\|_2^2$), as discussed in Section D.5, rather than reverse KL regularization. The update rule for this approach is given by

$$z_{t+1}(h_i) = \arg\max_z \left\langle \nabla_w \mathbb{E}_{a \sim \mathrm{softmax}(w)}\, q_{\pi_t}(h_i, a)\big|_{w = z_t(h_i)},\, z \right\rangle - \frac{\alpha}{2}\|z - \zeta(h_i)\|^2 - \frac{1}{2\eta}\|z - z_t(h_i)\|^2,$$

where $\pi_t(h_i) = \mathrm{softmax}(z_t(h_i))$ and $\zeta$ is the magnet. The closed form is

$$z_{t+1}(h_i) = \frac{z_t(h_i) + \eta \nabla_w \mathbb{E}_{a \sim \mathrm{softmax}(w)}\, q_{\pi_t}(h_i, a)\big|_{w = z_t(h_i)} + \alpha\eta\,\zeta(h_i)}{1 + \alpha\eta}.$$

We test the convergence of Euclidean MMD for Leduc poker, using

We show the results in Figure 18. We find that Euclidean MMD also exhibits convergent behavior in Leduc poker. However, convergence may be slower than with the negative entropy variant.

H.8 MINIMAXENT EXPLOITABILITY WITH FIXED PARAMETERS

Next, we examine using MMD for the purposes of computing MiniMaxEnt equilibria. For each temperature $\alpha$, we used $\eta = \alpha/10$. We show the results in Figure 19. In the figure, convergence is measured in terms of exploitability in the entropy regularized game. Similarly to our AQRE results, we find that, although convergence is non-monotonic, the empirical rate appears roughly linear over long time scales. This is the first empirical demonstration of convergence in MiniMaxEnt exploitability in EFGs.
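The closed-form Euclidean update from Section H.7 can be checked numerically at a single information state. The sketch below assumes the action values $q$ are given as a plain list and uses the standard identity $\partial_{w_j} \sum_a \mathrm{softmax}(w)_a q_a = \pi_j (q_j - \pi \cdot q)$ for the gradient term; all function names are hypothetical.

```python
import math


def softmax(w):
    """Numerically stable softmax of a list of floats."""
    m = max(w)
    e = [math.exp(x - m) for x in w]
    s = sum(e)
    return [x / s for x in e]


def grad_expected_q(w, q):
    """Gradient of E_{a ~ softmax(w)}[q(a)] w.r.t. the logits w."""
    pi = softmax(w)
    v = sum(p * qa for p, qa in zip(pi, q))
    return [p * (qa - v) for p, qa in zip(pi, q)]


def euclidean_mmd_step(z, q, zeta, alpha, eta):
    """Closed form: (z + eta * grad + alpha * eta * zeta) / (1 + alpha * eta)."""
    g = grad_expected_q(z, q)
    return [(zi + eta * gi + alpha * eta * mi) / (1.0 + alpha * eta)
            for zi, gi, mi in zip(z, g, zeta)]


def objective(z_new, z, q, zeta, alpha, eta):
    """The argmax objective: <grad, z'> - alpha/2 |z' - zeta|^2 - 1/(2 eta) |z' - z|^2."""
    g = grad_expected_q(z, q)
    linear = sum(gi * zi for gi, zi in zip(g, z_new))
    magnet = sum((a - b) ** 2 for a, b in zip(z_new, zeta))
    prox = sum((a - b) ** 2 for a, b in zip(z_new, z))
    return linear - 0.5 * alpha * magnet - prox / (2.0 * eta)
```

Because the objective is strictly concave in $z$, evaluating `objective` at the output of `euclidean_mmd_step` and at nearby perturbed points is a simple way to confirm the closed form is the maximizer.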

