MODELING CONTENT CREATOR INCENTIVES ON ALGORITHM-CURATED PLATFORMS

Abstract

Content creators compete for user attention. Their reach crucially depends on algorithmic choices made by developers on online platforms. To maximize exposure, many creators adapt strategically, as evidenced by examples like the sprawling search engine optimization industry. This begets competition for the finite user attention pool. We formalize these dynamics in what we call an exposure game, a model of incentives induced by algorithms, including modern factorization and (deep) two-tower architectures. We prove that seemingly innocuous algorithmic choices (e.g., non-negative vs. unconstrained factorization) significantly affect the existence and character of (Nash) equilibria in exposure games. We proffer use of creator behavior models, like exposure games, for an (ex-ante) pre-deployment audit. Such an audit can identify misalignment between desirable and incentivized content, and thus complement post-hoc measures like content filtering and moderation. To this end, we propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets. Among other findings, the strategically produced content exhibits strong dependence between algorithmic exploration and content diversity, and between model expressivity and bias towards gender-based user and creator groups.

1. INTRODUCTION

In 2018, Jonah Peretti (CEO, Buzzfeed) raised alarm when a Facebook main feed update started boosting junk and divisive content (Hagey & Horwitz, 2021). In Poland, the same update caused an uptick in negative political messaging (Hagey & Horwitz, 2021). Tailoring content to algorithms is not unique to social media. For example, some search engine optimization (SEO) professionals specialize in managing impacts of Google Search updates (Marentis, 2014; Dennis, 2016; Shahzad et al., 2020; Patil et al., 2021; Goodwin, 2021). While motivations for adapting content range from economic to socio-political, they often translate into the same operative goal: exposure maximization.
We study how algorithms affect exposure-maximizing content creators. We propose a novel incentive-based behavior model called an exposure game, where producers compete for a finite user attention pool by crafting content ranked highly by a given algorithm (Section 1.1). When producers act strategically, a steady state, i.e., a Nash equilibrium (NE), may be reached, with no one able to unilaterally improve their exposure (utility). The content produced in a NE can thus be interpreted as what the algorithm implicitly incentivizes. We focus on algorithms which model user preferences as an inner product of d-dimensional user and item embeddings, and rank items by the estimated preference. Section 2 presents theoretical results on the NE induced by these algorithms. We identify cases where algorithmic changes seemingly unconnected to producer incentives (e.g., switching from non-negative to unconstrained embeddings) determine whether there are zero, one, or multiple NE. The character of NE is also affected by the level of algorithmic exploration. Perhaps counter-intuitively, we show that high levels of exploration incentivize broadly appealing content, whereas low levels lead to specialization. In Section 3, we explore how creator behavior models can facilitate a pre-deployment audit.
Such an audit could be particularly useful for assessing the producer impact of algorithmic changes, which is hard to measure by A/B testing for two important reasons: (1) producers cannot be easily randomized to distinct treatment groups, and (2) there is often a delay between deployment and content adaptation. Our hope is that this new style of auditing will enable detection of misalignment between the induced and desired incentives, and thus flag issues to either immediately address, or monitor in content filtering and moderation. For demonstration, we execute a pre-deployment audit on the MovieLens and LastFM datasets using the exposure game behavior model and matrix-factorization-based recommenders. We find a strong dependence between algorithmic choices, like embedding dimension and level of exploration, and properties of the incentivized content, such as diversity (confirming our theory) and targeting of gender-based user and creator groups.

1.1. SETTING AND THE EXPOSURE GAME INCENTIVE MODEL

We assume there is a fixed recommender system trained on past data, and a fixed population of users (consumers). Together, these induce a demand distribution $P_c$ which represents typical traffic on the platform over a predefined period of time. Content is created by $n \in \mathbb{N}$ producers who try to maximize their expected exposure (utility). Denoting consumers by $c \sim P_c$, an item created by the $i$-th producer by $s_i$ (strategy), $s := (s_i)_{i \in [n]}$, and $s_{\setminus i} := (s_j)_{j \neq i}$, we define (expected) exposure as the proportion of the "user attention pool" captured by the $i$-th producer
$$u_i(s) = u_i(s_i, s_{\setminus i}) := \mathbb{E}_{c \sim P_c}\big[1\{c \text{ is exposed to } s_i\}\big] \overset{\star}{=} \mathbb{E}_{c \sim P_c}[p_i(c)] , \tag{1}$$
with $p_i(c) \geq 0$ the probability that the algorithm exposes $c$ to $s_i$ rather than any $s_{\setminus i}$. As common in game theory, we can extend from deterministic single item strategies to stochastic multi-item strategies $s_i \sim P_i$ for some distribution $P_i$. This extension is discussed in more detail in Section 2.
The assumption that $\mathbb{E}[1\{c \text{ is exposed to } s_i\}] \overset{\star}{=} \mathbb{E}[p_i(c)]$ does not explicitly model interactions not mediated by the algorithm (e.g., YouTube videos linked to by an external website). This may be a reasonable approximation for infinite feed platforms (e.g., Twitter, Facebook, TikTok) where most consumers scroll through items in the algorithm-defined order, and search engines (e.g., Google, Bing) where first-page bias is well documented (Craswell et al., 2008). While similar assumptions are common in the literature (e.g., Li et al., 2010; Chen et al., 2019; Ben-Porat et al., 2020; Curmei et al., 2021), alternative interaction models are an important future research direction.
Unlike previous work (Section 1.2), we focus on the popular class of factorization-based algorithms. These models rank items by a score estimated by the inner product of user and item embeddings $c, s_i \in \mathbb{R}^d$.
The larger this score, the higher the probability of exposure, which we model as
$$p_i(c) = \frac{\exp(\tau^{-1} \langle c, s_i \rangle)}{\sum_{i'=1}^{n} \exp(\tau^{-1} \langle c, s_{i'} \rangle)} = \mathrm{softmax}\Big( \big\{ \tau^{-1} \langle c, s_{i'} \rangle \big\}_{i'=1}^{n} \Big)_i , \tag{2}$$
where τ ≥ 0 is a temperature parameter which controls the spread of exposure probabilities over the top scoring items. When τ = 0 (i.e., hardmax), these probabilities correspond to top-1 recommendation or absolute first-position bias. Taking τ > 0 models the effects of ranked position, injected randomness for exploration, and can partially adjust for user randomness and other factors which make top-ranked items receive more but not all of the traffic. While an approximation in some settings, Equation (2) has been directly used, e.g., by YouTube (Chen et al., 2019). We emphasize that we make no assumption on how the embeddings are obtained. Our conclusions thus apply equally to classical matrix factorization and deep learning-based systems. We are now ready to formalize exposure games, an incentive-based model of creator behavior.
Definition 1. An exposure game consists of an embedding dimension $d \in \mathbb{N}$, a demand distribution $P_c \in \mathcal{P}(\mathbb{R}^d)$, and $n \in \mathbb{N}$ producers, each of whom chooses a strategy $s_i \in S^{d-1} = \{v \in \mathbb{R}^d : \|v\| = 1\}$, to maximize their utility $u_i(s) = \mathbb{E}_{c \sim P_c}[p_i(c)]$ with $p_i(c)$ as in Equation (2) for a given τ ≥ 0.
We restrict items $s_i$ to the unit sphere $S^{d-1}$. A norm constraint is necessary as otherwise exposure could be maximized by inflating $\|s_i\| \to \infty$, which is not observed in practice. We distinguish non-negative games where all embeddings lie in the positive orthant; this includes algorithms ranging from TF-IDF, bag-of-words, to non-negative matrix factorization (Lee & Seung, 1999), topic models (Blei et al., 2003), and constrained neural networks (Ayinde & Zurada, 2017).
Definition 2. A non-negative exposure game is an exposure game where the support of $P_c$ is restricted to the positive orthant, i.e., $P_c(\{c \in \mathbb{R}^d : c_j \geq 0, \forall j \in [d]\}) = 1$.
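To make Equation (2) and Definition 1 concrete, the following minimal sketch evaluates the softmax exposure probabilities and the resulting utilities for an empirical demand distribution. It is an illustration under stated assumptions, not the code used in the experiments: the function names `exposure_probs` and `utilities`, the toy consumers and strategies, and the choice τ = 0.1 are all hypothetical, and only the τ > 0 case is handled.

```python
import math

def exposure_probs(c, strategies, tau):
    """Softmax exposure probabilities p_i(c) from Equation (2); requires tau > 0."""
    scores = [sum(cj * sj for cj, sj in zip(c, s)) / tau for s in strategies]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(x - top) for x in scores]
    total = sum(weights)
    return [w / total for w in weights]

def utilities(consumers, strategies, tau):
    """Expected exposure u_i(s), averaging p_i(c) over a finite set of consumers."""
    totals = [0.0] * len(strategies)
    for c in consumers:
        for i, p in enumerate(exposure_probs(c, strategies, tau)):
            totals[i] += p
    return [t / len(consumers) for t in totals]

# Hypothetical toy instance: three consumers and two producers in d = 2.
consumers = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
strategies = [(1.0, 0.0), (0.0, 1.0)]  # both on the unit circle
u = utilities(consumers, strategies, tau=0.1)
```

Because the exposure probabilities are normalized per consumer, the utilities always sum to one. In this toy instance the consumers and producers are mirror images under swapping the two coordinates, so each producer captures half of the attention pool.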
We assume all producers are rational, omniscient, and fully control placement of $s_i$ in $S^{d-1}$. These assumptions are standard in both machine learning and economics literature, including in the related facility location games (see Section 1.2). They often provide a good first order approximation, and an important basis for studying the subtleties of real-world behavior. Full control is perhaps the least realistic, since producers can modify content features, but they often do not know how these changes affect the content embedding. This assumption has a significant advantage though: it abstracts away an explicit model of producer actions (cf. the variety of SEO techniques). Appropriateness of rationality and complete information are then context-dependent; they may be respectively reasonable in environments where strong profit motives or user profiling tools are common. However, investigating alternatives to each of the above assumptions is an important direction of future work.
Figure 2: YouTube revenue streams incentivizing exposure maximization (Ørmen & Gregersen, 2022).
Box 1: How our assumptions map onto YouTube (YT) as an illustrative example. On YT, a strategy $s_i$ is an embedding of a video, with creators able to produce multiple videos (mixed strategy $s_i \sim P_i$). Rational behavior: YT creators receive income proportional to their view numbers (Figure 2), which motivates exposure maximization. Most creators do not earn significant income, but the majority of traffic is driven by only a few popular and high-earning creators (Cheng et al., 2008). This motivates focus on these few producers and their strategic behavior. Complete information and full control: YT creators cannot directly manipulate the embeddings of their videos $s_i$, or observe the user embeddings.
However, popular creators have a myriad of analytic tools at their hand, with information about views, demographics (e.g., gender, age, region), acquisition channels, drivers of engagement, competition and more. They can also observe and adopt behaviors of other creators. Taking the strong monetary incentives into account, motivated creators will actively optimize their exposure using trial-and-error, making complete information and full-control an imperfect yet not unreasonable model of their behavior.

1.2. RELATED WORK

Most relevant to our setup are works on the incentives of exposure-maximizing creators induced by recommender and retrieval systems (Ben-Porat et al., 2020; Raifer et al., 2017; Ben-Basat et al., 2017; Ben-Porat & Tennenholtz, 2018; Ben-Porat et al., 2019a;b). Interesting aspects of these works which we omit include (i) repeated interactions (Ben-Porat et al., 2020; Raifer et al., 2017; Ben-Porat et al., 2019b), (ii) user welfare (Ben-Porat et al., 2020; Ben-Basat et al., 2017; Ben-Porat & Tennenholtz, 2018; Ben-Porat et al., 2019a), and (iii) incomplete information (Raifer et al., 2017). The most important distinction of our approach is that the above works constrain creators to a predefined finite item catalog. This excludes the popular factorization-based algorithms, ranging from standard matrix factorization (Koren et al., 2009) to (deep) two-tower architectures (Huang et al., 2013; Yi et al., 2019), whose continuous embedding space translates into an infinite number of possible items. The only exception is (Ben-Porat et al., 2019a) where items are represented by [0, 1] scalars, which is equivalent to the special case of two-dimensional non-negative exposure games. Continuous embedding spaces were recently studied in (Mladenov et al., 2020; Zhan et al., 2021), but neither studies producer incentives or competition. Mladenov et al. (2020) consider producers who decide whether to stay or leave the platform if their exposure is too low. Zhan et al. (2021) study design of recommender systems which optimize for both user and producer utility. Concurrently but independently, Jagadeesan et al. (2022) study a model equivalent to hardmax non-negative exposure games, except the $\|s_i\| = 1$ constraint is replaced by a production cost, yielding $u_i(s) = \mathbb{E}[p_i(c)] - \|s_i\|^{\beta}$ for some norm $\| \cdot \|$ and $\beta \geq 1$ (higher norm interpreted as higher quality).
The authors investigate how the cost function influences the economic phenomena exhibited by NE, from formation of "genres" (multiple directions with non-zero probability), to the possibility of realizing positive profits (utility). In contrast, we investigate how NE depend on algorithmic and environmental factors (non-negativity, exploration, dependence of exposure on ranking), and propose an algorithmic audit which leverages the creator model. While taking β → ∞ in Jagadeesan et al.'s cost recovers our unit norm constraint, understanding the NE behavior at the limit remains a subject of future work (e.g., pure NE exist only in our setup). Our works are thus largely complementary.
Literature on adaptive behavior in the presence of a prediction algorithm is also relevant (Hardt et al., 2016; Kleinberg & Raghavan, 2020; Perdomo et al., 2020; Jagadeesan et al., 2021). The social impact and potential disparate effects of strategic adaptation have been analyzed in (Milli et al., 2019; Hu et al., 2019; Liu et al., 2020). Most relevant for us is a recent paper by Liu et al. (2022) which studies strategic adaptation in the context of finite resources (e.g., number of accepted college applicants). Unlike us, the authors assume a single score for each competitor, who can pay a cost to improve it. A principal then designs a reward function which allocates the finite resource based on the scores, and the authors study how different choices affect various notions of welfare. The preliminary results on multidimensional scores (appendix B) assume the scores and individual improvements are independent, whereas our scores ($\langle c, s_i \rangle$ for each $c$) imply complex dependence and trade-offs.
Finally, our proposed methods for auditing recommender and information retrieval systems belong to a rapidly growing algorithm auditing toolbox. We focus on understanding producer incentives caused by a known algorithm.
Thus, we complement prior work that aims to audit these systems based upon: the degree of consumer control (Curmei et al., 2021), fairness (Do et al., 2021), compliance with regulations (Cen & Shah, 2021), and dynamical behavior in simulations (Krauth et al., 2020; Lucherini et al., 2021) or deployed systems (Haroon et al., 2022).

2. EQUILIBRIA IN EXPOSURE GAMES

This section presents theoretical results on incentives in exposure games. We focus on the impact of the recommender/information retrieval model on the competitive equilibria. Throughout, we find that one of the most important factors determining existence and character of equilibria is the temperature τ (see Equation (2)). We thus distinguish the softmax (τ > 0) and the hardmax (τ = 0) case. In competitive settings, a key question is whether there are equilibria in which players are satisfied with their strategies, as otherwise there may be never-ending oscillation in search for better outcomes. We thus consider several solution concepts (i.e., definitions of equilibria) related to NE. A pure NE (PNE) is a point in strategy space $s^{\mathrm{NE}} \in (S^{d-1})^n$ where no player $i$ can increase their utility by unilaterally deviating from $s^{\mathrm{NE}}_i \in S^{d-1}$. In other words, no content producer can increase their exposure by modifying their content. Mixed NE (MNE) refer to the setting where players are allowed to choose randomized (mixed) strategies $P_i \in \mathcal{P}(S^{d-1})$. Rather than selecting a single piece of content, a creator following a mixed strategy samples $s_i \sim P_i$. An alternative interpretation is that producers create multiple items, splitting their time/budget proportionally to the $P_i$-probabilities. In later sections, we explore the weaker solution concepts of ϵ-NE, local NE (LNE), and their combination ϵ-LNE. An ϵ-NE is an approximate NE where no producer can unilaterally increase their utility by more than ϵ (NE are "0-NE"). LNE are analogous to local optima: points where no player benefits from small deviations from their strategy. The approximate and local perspectives are relevant when deploying local search algorithms to find NE numerically (Section 3). Exposure games are symmetric, meaning that any permutation of strategies forming an equilibrium produces another equilibrium. Our statements on the existence and uniqueness of equilibria hold up to player permutation.
All proofs for the results in this section are presented in the appendix.

2.1. PURE AND MIXED NASH EQUILIBRIA

We begin by characterizing the existence of pure and mixed NE in general exposure games.
Theorem 1. Every exposure game has at least one mixed Nash equilibrium.
A key property of softmax games is that the utilities $u_i$ are continuous in $s$. This, and the compactness of the strategy space $S^{d-1}$, guarantees existence of MNE (Glicksberg, 1952, section 2). In the hardmax case (τ = 0), we can show that MNE are guaranteed to exist through a direct application of proposition 4 due to Simon (1987). The producer utilities $u_i$ are not differentiable in the hardmax case though, which means we cannot use gradient information to find NE as in the softmax case. The only procedure we know for finding NE in hardmax games requires solving the hitting set problem which is NP-complete (Dasgupta et al., 2008). See Appendix B for further discussion. We now turn to existence of pure NE, which is the setting where creators strategically design a single piece of content. Unlike MNE, PNE are not guaranteed to exist even in the softmax case.
Theorem 2. PNE need not exist in either the hardmax (τ = 0) or softmax (τ > 0) exposure games.
Figure 3A illustrates the non-existence result. The counter-example holds even for n = 2 players and planar (d = 2) strategies. A reader familiar with classic PNE results may ask if PNE would appear if we relaxed the $S^{d-1}$ strategy space to the convex $B^d = \{v : \|v\| \leq 1\}$ (Glicksberg, 1952; Debreu, 1952; Fan, 1952). This is not true as the exposure utility is not quasi-concave (Figure 3B&C). We now move to non-negative exposure games (Definition 2). For n = d = 2, non-negative hardmax exposure games are equivalent to Hotelling games (Hotelling, 1929), and more generally to facility location games on a line (Ben-Porat et al., 2019a; Procaccia & Tennenholtz, 2013). The next proposition lists several special cases in which we understand existence and character of PNE.
Proposition 1.
A PNE always exists in n = d = 2 non-negative hardmax games, but may not without non-negativity or when d > 2. For n = 2 non-negative softmax games with $\hat{c} := \frac{1}{n} \big(1 - \frac{1}{n}\big) \mathbb{E}[c] \neq 0$,
Figure 3D illustrates a 4-player non-negative exposure game. Depending on the temperature, we observe either the collapsed $s_i = \bar{c}$ (large τ), or what we term "protective positioning" (small τ). In Figure 3D, players place their strategies between a consumer and the next closest producer. Figure 3E illustrates protective positioning for a higher number of consumers and n = 3. Here, consumers are roughly clustered around three centers (blue dots). The producer strategies are close to these centers, but again offset towards the most contested consumers.

2.2. ϵ-NASH EQUILIBRIA

While existence of NE is not guaranteed, the situation changes when we adopt the weaker solution concept of ϵ-NE, in which no producer can unilaterally increase their utility by more than ϵ. The existence and character of such equilibria strongly depends on the temperature τ. When τ = ∞, exposure is equally likely, $p_i(c) = \frac{1}{n}$ for all $i$ and $c$, regardless of the adopted strategies. Thus, every strategy profile is an NE. Considering a sequence of increasing $(\tau_i)_{i \geq 1}$, we can therefore argue that the limit of any convergent sequence of NE indexed by τ is a NE at τ = ∞. Interestingly, Theorem 3 shows that a sufficiently large but finite τ > 0 is sufficient for existence of ϵ-(P)NE. The result is constructive, showing that the ϵ-PNE is parallel to the average consumer embedding, i.e., it is located at $\bar{c} := \mathbb{E}[c] / \|\mathbb{E}[c]\|$.
Theorem 3. For any ϵ > 0 and $P_c \in \mathcal{P}(\mathbb{R}^d)$ with compact support and $\mathbb{E}[c] \neq 0$, $\exists \tau_0 > 0$ s.t. $s_1 = \dots = s_n = \bar{c}$ is an ϵ-PNE for all $\tau \geq \tau_0$. Moreover, for all $\tau \geq \tau_0$, the smallest $\epsilon_\tau$ for which $\bar{c}$ is an $\epsilon_\tau$-PNE satisfies $\epsilon_\tau \leq \epsilon / \tau$. If also $\epsilon < \|\hat{c}\|$, then the set of better-responses to $\bar{c}$
$$\Psi(\bar{c}) := \{ v \in S^{d-1} : u_1(v, \bar{c}, \dots, \bar{c}) \geq u_1(\bar{c}, \bar{c}, \dots, \bar{c}) \} , \tag{3}$$
is a subset of $B^d_\delta(\bar{c}) = \{v : \|v - \bar{c}\| \leq \delta\}$ with $\delta = 2\epsilon / (\|\hat{c}\| - \epsilon)$, and $\delta \to 0$ as $\tau \to \infty$.
This result shows that all ϵ-improvements concentrate near the consumer average ϵ-PNE as τ → ∞. Additionally, the "consumer symmetry" $\|\hat{c}\| = \frac{1}{n} \big(1 - \frac{1}{n}\big) \|\mathbb{E}[c]\|$ determines how quickly $\delta \to 0$. When consumers are spread approximately symmetrically w.r.t. the origin, the degenerate equilibrium appears only for large τ. However, smaller τ are sufficient for more directionally concentrated $P_c$. A high number of producers also slows the concentration as the appeal of $u_i(\bar{c}, \dots, \bar{c}) = \frac{1}{n}$ decreases with $n$. We conclude with a corollary based on our development so far.
Corollary 1. There is a fixed $\epsilon_0 > 0$ and a demand distribution $P_c$ which, depending on the chosen τ, induce zero, one, multiple, or infinitely many ϵ-NE for all $\epsilon \leq \epsilon_0$.
Corollary 1 underscores the sensitivity of exposure games to the temperature parameter τ, with uniformly homogeneous content at one end (high τ), and potentially persistent oscillation behavior in competition when no NE exist (low τ). A higher τ > 0 can be a result of algorithmic exploration (Chen et al., 2019; Cesa-Bianchi et al., 2017; Lattimore & Szepesvári, 2020), which is provably necessary for optimal performance in static environments (Lattimore & Szepesvári, 2020). In contrast, our results show that in environments with strategic actors, exploration may incentivize content which is uniform and broadly appealing rather than diverse. This may contradict the intuition that more exploration should lead to greater content diversity due to the higher exposure of niche content. One way to understand this result is the tension between randomization and the ability of niche creators to reach their audience: producers may be discouraged from creating niche content when the algorithm is exploring too much (τ high), and encouraged to mercilessly seek and protect their own niche when the algorithm performs little exploration (τ low).
When the algorithm captures user preferences well, exploration is typically thought of as having negative impact on user experience through immediate reduction in quality of service as a result of suboptimal recommendations. However, the above results show secondary long-term effects.
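The role of the temperature in Theorem 3 can be probed numerically on a toy instance. The sketch below is an illustration only, under loudly stated assumptions: d = 2, n = 2, three hypothetical consumers, and a brute-force grid over the unit circle in place of any exact best-response computation. The helper names `utility_of_first` and `best_deviation_gain` are made up for this example. It measures the largest utility gain a single producer can obtain by deviating from the collapsed profile in which every producer plays the normalized mean consumer embedding.

```python
import math

def utility_of_first(consumers, strategies, tau):
    """u_1(s) under the softmax exposure model (Equation (2)), tau > 0."""
    total = 0.0
    for c in consumers:
        scores = [sum(a * b for a, b in zip(c, s)) / tau for s in strategies]
        top = max(scores)  # stabilize the exponentials
        weights = [math.exp(x - top) for x in scores]
        total += weights[0] / sum(weights)
    return total / len(consumers)

def best_deviation_gain(consumers, n, tau, grid=3600):
    """Largest gain on a grid of unit vectors when player 1 leaves the collapsed profile (d = 2)."""
    m = len(consumers)
    mean = [sum(c[k] for c in consumers) / m for k in range(2)]
    norm = math.hypot(mean[0], mean[1])
    c_bar = (mean[0] / norm, mean[1] / norm)  # normalized mean consumer embedding
    baseline = 1.0 / n  # utility of every player when all strategies coincide
    best = -1.0
    for k in range(grid):
        angle = 2.0 * math.pi * k / grid
        v = (math.cos(angle), math.sin(angle))
        u = utility_of_first(consumers, [v] + [c_bar] * (n - 1), tau)
        best = max(best, u - baseline)
    return best

# Hypothetical consumers; two of them point in nearly the same direction.
consumers = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0)]
gains = {tau: best_deviation_gain(consumers, n=2, tau=tau) for tau in (0.01, 1.0, 10.0)}
```

On this instance the gain is substantial at τ = 0.01, where a deviating producer can capture nearly all of the exposure of the two similar consumers, and it becomes negligible at τ = 10, so the collapsed profile is an ϵ-PNE only for much smaller ϵ in the high-temperature regime. The grid search gives a lower bound on the true best-response gain, so it can only suggest, not certify, an ϵ-PNE.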

2.3. LOCAL NASH EQUILIBRIA

In a local NE, each $s_i$ is optimal within some neighborhood of itself in the embedding space. Sometimes motivated as a form of bounded rationality, LNE can often be found by local search algorithms (e.g., Mazumdar et al., 2019). Since our motivation in studying exposure games is ultimately better system understanding and audits, we are particularly interested in these algorithmic benefits. Practical first-order algorithms for identifying LNE operate analogously to gradient descent, implying they may terminate in critical points that are not LNE. Unlike NE, critical points always exist.
Proposition 2. Every τ > 0 exposure game with $\mathbb{E}[c] \neq 0$ has a critical point at $s_1 = \dots = s_n = \bar{c}$.
As we have seen, $s_1 = \dots = s_n = \bar{c}$, with $\bar{c} = \mathbb{E}[c] / \|\mathbb{E}[c]\|$, may be an equilibrium (Proposition 1). To distinguish LNE from mere critical points, we use the Riemannian second derivative test, treating $S^{d-1}$ as a Riemannian submanifold of $\mathbb{R}^d$ as usual. For background, see (Boumal, 2022, sections 3 & 5).
Definition 3 (Boumal, 2022, lemma 5.41). A point $s$ in strategy space satisfies the second derivative test if, for all $i$, (1) the Riemannian gradient $(I - s_i s_i^\top) \nabla_{s_i} u_i(s)$ is zero, and (2) the Riemannian Hessian
$$(I - s_i s_i^\top) \, \nabla^2_{s_i} u_i(s) \, (I - s_i s_i^\top) - \langle s_i, \nabla_{s_i} u_i(s) \rangle (I - s_i s_i^\top)$$
is strictly negative definite in the subspace perpendicular to $s_i$.
This condition is sufficient but not necessary for a critical point to be an LNE. LNE which satisfy Definition 3 are termed differentiable NE (Ratliff et al., 2016; Balduzzi et al., 2018). The distinction is similar to that between the flat minimum of $x^4$ at zero and the more well-behaved $x^2$.
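For the softmax model, the two ingredients of Definition 3 can be written in closed form. The short derivation below is our own working from Equation (2) rather than a statement taken from the appendix, and it assumes τ > 0 and enough integrability to exchange differentiation and expectation (e.g., compactly supported $P_c$). The symbol $\hat{c} = \frac{1}{n}(1 - \frac{1}{n}) \mathbb{E}[c]$ follows the notation of Section 2.

```latex
% Derivatives of the softmax exposure probability w.r.t. the producer's own strategy
\nabla_{s_i} p_i(c) = \tfrac{1}{\tau}\, p_i(c)\,\bigl(1 - p_i(c)\bigr)\, c ,
\qquad
\nabla^2_{s_i} p_i(c) = \tfrac{1}{\tau^2}\, p_i(c)\,\bigl(1 - p_i(c)\bigr)\,\bigl(1 - 2 p_i(c)\bigr)\, c\, c^\top .

% Taking expectations over c \sim P_c gives the Euclidean gradient and Hessian of the utility
\nabla_{s_i} u_i(s) = \tfrac{1}{\tau}\, \mathbb{E}\bigl[ p_i(c) (1 - p_i(c))\, c \bigr] ,
\qquad
\nabla^2_{s_i} u_i(s) = \tfrac{1}{\tau^2}\, \mathbb{E}\bigl[ p_i(c) (1 - p_i(c)) (1 - 2 p_i(c))\, c\, c^\top \bigr] .

% At the collapsed profile s_1 = \dots = s_n = \bar{c} we have p_i(c) = 1/n for every c, hence
\nabla_{s_i} u_i(s) = \tfrac{1}{\tau}\, \hat{c} ,
\qquad
(I - \bar{c}\bar{c}^\top)\, \nabla_{s_i} u_i(s) = 0 ,

% and the Riemannian Hessian of Definition 3 becomes
\tfrac{1}{\tau^2}\, \tfrac{1}{n}\bigl(1 - \tfrac{1}{n}\bigr)\bigl(1 - \tfrac{2}{n}\bigr)\,
(I - \bar{c}\bar{c}^\top)\, \mathbb{E}[c\, c^\top]\, (I - \bar{c}\bar{c}^\top)
\;-\; \tfrac{\|\hat{c}\|}{\tau}\, (I - \bar{c}\bar{c}^\top) .
```

The vanishing Riemannian gradient is consistent with Proposition 2, since $\hat{c}$ is parallel to $\bar{c}$. In the last expression the first term scales as $1/\tau^2$ and the second as $1/\tau$, which suggests that the second derivative test is passed at the collapsed profile once τ is large enough; we offer this only as a heuristic reading, not as a replacement for the formal results.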

3. PRE-DEPLOYMENT AUDIT OF STRATEGIC CREATOR INCENTIVES

Beyond regularly retraining on new data, online platforms continuously roll out algorithm updates. While A/B testing can detect changes in user metrics, like satisfaction or churn, prior to the full-scale deployment (Tang et al., 2010; Hohnhold et al., 2015; Xu et al., 2015; Gordon et al., 2019), assessing the impact on content producers is comparatively harder due to the longer delay between the roll-out and corresponding content adaptation. Furthermore, since producers cannot be easily assigned to distinct treatment groups without limiting their content to only a subset of consumers, modern A/B testing methods must eschew making causal statements about producer impact (Nandy et al., 2021; Ha-Thuc et al., 2020; Huszár et al., 2022). Undesirable results including promulgation of junk and abusive content then have to be addressed via post-hoc measures like content filtration and moderation. A tool for ex-ante (pre-deployment) assessment of producer impact could thus limit the harm experienced by users, moderators, and other affected parties. We demonstrate how to utilize a creator behavior model for this purpose, using the exposure game as a concrete example. The incorporation of factorization-based algorithms in exposure games allows us to use real-world datasets and rating models. While exposure games have limitations as a behavior model, we believe our experiments provide a useful illustration of the insights the proposed audit can offer to platform developers.
Figure 4 (partial caption): As Theorem 3 predicts, large τ (e.g., more exploration) leads to higher concentration, i.e., creating content which appeals to more users. Left: MovieLens. Right: LastFM. See Section 3.2 for more discussion.

3.1. SETUP

We use the MovieLens-100K and LastFM-360K datasets (Harper & Konstan, 2015; Bertin-Mahieux et al., 2011; Shakespeare et al., 2020), implement our code in Python (van Rossum & Drake, 2009) and rely on numpy (Harris et al., 2020), scikit-surprise (Hug, 2020), pandas (pandas development team, 2020), matplotlib (Hunter, 2007), jupyter (Kluyver et al., 2016), reclab (Krauth et al., 2020), and JAX (Bradbury et al., 2018) packages to fit probabilistic (PMF; Mnih & Salakhutdinov, 2007) and non-negative (NMF; Lee & Seung, 1999) matrix factorization. The models are trained to predict the user ratings (centered in the PMF case). To select regularization and learning rate, we performed a two-fold 90/10 split cross-validation separately on each dataset. The tuned recommenders were then fit on the full dataset, and the resulting user embeddings, $\{c_j\}_{j \in [m]} \subset \mathbb{R}^d$, were used to construct the demand distribution $P_c = \frac{1}{m} \sum_j \delta_{c_j}$, and evaluate the recommendation probabilities $p_i(c)$. Details in Appendix C.1.
The only algorithm for finding NE in hardmax exposure games we know has exponential worst-case complexity. We thus focus on the softmax case. While search for general mixed NE is possible in special cases (Fudenberg & Kreps, 1993; Kaniovski & Young, 1995; Benaïm & Hirsch, 1997), we are not aware of any technique applicable to n-player exposure games. We therefore focus on pure ϵ-LNE (Section 2.3), where each producer creates a single new item. We employ simple gradient ascent (Singh et al., 2000; Balduzzi et al., 2018; see Appendix C.2 for comparison with gradient descent) combined with reparametrization $s_i = \theta_i / \|\theta_i\|$ for each producer, where we iteratively update $\theta_{i,t} = \theta_{i,t-1} + \alpha \nabla_{\theta_{i,t-1}} u_i(s_{i,t-1}, s_{\setminus i,t-1})$ for shared step size α > 0, and
$$\nabla_{\theta_i} u_i(s) = \frac{1}{\tau \|\theta_i\|_2} (I - s_i s_i^\top) \, \mathbb{E}\big[p_i(c) (1 - p_i(c)) \, c\big] = \frac{1}{\|\theta_i\|_2} (I - s_i s_i^\top) \, \nabla_{s_i} u_i(s) . \tag{4}$$
Equation (4) shows the update direction is parallel to the Riemannian gradient of $u_i(s)$ w.r.t. $s_i \in S^{d-1}$ (Section 2.3). We also experimented with the related Riemannian gradient ascent optimizer (Boumal, 2022), but abandoned it after (predictably) observing little qualitative difference. We note that the local updates themselves define better-response dynamics linked to iterative minor content changes; investigation of their relation to real-world producer behavior is an interesting future direction.
We investigate the sensitivity of the incentivized content to the: (i) rating model ∈ {PMF, NMF}, (ii) embedding dimension d ∈ {3, 50}, and (iii) temperature $\log_{10} \tau \in \{-2, -1, 0\}$. We further vary the number of producers n ∈ {10, 100} to examine scenarios with different producer to consumer ratios (user count is fixed to the full 943 for MovieLens, and 13,698 for LastFM). The above values were selected in a preliminary sweep as representative of the effects presented below. For every setting, we used five random seeds for initialization of the recommender (affects $P_c$), and for each ran the gradient ascent algorithm 10x to identify possible ϵ-LNE. We applied early stopping when the $\ell_2$-change in parameters between iterations dipped below $10^{-8} \cdot \sqrt{d}$; the number of iterations was set to 50K so convergence was achieved for every run. We only report runs where the second-order Riemannian test from Section 2.3 did not rule out an ϵ-LNE. Additional results, including those where the Riemannian test was conclusive, are in Appendix C.3.
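The simultaneous gradient ascent with the reparametrization $s_i = \theta_i / \|\theta_i\|$ and the gradient from Equation (4) can be sketched as follows. This is a minimal standard-library illustration and not the JAX implementation used for the experiments: the function names, the toy consumers, the starting points, the step size, and the stopping tolerance are all assumptions chosen for readability.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax_probs(c, strategies, tau):
    """Exposure probabilities p_i(c) from Equation (2), tau > 0."""
    scores = [sum(a * b for a, b in zip(c, s)) / tau for s in strategies]
    top = max(scores)
    weights = [math.exp(x - top) for x in scores]
    total = sum(weights)
    return [w / total for w in weights]

def theta_gradients(consumers, thetas, tau):
    """Gradient of u_i w.r.t. theta_i for every producer, following Equation (4)."""
    d = len(thetas[0])
    strategies = [normalize(t) for t in thetas]
    # Euclidean gradient w.r.t. s_i: (1 / tau) * E[p_i (1 - p_i) c]
    euclid = [[0.0] * d for _ in thetas]
    for c in consumers:
        for i, p in enumerate(softmax_probs(c, strategies, tau)):
            weight = p * (1.0 - p) / (tau * len(consumers))
            for k in range(d):
                euclid[i][k] += weight * c[k]
    grads = []
    for theta, s, g in zip(thetas, strategies, euclid):
        norm = math.sqrt(sum(x * x for x in theta))
        radial = sum(a * b for a, b in zip(s, g))
        # Project onto the tangent space at s_i, then rescale by 1 / ||theta_i||.
        grads.append([(g[k] - radial * s[k]) / norm for k in range(d)])
    return grads

def gradient_ascent(consumers, thetas, tau, step=1.0, max_iters=5000, tol=1e-12):
    """Simultaneous updates for all producers; returns the final unit-norm strategies."""
    thetas = [list(t) for t in thetas]
    for _ in range(max_iters):
        grads = theta_gradients(consumers, thetas, tau)  # computed before any update
        change = 0.0
        for theta, grad in zip(thetas, grads):
            for k in range(len(theta)):
                theta[k] += step * grad[k]
                change += (step * grad[k]) ** 2
        if math.sqrt(change) < tol:  # early stopping on the l2 parameter change
            break
    return [normalize(t) for t in thetas]

# Hypothetical toy run: three consumers, three producers, high temperature.
consumers = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0)]
start = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
final = gradient_ascent(consumers, start, tau=5.0)
```

With the high temperature used in this toy run, all three producers end up close to the normalized mean consumer embedding, in line with the collapse discussed in Section 2.2. A point returned by such a procedure is only a candidate: it still has to be checked with the second derivative test from Section 2.3 before being interpreted as an ϵ-LNE.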

3.2. RESULTS

Emergence of clusters with growing τ. Theorem 3 shows that producers concentrate around $\bar{c} = \mathbb{E}[c] / \|\mathbb{E}[c]\|$ for sufficiently high τ. Figure 4 corroborates the result on both MovieLens and LastFM, with the concentration happening already at τ = 1 regardless of the embedding dimension d and producer count n. We also see that lower τ can lead to "local clustering" where only a few producers converge onto the same strategy. We hypothesize that the simultaneous local updates of the consumers create "attractor zones" where close-by producers collapse onto each other; they will remain collapsed henceforth due to equality of their gradients (by symmetry). Theorem 3 does tell us collapse is to be expected for high τ, and it is possible that a local version of the result with more than one cluster is true for intermediate values of τ. This highlights how crucial the algorithmic choice of τ is for the induced incentives within our model.
Targeting of incentivized content by gender. The MovieLens dataset contains binarized user gender information. In Figure 5, we examine targeting of incentivized content on women and men. To do so, we employ aggregate statistics of predicted ratings. While predicted ratings may differ from actual user preferences, they do determine recommendations and thus user experience. To help disentangle the effect of exposure maximization, we also include statistics based on the original item locations (labeled by 'b'), i.e., the content created before producers adapt to the recommender. Since the baseline embeddings need not satisfy the unit norm constraint (see Definition 1), we measure normalized ratings $r_i(c) := \langle c, s_i \rangle / (\|c\| \|s_i\|)$ to facilitate comparison. The normalization also alleviates the known issue of varying interpretation of ranking scales between users (Lynch Jr et al., 1991).
In Figure 5 (left), $\mathrm{median}_{c \in \text{men}} \{\max_i r_i(c)\} - \mathrm{median}_{c \in \text{women}} \{\max_i r_i(c)\}$ measures if the incentivized content is predicted to appeal to women/men; Figure 5 (right) shows the fraction of creators incentivized to target women/men: $\frac{1}{n} \sum_{i=1}^{n} \big( 1\{\arg\max_c r_i(c) \in \text{men}\} - 1\{\arg\max_c r_i(c) \in \text{women}\} \big)$. Positive values signify content crafted for male audience (users are 71% male). Higher embedding dimension results in more bias, presumably due to the larger model expressivity, and thus enables more fine-grained targeting. NMF consistently incentivizes more biased content.
Association between incentivized content and creator gender. Platform developers may want to know if some creators are being disadvantaged (Chokshi, 2017; Farokhmanesh, 2018; Rodriguez, 2022). While solutions were proposed in the static case (e.g., Beutel et al., 2019; Wang et al., 2021), understanding if the algorithm (de)incentivizes content by particular creator groups may limit future harm. In Figure 6, we measure the difference between the proportion of (left) and the median distance to (right) baseline creator embeddings (learned by the recommender before strategic adaptation), within increasingly large neighborhoods of each strategic $s_i$. Since the baseline embeddings need not be unit norm, we use the cosine distance to define the neighborhoods. Starting with the proportion (left), higher embedding dimension (more flexible model) incentivizes content more typical of male artists. This may be related to the higher prevalence of men in LastFM, combined with training by average loss minimization. The gender imbalance also explains why the proportion (left) stabilizes at a positive value, whereas the median distance (right) reverts to zero, as the number of considered neighbors grows. The bias is also related to the choice of rating model, where especially PMF at high temperatures results in significant advantage for male artists.
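The two user-targeting statistics reported in Figure 5 can be computed along the following lines. This is a hedged sketch on made-up data rather than the evaluation code behind the figures: the function names `appeal_gap` and `targeting_gap`, the "M"/"F" label encoding, and the toy embeddings are assumptions introduced for illustration.

```python
import math
from statistics import median

def normalized_rating(c, s):
    """Cosine-normalized predicted rating <c, s> / (||c|| ||s||)."""
    dot = sum(a * b for a, b in zip(c, s))
    return dot / (math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in s)))

def appeal_gap(users, genders, items):
    """Median over men of the best normalized rating minus the same median over women."""
    best = {"M": [], "F": []}
    for c, g in zip(users, genders):
        best[g].append(max(normalized_rating(c, s) for s in items))
    return median(best["M"]) - median(best["F"])

def targeting_gap(users, genders, items):
    """Share of items whose top-rated user is a man minus the share whose top-rated user is a woman."""
    total = 0
    for s in items:
        top_user = max(range(len(users)), key=lambda j: normalized_rating(users[j], s))
        total += 1 if genders[top_user] == "M" else -1
    return total / len(items)

# Hypothetical data: three users with binary gender labels and two strategic items.
users = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
genders = ["M", "M", "F"]
items = [(1.0, 0.0), (0.8, 0.6)]
appeal = appeal_gap(users, genders, items)
targeting = targeting_gap(users, genders, items)
```

Both statistics are positive on this toy input, which under the sign convention of Figure 5 would be read as content tilted towards the male users. Evaluating the same functions on the baseline item embeddings gives the 'b' reference values, since the cosine normalization removes the dependence on embedding norms.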
Figure 6: Incentivized content and creator gender on LastFM (x-axis in all panels: # neighbors, from 2^0 to 2^5). Quantifying relative difficulty of strategic adaptation for female and male content creators. Uses baseline creator embeddings (and associated gender), and their cosine distance from strategic embeddings. Left: Difference between fractions of male and female creators in increasingly large neighborhood of each strategic item. Values above zero imply bias towards male producers. Higher embedding dimension (model expressivity) again results in larger bias. The bias also seems to be larger for higher τ and for the PMF rating model. Right: Difference between median cosine distance to female and male creators within increasingly large neighborhood of each strategic item. Values above zero imply bias towards male producers. Higher bias is again associated with higher embedding dimension and the PMF rating model, but the impact of temperature τ is less pronounced. See Section 3.2 for more discussion.

4. DISCUSSION

From social media and streaming to Google Search, many of us interact with recommender and information retrieval systems every day. While the core algorithms were developed and analyzed years ago, the socio-economic context in which they operate has received comparatively little attention in the academic literature. We make two main contributions: (a) we define exposure games, an incentive-based model of content creators' interactions with real-world algorithms including the popular matrix factorization and two-tower systems, and (b) we formulate a pre-deployment audit which employs a model of creator behavior to identify misalignment between incentivized and desirable content. Our main theoretical contributions focus on the properties of Nash equilibria in exposure games. We found that seemingly innocuous algorithmic choices like temperature τ, embedding dimension d, or a non-negativity constraint on embeddings can have a serious impact on the induced incentives. For example, high τ incentivizes uniform, broadly appealing content, whereas low τ motivates targeting smaller consumer groups. Since higher τ is often linked to exploration, which is necessary for optimal performance in static settings (e.g., Lattimore & Szepesvári, 2020), this result highlights the importance of considering the socio-economic context in algorithm development.

Our producer model has several limitations, from assuming rationality, complete information, and full control, to taking the skill set of each producer to be the same, their utility to be linear in total exposure, and ignoring algorithmic diversification of recommendations. We also consider the attention pool as fixed and finite, neglecting the problematic reality of the modern attention economy, where online platforms constantly struggle to increase their user numbers and daily usage (Covington et al., 2016; Williams, 2018; Bhargava & Velasquez, 2021).
Our theoretical understanding is incomplete as, e.g., our understanding of the influence of constraining embeddings to be non-negative is limited to the two-dimensional case. The empirical evaluation of our behavior model is hindered by the lack of academic access to the almost exclusively privately owned platforms (Greene et al., 2022). Due to their sizable influence on individuals, societies, and the economy (Milano et al., 2020), information and recommender systems are of critical importance from an ethical and societal perspective. While we hope that a better understanding of the incentives these algorithms create will mitigate their negative social consequences, this also entails risks. Perhaps the most important is the possibility of employing an optimizer such as the one in Section 3 to game a real-world algorithm. This is especially relevant to the current debate about transparency (e.g., Sonboli et al., 2021; Rieder & Hofmann, 2020; Sinha & Swearingen, 2002), and the proposal to (partially) open-source the Twitter code base (Knight, 2022). Due to the aforementioned limitations, we also caution against treating the predictions of our incentive-based behavior model as definitive, especially given the significant complexity of many real-world algorithms and the environments in which they operate. Going forward, we want to deepen our understanding of exposure games, and make pre-deployment audits a practical addition to the algorithm auditing toolbox. We hope this research enriches the debate about online platforms with a useful perspective for thinking about harms, (over)amplification, and design of algorithms with regard to the relevant incentives of the involved actors.

Proposition 1. A PNE always exists in n = d = 2 non-negative hardmax games, but may not without non-negativity or when d > 2. For n = 2 non-negative softmax games with ĉ := (1/n)(1 − 1/n) E[c] ≠ 0, the only possible PNE is s_1 = s_2 = c̄ with c̄ := ĉ/∥ĉ∥ (independently of d), but a PNE may not exist. When n > 2, non-negative softmax games can have a PNE other than s_1 = ⋯ = s_n = c̄.

For non-existence when d > 2, consider d = 3 and P_c = (1/3) Σ_{j=1}^{3} δ_{c_j} where c_j are the three canonical basis vectors. Assume s = (s_1, s_2) is a PNE. Regardless of the location of s_1, there is a point s_2 on the great circle connecting the two points most distant from s_1 (break ties arbitrarily) which is closer to both of them than s_1 is. Hence u_2(s) ≥ 2/3 by the assumption that s is a PNE. The same argument implies u_1(s) ≥ 2/3. This is a contradiction since Σ_i u_i(s) = 1 by definition. For non-existence without non-negativity in d = 2, see the hardmax part of the Theorem 2 proof.

(II) Softmax: In the n = 2 case, a necessary condition for s = (s_1, s_2) to be a PNE is that the Riemannian gradients of the utility, (I − s_i s_i^⊤) g_i with g_i = ∇_{s_i} u_i(s), are zero. Since ∇_{s_i} u_i(s) = τ^{-1} E[p_i(c)(1 − p_i(c)) c], g_i belongs to the first orthant by the definition of a non-negative game, and it is not zero (for τ > 0, all probabilities lie in (0, 1), and c is not a.s. zero since we assumed E[c] ≠ 0). Hence the Riemannian gradients can only be zero if s_i ∝ g_i, and in particular s_i = g_i/∥g_i∥_2 because this is the direction which makes dot products with all vectors in the first orthant positive. Crucially, g_1 = g_2 in 2-player games due to the symmetry p_1(c)(1 − p_1(c)) = p_1(c) p_2(c) = p_2(c)(1 − p_2(c)). Therefore at a PNE, s_1 = s_2, in which case p_i(c) = 1/2 for all c. Thus g_i(s) ∝ E[c], implying s_1 = s_2 = c̄ is the only possible PNE. To show it may not be a PNE, consider P_c = (1/3)(2δ_{c_1} + δ_{c_2}) for arbitrary non-zero c_1 ≠ c_2 in the first orthant. Then c̄ ∝ 2c_1 + c_2 with u_1(c̄, c̄) = u_2(c̄, c̄) = 1/2. Fixing s_1 = c_1/∥c_1∥_2 and taking τ ↓ 0, we get u_1(s_1, c̄) → 2/3, which means there exists a τ > 0 for which s_1 = c_1/∥c_1∥_2 is a strict improvement over s_1 = c̄ when s_2 = c̄.

For the n > 2 case, we focus on a two-dimensional n = 4 game with P_c = (1/2)(δ_{c_1} + δ_{c_2}) with c_1 = [1, 0]^⊤ and c_2 = [0, 1]^⊤ (the two canonical basis vectors).
In particular, we investigate existence of NE of the form s_1 = s_2 and s_3 = s_4. Since d = 2, the strategies are restricted to S^1, which means we can use polar coordinates to parameterize s_i = φ(θ_i) := [cos(θ_i), sin(θ_i)]^⊤. We will further restrict our attention to the symmetric case θ_1 = θ_2 = θ and θ_3 = θ_4 = π/2 − θ for some θ ∈ [0, π/4] =: K. This allows us to define Q := [[0, −1], [1, 0]] (rows listed in order), and look for values of θ ∈ K where (w.l.o.g.) f(θ) := (∂u_1(s)/∂s_1)(∂s_1/∂θ_1)|_{θ_1=θ} = ⟨g_1, Q s_1⟩ is equal to zero. Note that in the definition of f, all s_i and g_i vary with θ according to the relationship s_i = φ(θ_i) with θ_1 = θ_2 = θ and θ_3 = θ_4 = π/2 − θ. However, f(θ) is only the derivative of u_1(s) w.r.t. θ_1, ignoring the dependence of s_2, s_3 and s_4 on θ. This definition of f means that only the roots of f can possibly be NE. The next lemma will help us locate these roots.

Lemma 2. For a sufficiently small τ > 0, f : θ → ⟨g_1, Q s_1⟩ is strictly convex on K.

Proof of Lemma 2. It is sufficient to prove that f″ > 0 on K. For this, observe f′(θ) = ∥Q s_1∥²_{H_1} − ⟨g_1, s_1⟩ where H_1 := ∇²_{s_1} u_1(s), and

f″(θ) = ∥Q s_1∥²_{∇_{θ_1} H_1} − ⟨Q s_1, 3 H_1 s_1 + g_1⟩ ≥ ∥Q s_1∥²_{∇_{θ_1} H_1} − 3∥H_1∥_2 − ∥g_1∥_2,

where by construction

g_1 = (1/(2τ)) [p_1(c_1)(1 − p_1(c_1)), p_1(c_2)(1 − p_1(c_2))]^⊤,
H_1 = (1/(2τ²)) diag((1 − 2p_1(c_1)) p_1(c_1)(1 − p_1(c_1)), (1 − 2p_1(c_2)) p_1(c_2)(1 − p_1(c_2))),
∇_{θ_1} H_1 = (1/(2τ³)) diag((1 − 6p_1(c_1)(1 − p_1(c_1))) p_1(c_1)(1 − p_1(c_1)), (1 − 6p_1(c_2)(1 − p_1(c_2))) p_1(c_2)(1 − p_1(c_2))).

Now assume we have two mixed strategies (P_1, P_2) such that E_{s_1∼P_1}[u_1(s_1, s)] ≥ 1/2 and E_{s_2∼P_2}[u_2(s, s_2)] ≥ 1/2 for all s ∈ S^{d−1}. Given a mixed strategy P ∈ P(S^{d−1}) it follows that

E_{(s_1,s_2)∼(P_1,P)}[u_1(s_1, s_2)] = E_{s_2∼P}[E_{s_1∼P_1}[u_1(s_1, s_2)]] = ∫_{S^{d−1}} E_{s_1∼P_1}[u_1(s_1, s_2)] dP(s_2) ≥ ∫_{S^{d−1}} (1/2) dP(s_2) = 1/2.

Similarly for Player 2. Lemma 4 allows us to only consider pure strategies when checking if strategies are mixed NE.
Now given a mixed strategy P with finite support supp(P) = {s^(1), s^(2), ..., s^(m)} we can find every subset of S^{d−1} that does not satisfy the condition in Lemma 4. By noting that any arbitrary strategy s can be either closer, farther, or at the same distance from a consumer as a given s^(i), we see that each s^(i) partitions S^{d−1} into 3^l disjoint sets based upon the distance of the strategies to each consumer c_k. That is, X^(i) = {X^(i)_1, X^(i)_2, ..., X^(i)_{3^l}}, with X^(i)_j satisfying

j_k = 2 if ⟨s^(i), c_k⟩ > ⟨s, c_k⟩,
j_k = 1 if ⟨s^(i), c_k⟩ = ⟨s, c_k⟩,
j_k = 0 if ⟨s^(i), c_k⟩ < ⟨s, c_k⟩,

for all pure strategies s ∈ X^(i)_j, where j_k is the k-th digit in the ternary representation of j. By considering all m partitions created by the strategies in supp(P), we can further partition the space into 3^{lm} disjoint sets Y = {Y_1, Y_2, ..., Y_{3^{lm}}} with Y_i = ∩_{j=1}^{m} X^(j)_{i_j}, where i_j is the j-th digit of the 3^l-ary representation of i. For every Y ∈ Y we have E_{s_1∼P}[u_1(s_1, s)] = E_{s_1∼P}[u_1(s_1, s′)] for all s, s′ ∈ Y by construction. Thus, we can find the set of all pure strategies D that dominate P by iterating over Y, testing a single point in each set, and taking unions:

Z = {Y ∈ Y : s ∈ Y ⟹ E_{s_1∼P}[u_1(s_1, s)] < 1/2}, D = ∪_{Y∈Z} Y.

It follows from Lemma 4 that (P, P) is a mixed NE if and only if D is empty. Finally, we outline a method to find mixed NE. We first note that for every positive integer m, every pure strategy s ∈ S^{d−1} defines a feasible set F_s of all mixed strategies with support over at most m pure strategies that are not dominated by s, that is:

F_s = {P = Σ_{i=1}^{m} π_i δ_{s^(i)} : Σ_{i=1}^{m} π_i u_1(s^(i), s) ≥ 1/2},

where π is an m-dimensional probability vector. It follows from Lemma 4 that if P is a mixed strategy with support over at most m points then (P, P) is a mixed NE if and only if P ∈ ∩_{s∈S^{d−1}} F_s.
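The partition argument above can be made concrete with a small sketch. It assumes the hardmax utility with ties split evenly, and it introduces a floating-point tolerance `tol` for the equality case; the function names and the three-consumer example on the circle are hypothetical illustrations rather than the paper's implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def signature(s_i, s, consumers, tol=1e-12):
    """Ternary digits j_k comparing support point s_i with a pure strategy s."""
    digits = []
    for c in consumers:
        gap = dot(s_i, c) - dot(s, c)
        digits.append(1 if abs(gap) <= tol else (2 if gap > 0 else 0))
    return tuple(digits)


def expected_u1(support, probs, s, consumers, weights):
    """E_{s1~P}[u_1(s1, s)] under hardmax with ties split evenly (assumption)."""
    share = {2: 1.0, 1: 0.5, 0: 0.0}  # exposure won by s1 for each digit
    total = 0.0
    for s_i, pi in zip(support, probs):
        sig = signature(s_i, s, consumers)
        total += pi * sum(w * share[d] for w, d in zip(weights, sig))
    return total


def dominates(support, probs, s, consumers, weights):
    """True if the pure strategy s dominates P, i.e. E[u_1] < 1/2."""
    return expected_u1(support, probs, s, consumers, weights) < 0.5 - 1e-12


# Hypothetical example: three consumers equally spaced on the unit circle,
# P uniform over the consumer locations, s halfway between the first two.
consumers = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)] for k in range(3)]
weights = [1 / 3] * 3
support, probs = consumers, [1 / 3] * 3
s = [math.cos(math.pi / 3), math.sin(math.pi / 3)]
sig = signature(support[0], s, consumers)
value = expected_u1(support, probs, s, consumers, weights)
```

Because the expected utility depends on s only through the signatures, any other point with the same signatures against every s^(i) yields the same value, which is why testing a single point per set suffices.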
We can frame finding such a strategy P as an optimization problem where P is the set of all mixed strategies with support over at most m pure strategies. An optimal solution with more than one element in P indicates that there does not exist a mixed NE with support over m points or fewer, whereas if |P| = 1 then (P, P) is a mixed NE, where P is the singleton element in P. This is an instance of the implicit hitting set problem. Hence, we can use the algorithm proposed in Section 2.1 of Chandrasekaran et al. (2011) to solve the above optimization problem. Their



Possibly due to the often finite rating scale, use of gradient clipping, and various forms of regularization.



Figure 1: Exposure game. Items s_i ∈ S^{d−1} placed to maximize exposure to consumers c ∼ P_c.

Figure 3: A) A game with no PNE (see the proof of Theorem 2). A PNE would exist if the strategy space was convex, and utility quasi-concave (Fan, 1952). B) and C) demonstrate lack of quasi-concavity even if we allow ∥s_i∥ ≤ 1: B) n − 1 producers at midpoint, s_1 along slice λc_1 + (1 − λ)c_2 (dashed line); C) Change in utility along the slice in B) demonstrates lack of quasi-concavity. D) A non-negative game with very different PNE depending on τ. E) PNE with "protective positioning."


Figure 4: Clustering of strategic producers depends on the exploration level τ. A cluster is a set of points whose Euclidean distances from one another are less than 10^{-5}·√d. As Theorem 3 predicts, large τ (e.g., more exploration) leads to higher concentration, i.e., creating content which appeals to more users. Left: MovieLens. Right: LastFM. See Section 3.2 for more discussion.
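The cluster definition in the Figure 4 caption can be sketched as follows. This is a hypothetical helper, not the paper's code: it links any two points closer than 10^{-5}·√d and counts connected groups with a union-find structure, which coincides with the caption's definition whenever groups of collapsed producers are well separated (an assumption of this sketch).

```python
import math


def count_clusters(points, tol_scale=1e-5):
    """Count groups of points linked by distances below tol_scale * sqrt(d)."""
    d = len(points[0])
    tol = tol_scale * math.sqrt(d)
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < tol:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Two producers that have numerically collapsed, plus one distinct producer.
n_clusters = count_clusters([[1.0, 0.0], [1.0, 1e-7], [0.0, 1.0]])
```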

Figure 5: Targeting of incentivized content by gender on MovieLens. Left: Difference between median_{c∈G} {max_{i∈[n]} r_i(c)} for men and women (group G), with r_i(c) the normalized rating (cosine similarity between c and the strategic s_i). Positive values imply bias towards men (higher median). Note the higher bias when d = 50 (more expressive algorithm); especially NMF incentivizes more biased content relative to the pre-adaptation baseline 'b'. Right: Difference in proportions of s_i with best (normalized) rating by women/men. Positive values imply bias towards men (more items best-rated by men). Bias again more pronounced at d = 50. See Section 3.2 for more discussion.

the only possible PNE is s_1 = s_2 = c̄ with c̄ := ĉ/∥ĉ∥ (independently of d), but a PNE may not exist. When n > 2, non-negative softmax games can have a PNE other than s_1 = ⋯ = s_n = c̄.

Proof of Proposition 1. (I) Hardmax: For existence when n = d = 2, let θ_c be the angle of c from (w.l.o.g.) [1, 0], and let A ⊂ C denote the set of angles such that for every θ_m ∈ A, P(θ_c ≤ θ_m) ≥ 1/2 and P(θ_c ≥ θ_m) ≥ 1/2, with P implied by the underlying P_c. Then any (s_1, s_2) ∈ A × A is a PNE.
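The median construction in the hardmax existence argument can be checked numerically on a toy instance. The sketch below assumes that consumer and producer angles both lie in [0, π/2] (our reading of the non-negative d = 2 case), splits ties evenly, and only searches for median angles among the consumer locations; all names and numbers are hypothetical.

```python
import math


def hardmax_u1(theta1, theta2, angles, weights, tol=1e-12):
    """Player 1's exposure; each consumer picks the larger cos(angle gap)."""
    u = 0.0
    for a, w in zip(angles, weights):
        gap = math.cos(a - theta1) - math.cos(a - theta2)
        u += w * (0.5 if abs(gap) <= tol else (1.0 if gap > 0 else 0.0))
    return u


def median_angles(angles, weights):
    """Consumer angles t with P(theta_c <= t) >= 1/2 and P(theta_c >= t) >= 1/2."""
    out = []
    for t in angles:
        below = sum(w for a, w in zip(angles, weights) if a <= t)
        above = sum(w for a, w in zip(angles, weights) if a >= t)
        if below >= 0.5 and above >= 0.5:
            out.append(t)
    return out


# Hypothetical consumer distribution on the quarter circle.
angles = [0.2, 0.7, 1.3]
weights = [0.3, 0.4, 0.3]
theta_m = median_angles(angles, weights)[0]

# Scan a grid of unilateral deviations for player 1 while player 2 stays put.
grid = [i * (math.pi / 2) / 1000 for i in range(1001)]
best_defection = max(hardmax_u1(t, theta_m, angles, weights) for t in grid)
at_median = hardmax_u1(theta_m, theta_m, angles, weights)
```

With both players at the median angle each receives exposure 1/2, and no deviation on the grid exceeds it, consistent with the claim that any pair of median angles is a PNE.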

P ∩ F_s ≠ ∅, s ∈ S^{d−1},

Figure 8: A counterpart to Figure 4 with runs where LNE test was inconclusive excluded.

Figure 9: A counterpart to Figure 5 with runs where LNE test was inconclusive excluded.

Figure 10: A counterpart to Figure 6 with runs where LNE test was inconclusive excluded.

Figure 11: A counterpart to Figure 4 with added MF results.

Figure 13: A counterpart to Figure 6 with added MF results.

A PROOFS

Assume s = (s_1, s_2) is a PNE. W.l.o.g. c_1 = argmax_j ⟨s_1, c_j⟩. Then there is s_2 on the geodesic connecting c_2 and c_3 which has a higher dot product with both c_2 and c_3 than s_1. Hence u_2(s) ≥ 2/3 by the assumption that s is a PNE. The same argument implies u_1(s) ≥ 2/3. This is a contradiction since Σ_i u_i(s) = 1 by definition of the exposure utility.

(II) Softmax: Let n = d = 2, and P_c = (1/3)(2δ_{e_1} + δ_{e_2}) where e_1 = [1, 0]^⊤ and e_2 = [0, 1]^⊤. By Proposition 1, we know that the only possible PNE is s_1 = s_2 = c̄ ∝ E[c] = [2, 1]/3, where both players enjoy u_1(s) = u_2(s) = 1/2. Let s′_1 = (c̄ + ϵe_1)/∥c̄ + ϵe_1∥ for some ϵ > 0. As τ → 0, u_1(s′_1, c̄) → 2/3 by continuity. Hence ∃τ_0 > 0 s.t. s_1 = s_2 = c̄ is not a PNE for all τ < τ_0.

Published as a conference paper at ICLR 2023

Theorem 3. For any ϵ > 0 and P_c ∈ P(R^d) with compact support and E[c] ≠ 0, ∃τ_0 > 0 s.t. s_1 = ... = s_n = c̄ is an ϵ-PNE for all τ ≥ τ_0. Moreover, for all τ ≥ τ_0, the smallest ϵ_τ for which c̄ is an ϵ_τ-PNE satisfies ϵ_τ ≤ ϵ/τ. If also ϵ < ∥ĉ∥, then the set of better-responses to c̄ is a subset of B^d_δ(c̄) = {v : ∥v − c̄∥ ≤ δ} with δ = 2ϵ/(∥ĉ∥ − ϵ), and δ → 0 as τ → ∞.

Proof of Theorem 3. We w.l.o.g. focus on the defection strategies for s_1. By the mean-value theorem for some s′_1 on the line connecting s_1 and c̄. While the rigorous argument below relies on a few technicalities, the main idea is simple:

Proof of Lemma 1. Since supp(P_c) is compact by assumption, and τ uniformly over B^d by continuity of the exponential function at zero.

For any given ϵ > 0, Lemma 1 can be combined with where ⟨ĉ, s_1 − c̄⟩ ≤ 0 for all s_1 ∈ S^{d−1} by c̄ = ĉ/∥ĉ∥, to obtain ∆ < ϵ for a sufficiently large τ. In particular, Lemma 1 yields a τ_0 such that ∥τ_0 Hence c̄ is at least an ϵ_τ-PNE for all τ ≥ τ_0 (w.l.o.g. τ_0 ≥ 1). The above can be used to obtain a bound on δ := ∥s_1 − c̄∥ for s_1 ∈ Ψ(c̄). Using orthogonality by the triangle inequality, and ⟨c̄,.
The terms in the square bracket on the r.h.s. can be bounded using Pythagoras' theorem where we used (I − c̄c̄^⊤)ĉ = 0 and ∥c̄∥ = 1. Because ∥τ·g′_1 − ĉ∥ < ϵ, the same is true for (the square roots of) both terms on the r.h.s. above. By a simple algebraic manipulation of these inequalities The r.h.s. is positive only when 0 < δ < 2ϵ/(∥ĉ∥ − ϵ). Since ϵ in Equation (5) is only used as an upper bound on ∥τ·g′_1 − ĉ∥, and Lemma 1 tells us this norm converges to zero, δ → 0 as τ → ∞.

Right: Plot of utility and its gradient for all possible defection strategies s_1 = φ(θ_1) with s_2, s_3, s_4 kept fixed in the positions defined by θ⋆_τ from the left plots. Vertical line shows π/4 (right end of K).

Hence, and ∥g_1∥_2 ∼ τ^{-1}, implying that for τ low enough, the positive term ∥Q s_1∥²_{∇_{θ_1} H_1} dominates (using that all expressions share the term p_1(c)(1 − p_1(c)), and thus after dividing and observing τ → 0 gives p_1(c) close to either one or zero, we get that all the terms scale as p_1(c)(1 − p_1(c))/τ^k for the appropriate k ∈ {1, 2, 3}).

Lemma 2 implies there are at most two NE (f is strictly convex, and The other possible root of f thus could only be in the interior (0, π/4) of K. For small enough τ, moving from θ = π/8 towards e_1 = [1, 0]^⊤ will increase utility, implying f(π/8) < 0. Hence there exists τ > 0 and θ⋆_τ ∈ (0, π/8) s.t. f(θ⋆_τ) = 0 by the mean value theorem. So far we have established that ) is a local NE for the corresponding small τ. By symmetry, it is sufficient to check if there is a defection strategy for s_1. Any defection to θ_1 ∈ (θ⋆_τ, π/2 − θ⋆_τ] will result in p_1(c) < 1/4 for both c = c_1, c_2, and thus worse utility. Defection to (π/2 − θ⋆_τ, π/2] will not yield utility greater than defection to [0, θ⋆_τ) since Here since p_1(c_1) grows quicker than p_1(c_2) decays. By construction, θ⋆_τ < π/4, and we know cos(θ) − sin(θ) > 0 for θ ∈ [0, π/4).
In other words, the utility of s_1 is strictly increasing on

Proof of Lemma 3. By symmetry, u_i(P_c, P_c) = 1/2, ∀i. Since for any all we need is to show that E_{c,s_2}[u_1(s_1, s_2) | s_1] ≤ 1/2 for any s_1 ∉ supp(P_c). W.l.o.g. assume s_1 lies on the geodesic connecting c_1 and c_3 (i.e., on the arc opposite of c_2). Such an s_1 is closer to c_1 and c_3 than c_2 (u_1(s_1, c_2) = 2/3), but is further from c_1 and c_2 (resp. c_3 and c_2) than c_3 (resp. c_1). Hence Since 4/9 < 1/2, s_1 has no incentive to move any of its mass away from supp(P_c).

B HARDMAX GAMES

In this section we present two different algorithms for finding mixed Nash equilibria in two-player hardmax games. We note that the set of allowable mixed strategies must be restricted in some way since certain distributions with support on the unit sphere S^{d−1} require infinite storage. Hence, our first algorithm finds a mixed NE for a discretized strategy space, while our second algorithm considers settings where P_c is discrete and finds a mixed NE with support over a finite number of pure strategies in the original non-discretized space, assuming such a mixed NE exists.

We caution that both of these algorithms can only find mixed NE for small exposure games due to their poor scaling properties. We list them here to highlight the difficulty of solving hardmax games when compared to the softmax setting and to serve as inspiration for future research into more efficient algorithms.

B.1 DISCRETIZED GAMES

We first consider the setting where both players may only choose mixed strategies with support over a finite subset A = {s^(1), s^(2), ..., s^(m)} ⊂ S^{d−1} of pure strategies. This setting includes embeddings that are represented using floating point numbers, although A will be very large. In this case the mixed strategy of each player can be expressed as an m-dimensional probability vector π_i with π_ij = P_i(s^(j)). Since there is a finite set of pure strategies, a mixed NE is guaranteed to exist (Nash Jr, 1950). Furthermore, since this is a two-player constant-sum game, we can find a mixed NE by solving a linear program over a probability vector x (Dorfman, 1951), with payoff matrix U_ij = u_1(s^(i), s^(j)). The strategies where π_1 = π_2 = x correspond to a mixed NE. While such a problem is simple to formulate and solve, the number of possible strategies grows rapidly with d for most discretization schemes. For example, we might create a uniform grid of k points over each spherical coordinate, in which case we will have m = k^{d−1} pure strategies to consider.
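To illustrate the discretized setting, the sketch below builds the payoff matrix U on a grid over the circle (d = 2, so m = k) and evaluates what a candidate mixed strategy guarantees against every pure strategy. It does not solve the linear program; instead it relies on the standard maximin condition for symmetric constant-sum games, namely that x supports a mixed NE (x, x) when it guarantees the game value 1/2 (stated here as an assumption of the sketch). The consumer distribution and all names are hypothetical.

```python
import math


def hardmax_u1(s1, s2, consumers, weights, tol=1e-12):
    """Exposure of player 1 when each consumer picks the higher dot product."""
    u = 0.0
    for c, w in zip(consumers, weights):
        gap = sum(a * b for a, b in zip(s1, c)) - sum(a * b for a, b in zip(s2, c))
        u += w * (0.5 if abs(gap) <= tol else (1.0 if gap > 0 else 0.0))
    return u


def circle_grid(k):
    """k equally spaced pure strategies on S^1 (d = 2, so m = k^(d-1) = k)."""
    return [[math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k)] for j in range(k)]


def worst_case_payoff(x, U):
    """min_j sum_i x_i U_ij: what the mixed strategy x guarantees player 1."""
    m = len(U)
    return min(sum(x[i] * U[i][j] for i in range(m)) for j in range(m))


grid = circle_grid(6)
consumers = [grid[0], grid[2], grid[4]]  # hypothetical P_c: uniform on three points
weights = [1 / 3] * 3
U = [[hardmax_u1(si, sj, consumers, weights) for sj in grid] for si in grid]
x = [1 / 3, 0, 1 / 3, 0, 1 / 3, 0]       # candidate: mix uniformly over the consumers
guarantee = worst_case_payoff(x, U)
```

The matrix has m² entries with m = k^{d−1}, which makes the rapid growth with d mentioned above explicit.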

B.2 FINITE SUPPORT

Next, we consider the setting where both players choose mixed strategies with support over at most m pure strategies, and the support of P_c is over l points, supp(P_c) = {c_1, c_2, ..., c_l}. Unlike in the discretized case, the players may choose any pure strategy that lies on S^{d−1}. We begin by outlining a method that, given a mixed strategy P, finds all pure strategies D that dominate it. By symmetry, we w.l.o.g. assume Player 1 provides the mixed strategy. We will then use this method as a subroutine to find a mixed NE.

for all pure strategies s ∈ S^{d−1}.

Proof of Lemma 4. Assume (P_1, P_2) is a mixed NE. Then by definition E_{(s_1,s_2)∼(P_1,P)}[u_1(s_1, s_2)] ≥ 1/2 for all mixed strategies P ∈ P(S^{d−1}); since each pure strategy is also a mixed strategy, it follows that E_{s_1∼P_1}[u_1(s_1, s)] ≥ 1/2 for all s ∈ S^{d−1}. Similarly for Player 2.

algorithm assumes an oracle that, given a proposed subset P ⊆ P, will return a subset F_s that is not hit (P ∩ F_s = ∅) or will certify P as a feasible solution to the above optimization problem. We can easily achieve this by finding all dominating pure strategies using our proposed method above for each P ∈ P and taking the intersection of the resulting sets. If the intersection is empty then P is a feasible solution, otherwise every element in the intersection represents a subset F_s that has not been hit by P.

C EXPERIMENTS

C.1 SETUP

The LastFM dataset was preprocessed by Shakespeare et al. (2020). The original larger-scale sweep was executed with n ∈ {10, 25, 100, 500, 1500}, d ∈ {3, 10, 50, 100}, stepsize in {10^{-3}, 10^{-2}, 10^{-1}}, and τ ∈ {10^{-2}, 10^{-1}, 0.25, 0.5, 1.0}. We only used 2 random seeds for the recommender, and 3 random seeds for our LNE-finding algorithm (i.e., 6 runs in total per configuration). For the reported results, the stepsize sweep was restricted to {10^{-2}, 10^{-1}}; the number of steps was upper bounded by 50,000 (all runs have successfully converged to a fixed point as mentioned). While our code contains an option to scale_lr_by_temperature (see the config.py file in the provided code), which multiplies the stepsize by τ before its use, we did not use this option in the experiments.

The second-order Riemannian test (Definition 3) is implemented in manifold.py. Defining the tangent space projection Π_i := (I − s_i s_i^⊤), we consider a candidate strategy profile s ∈ (S^{d−1})^n as violating the second-order test if any of the Riemannian gradients Π_i ∇_{s_i} u_i(s) has ℓ_2-norm higher than 10^{-5}·√d, or the Riemannian Hessian Π_i [∇²_{s_i} u_i(s)] Π_i − ⟨s_i, ∇_{s_i} u_i(s)⟩ Π_i has a strictly positive eigenvalue (no tolerance used here).

The final MovieLens and LastFM experiments were run on 72 AWS machines, each with 4 CPU cores, for 5 hours. Including preliminary and failed runs, we used over 50K CPU hours.
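For intuition, the second-order test can be written out by hand in the special case d = 2, where the tangent space at s_i is spanned by the single unit vector t = [−s_{i,2}, s_{i,1}] and the Riemannian Hessian has one tangent eigenvalue, t^⊤ H t − ⟨s_i, g⟩. The sketch below is not the code in manifold.py; it assumes a softmax exposure game with a discrete consumer distribution, and every name in it is hypothetical.

```python
import math


def softmax_probs(strategies, c, tau):
    """p_i(c) proportional to exp(<c, s_i> / tau), computed stably."""
    logits = [(c[0] * s[0] + c[1] * s[1]) / tau for s in strategies]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def utility(i, strategies, consumers, weights, tau):
    return sum(w * softmax_probs(strategies, c, tau)[i] for c, w in zip(consumers, weights))


def second_order_test(i, strategies, consumers, weights, tau, grad_tol=1e-5):
    """Return (passes, riemannian_grad_norm, tangent_curvature) for producer i, d = 2."""
    s = strategies[i]
    t = [-s[1], s[0]]            # unit tangent direction at s_i
    g = [0.0, 0.0]               # Euclidean gradient of u_i w.r.t. s_i
    tHt = 0.0                    # t^T (Euclidean Hessian of u_i) t
    for c, w in zip(consumers, weights):
        p = softmax_probs(strategies, c, tau)[i]
        v = p * (1.0 - p)
        g[0] += w * v * c[0] / tau
        g[1] += w * v * c[1] / tau
        ct = c[0] * t[0] + c[1] * t[1]
        tHt += w * (1.0 - 2.0 * p) * v * ct * ct / tau ** 2
    grad_norm = abs(g[0] * t[0] + g[1] * t[1])       # norm of the projected gradient
    curvature = tHt - (s[0] * g[0] + s[1] * g[1])    # Hessian eigenvalue along t
    passes = grad_norm <= grad_tol * math.sqrt(2) and curvature <= 0.0
    return passes, grad_norm, curvature


# Two-player example with P_c = (2/3) delta_{e1} + (1/3) delta_{e2}, both at the mean direction.
consumers = [[1.0, 0.0], [0.0, 1.0]]
weights = [2 / 3, 1 / 3]
norm = math.hypot(2.0, 1.0)
c_bar = [2.0 / norm, 1.0 / norm]
tau = 0.05
passes, grad_norm, curvature = second_order_test(0, [c_bar, c_bar], consumers, weights, tau)
deviation_gain = utility(0, [[1.0, 0.0], c_bar], consumers, weights, tau) - 0.5
```

In this example the profile passes the local test even though moving to e_1 strictly improves player 1's exposure at this small τ, which illustrates that the test certifies only local, not global, equilibria.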

C.2 OPTIMIZER

The gradient ascent optimization technique (Singh et al., 2000; Balduzzi et al., 2018) we employ is very similar to the standard gradient descent algorithm from the machine learning literature. Here we provide a short description of the similarities and differences between the two.

The optimizer we use simultaneously runs n independent gradient descent optimizers, each following the gradient of the utility u_i(s) w.r.t. θ_i, i ∈ [n], as described around Equation (4) (recall s_i = θ_i/∥θ_i∥). θ_{i,t+1} is obtained using θ_{j,t} for all j ≠ i, i.e., the locations of the other producers from the last step. All n optimizers execute these steps at the same time, iterating until all of them converge. See optimisation.py, particularly the optax_minimisation method, for more details.

C.3 ADDITIONAL PLOTS

Appendix C.3.1 contains plots where the second-order test confirmed an LNE. Appendix C.3.2 then offers comparison to a third ranking algorithm: standard matrix factorization (MF; Koren et al., 2009), i.e., PMF with additional bias terms. The bias terms affect the interpretation of τ values, and we also ignore them when running the LNE-finding algorithm. This makes the comparison with PMF and NMF difficult, which is why we excluded MF from the main text. Results in Appendix C.3.2 again contain runs where the second-order test did not rule out a LNE.
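Returning to the optimizer of Appendix C.2, the simultaneous update scheme can be sketched in a few lines. This is a plain fixed-stepsize gradient ascent written with the standard library only, not the optax-based code in optimisation.py; the softmax utility, the discrete consumer distribution, and all function names are assumptions made for illustration.

```python
import math


def normalize(theta):
    n = math.sqrt(sum(x * x for x in theta))
    return [x / n for x in theta]


def softmax_probs(strategies, c, tau):
    logits = [sum(a * b for a, b in zip(c, s)) / tau for s in strategies]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_wrt_theta(i, thetas, consumers, weights, tau):
    """Gradient of u_i w.r.t. theta_i, where s_i = theta_i / ||theta_i||."""
    strategies = [normalize(th) for th in thetas]
    s = strategies[i]
    d = len(s)
    g = [0.0] * d
    for c, w in zip(consumers, weights):
        p = softmax_probs(strategies, c, tau)[i]
        for k in range(d):
            g[k] += w * p * (1.0 - p) * c[k] / tau
    radial = sum(a * b for a, b in zip(s, g))
    norm = math.sqrt(sum(x * x for x in thetas[i]))
    # Chain rule through the normalization: (I - s s^T) g / ||theta_i||.
    return [(g[k] - radial * s[k]) / norm for k in range(d)]


def simultaneous_ascent(thetas, consumers, weights, tau, stepsize, steps):
    for _ in range(steps):
        # All gradients are evaluated at the positions from the previous step ...
        grads = [grad_wrt_theta(i, thetas, consumers, weights, tau) for i in range(len(thetas))]
        # ... and only then are all producers moved at once.
        thetas = [[x + stepsize * dx for x, dx in zip(th, gr)] for th, gr in zip(thetas, grads)]
    return [normalize(th) for th in thetas]


consumers = [[1.0, 0.0], [0.0, 1.0]]
weights = [2 / 3, 1 / 3]
final = simultaneous_ascent([[1.0, 0.2], [0.3, 1.0]], consumers, weights, tau=1.0, stepsize=0.1, steps=200)
stay = simultaneous_ascent([[2.0, 1.0], [2.0, 1.0]], consumers, weights, tau=1.0, stepsize=0.1, steps=50)
```

The key design point is that every producer's step uses the other producers' previous locations, so the update order does not matter; convergence is not guaranteed in general and has to be checked separately, for instance with the second-order test from Appendix C.1.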

C.3.1 LNE CONFIRMED BY THE SECOND-ORDER TEST

As mentioned, the plots shown in the main body of the paper are for runs where the second-order Riemannian test did not rule out that the found pure strategy profile is a LNE. Here we show exactly the same plots with only the runs where the test confirmed a LNE. The difference is that here we exclude the runs where the Riemannian Hessian had at least one zero eigenvalue associated with a direction perpendicular to s_i, for at least one i ∈ [n]. As shown below, this had little effect on the LastFM results, but non-negligibly reduced the number of admitted runs for MovieLens.

