AUCTION LEARNING AS A TWO-PLAYER GAME

Abstract

Designing an incentive compatible auction that maximizes expected revenue is a central problem in Auction Design. While theoretical approaches to the problem have hit some limits, a recent research direction initiated by Duetting et al. (2019) consists of building neural network architectures to find optimal auctions. We propose two conceptual deviations from their approach which result in enhanced performance. First, we use recent results in theoretical auction design to introduce a time-independent Lagrangian. This not only circumvents the need for an expensive hyperparameter search (as in prior work), but also provides a single metric to compare the performance of two auctions (absent from prior work). Second, the optimization procedure in previous work uses an inner maximization loop to compute optimal misreports. We amortize this process through the introduction of an additional neural network. We demonstrate the effectiveness of our approach by learning competitive or strictly improved auctions compared to prior work. Both results together further imply a novel formulation of Auction Design as a two-player game with stationary utility functions.

1. INTRODUCTION

Efficiently designing truthful auctions is a core problem in Mathematical Economics. Concrete examples include the sponsored search auctions conducted by companies such as Google, or auctions run on platforms such as eBay. Following the seminal work of Vickrey (Vickrey, 1961) and Myerson (Myerson, 1981), auctions are typically studied in the independent private valuations model: each bidder has a valuation function over items, and their payoff depends only on the items they receive. Moreover, the auctioneer knows aggregate information about the population that each bidder comes from, modeled as a distribution over valuation functions, but does not know precisely each bidder's valuation (outside of any information in this Bayesian prior). A major difficulty in designing auctions is that valuations are private and bidders need to be incentivized to report their valuations truthfully. The goal of the auctioneer is to design an incentive compatible auction which maximizes expected revenue.

Auction Design has existed as a rigorous mathematical field for several decades, and yet complete characterizations of the optimal auction only exist for a few settings. While Myerson's Nobel prize-winning work provides a clean characterization of the single-item optimum (Myerson, 1981), optimal multi-item auctions provably suffer from numerous formal measures of intractability (including computational intractability, high description complexity, non-monotonicity, and others) (Daskalakis et al., 2014; Chen et al., 2014; 2015; 2018; Hart & Reny, 2015; Thanassoulis, 2004).

An orthogonal line of work instead develops deep learning architectures to find the optimal auction. Duetting et al. (2019) initiated this direction by proposing RegretNet, a feed-forward architecture. They frame the auction design problem as a constrained learning problem and lift the constraints into the objective via the augmented Lagrangian method. Training RegretNet involves optimizing this Lagrangian-penalized objective while simultaneously updating the network parameters and the Lagrangian multipliers themselves. This architecture produces impressive results: it recovers near-optimal auctions in several known multi-item settings and discovers new mechanisms when a theoretical optimum is unknown. Yet, this approach presents several limitations.

On the conceptual front, our main insight is a connection to an exciting line of recent works (Hartline & Lucier, 2010; Hartline et al., 2011; Bei & Huang, 2011; Daskalakis & Weinberg, 2012; Rubinstein & Weinberg, 2018; Dughmi et al., 2017; Cai et al., 2019) on ε-truthful-to-truthful reductions.^1 On the technical front, we identify three areas for improvement. First, their architecture is difficult to train in practice, as the objective is nonstationary: the Lagrangian multipliers are time-dependent and increase following a pre-defined schedule, which requires careful hyperparameter tuning (see §3.1 for experiments illustrating this). Leveraging the aforementioned works in Auction Theory, we propose a stationary Lagrangian objective. Second, all prior work inevitably finds auctions which are not precisely incentive compatible, and does not provide a metric to compare, say, an auction with revenue 1.01 which is 0.002-truthful against one with revenue 1 which is 0.001-truthful. We argue that our stationary Lagrangian objective serves as a good metric (and that the second auction of this short example is "better" for our metric).
Finally, their training procedure requires an inner-loop optimization (essentially, the bidders trying to maximize their utility in the current auction), which is itself computationally expensive. We use amortized optimization to make this process more efficient.

CONTRIBUTIONS

This paper leverages recent work in Auction Theory to formulate the learning of revenue-optimal auctions as a two-player game. We develop a new algorithm, ALGnet (Auction Learning Game network), that produces competitive or better results compared to Duetting et al. (2019)'s RegretNet. In addition to the conceptual contributions, our approach yields the following improvements (as RegretNet already learns near-optimal auctions, our improvement over RegretNet is not due to significantly higher optimal revenues):
- Easier hyperparameter tuning: by constructing a time-independent loss function, we circumvent the need to search for an adequate parameter schedule. Our formulation also involves fewer hyperparameters, which makes it more robust.
- A metric to compare auctions: we propose a metric to compare the quality of two auctions which are not incentive compatible.
- More efficient training: we replace the inner-loop optimization of prior work with a neural network, which makes training more efficient.
- Online auctions: since the learning formulation is time-invariant, ALGnet is able to quickly adapt in auctions where the bidders' valuation distributions vary over time. Such a setting appears for instance in the online posted pricing problem studied in Bubeck et al. (2017).

Furthermore, these technical contributions together imply a novel formulation of auction learning as a two-player game (not zero-sum) between an auctioneer and a misreporter. The auctioneer tries to design an incentive compatible auction that maximizes revenue, while the misreporter tries to identify breaches in the truthfulness of these auctions.

The paper is organized as follows. Section 2 introduces the standard notions of auction design. Section 3 presents our game formulation for auction learning. Section 4 provides a description of ALGnet and its training procedure. Finally, Section 5 presents numerical evidence for the effectiveness of our approach.

RELATED WORK

Auction design and machine learning. Machine learning and computational learning theory have been used in several ways to design auctions from samples of bidder valuations. In particular, machine learning has been used to analyze the sample complexity of designing optimal revenue-maximizing auctions. This includes the framework of single-parameter settings (Morgenstern & Roughgarden, 2015; Huang et al., 2018; Hartline & Taggart, 2019; Roughgarden & Schrijvers, 2016; Gonczarowski & Nisan, 2017; Guo et al., 2019), multi-item auctions (Dughmi et al., 2014; Gonczarowski & Weinberg, 2018), combinatorial auctions (Balcan et al., 2016; Morgenstern & Roughgarden, 2016; Syrgkanis, 2017) and allocation mechanisms (Narasimhan & Parkes, 2016). Other works have leveraged machine learning to optimize different aspects of mechanisms (Lahaie, 2011; Dütting et al., 2015). Our approach is different, as we build a deep learning architecture for auction design.

Auction design and deep learning. While Duetting et al. (2019) is the first paper to design auctions through deep learning, several other papers have followed up on this work: Feng et al. (2018) extended it to budget-constrained bidders, and Golowich et al. (2018) to the facility location problem.

2. AUCTION DESIGN AS A TIME-VARYING LEARNING PROBLEM

We first review the framework of auction design and the problem of finding truthful mechanisms. We then recall the learning problem proposed by Duetting et al. (2019) to find optimal auctions.

2.1. AUCTION DESIGN AND LINEAR PROGRAM

Auction design. We consider an auction with $n$ bidders and $m$ items, and denote by $N = \{1, \dots, n\}$ and $M = \{1, \dots, m\}$ the sets of bidders and items. Each bidder $i$ values item $j$ at a valuation denoted $v_{ij}$. We focus on additive auctions, where the value of a set $S$ of items is the sum of the values of its elements, $\sum_{j \in S} v_{ij}$. Additive auctions are perhaps the most well-studied setting in multi-item auction design (Hart & Nisan, 2012; Li & Yao, 2013; Daskalakis et al., 2014; Cai et al., 2016; Daskalakis et al., 2017). The auctioneer does not know the exact valuation profile $V = (v_{ij})_{i \in N, j \in M}$ of the bidders in advance, but he does know the distribution from which they are drawn: the valuation vector of bidder $i$, $v_i = (v_{i1}, \dots, v_{im})$, is drawn from a distribution $D_i$ over $\mathbb{R}^m$. We further assume that all bidders are independent and that $D_1 = \cdots = D_n$; as a result, $V$ is drawn from $D := \otimes_{i=1}^n D_i = D_1^{\otimes n}$.

Definition 1. An auction is defined by a randomized allocation rule $g = (g_1, \dots, g_n)$ and a payment rule $p = (p_1, \dots, p_n)$, where $g_i : \mathbb{R}^{n \times m} \to [0,1]^m$ and $p_i : \mathbb{R}^{n \times m} \to \mathbb{R}_{\geq 0}$. Additionally, for all items $j$ and valuation profiles $V$, the $g_i$ must satisfy $\sum_i [g_i(V)]_j \leq 1$.

Given a bid matrix $B = (b_{ij})_{i \in N, j \in M}$, $[g_i(B)]_j$ is the probability that bidder $i$ receives item $j$, and $p_i(B)$ is the price bidder $i$ has to pay to the auction. The condition $\sum_i [g_i(V)]_j \leq 1$ allows an item to remain unallocated.

Definition 2. The utility of bidder $i$ is defined by $u_i(v_i, B) = \sum_{j=1}^m [g_i(B)]_j v_{ij} - p_i(B)$.

Bidders seek to maximize their utility and may report bids that differ from their true valuations. In the following, we denote by $B_{-i}$ the $(n-1) \times m$ bid matrix without bidder $i$, and by $(b_i, B_{-i})$ the $n \times m$ bid matrix that inserts $b_i$ into row $i$ of $B_{-i}$ (for example, $B = (b_i, B_{-i})$). We aim at auctions that incentivize bidders to bid their true valuations.

Definition 3. An auction $(g, p)$ is dominant strategy incentive compatible (DSIC) if each bidder's utility is maximized by reporting truthfully, no matter what the other bidders report: for every bidder $i$, valuation $v_i \in D_i$, bid $b_i \in D_i$ and bids $B_{-i} \in D_{-i}$, $u_i(v_i, (v_i, B_{-i})) \geq u_i(v_i, (b_i, B_{-i}))$.

Definition 4. An auction is individually rational (IR) if for all $i \in N$, $v_i \in D_i$ and $B_{-i} \in D_{-i}$, $u_i(v_i, (v_i, B_{-i})) \geq 0$. (IR)

In a DSIC auction, bidders have an incentive to truthfully report their valuations, and therefore the revenue on valuation profile $V$ is $\sum_{i=1}^n p_i(V)$. Optimal auction design aims at finding a DSIC and IR auction that maximizes the expected revenue $rev := \mathbb{E}_{V \sim D}[\sum_{i=1}^n p_i(V)]$. Since there is no known characterization of DSIC mechanisms in the multi-item setting, we resort to the relaxed notion of ex-post regret, which measures the extent to which an auction violates DSIC.

Definition 5. The ex-post regret of bidder $i$ is the maximum increase in their utility over all possible bids, fixing the bids of the others. For a valuation profile $V$, it is given by $r_i(V) = \max_{b_i \in \mathbb{R}^m} u_i(v_i, (b_i, V_{-i})) - u_i(v_i, (v_i, V_{-i}))$. In particular, DSIC is equivalent to $r_i(V) = 0$, $\forall i \in N$, $\forall V \in D$. (IC)

The bid $b_i$ that achieves $r_i(V)$ is called the optimal misreport of bidder $i$ for valuation profile $V$. Finding an optimal auction is therefore equivalent to the following program:

$$\min_{(g,p) \in \mathcal{M}} \; -\mathbb{E}_{V \sim D}\Big[\sum_{i=1}^n p_i(V)\Big] \quad \text{s.t.} \quad r_i(V) = 0, \; \forall i \in N, \forall V \in D; \qquad u_i(v_i, (v_i, B_{-i})) \geq 0, \; \forall i \in N, v_i \in D_i, B_{-i} \in D_{-i}. \qquad \text{(LP)}$$
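To make Definitions 2 and 5 concrete, the following minimal sketch estimates each bidder's ex-post regret by brute-force search over a grid of misreports. It is our illustration, not code from the paper: the single-item second-price auction and the helper names (`allocate`, `pay`, `expost_regret`) are assumptions made for the example. Since the second-price auction is DSIC, the estimated regrets come out (numerically) zero.

```python
import torch

def allocate(B):                        # B: (n, m) bid matrix, here m = 1
    g = torch.zeros_like(B)
    g[B[:, 0].argmax(), 0] = 1.0        # highest bid wins the single item
    return g

def pay(B):
    p = torch.zeros(B.shape[0])
    winner = B[:, 0].argmax()
    p[winner] = B[:, 0].topk(2).values[1]   # winner pays the second-highest bid
    return p

def utility(i, v_i, B):                 # u_i(v_i, B) = <g_i(B), v_i> - p_i(B)
    return (allocate(B)[i] * v_i).sum() - pay(B)[i]

def expost_regret(i, V, grid):
    """r_i(V): max over misreports b_i of u_i(v_i,(b_i,V_-i)) - u_i(v_i,(v_i,V_-i))."""
    truthful = utility(i, V[i], V)
    best = max(utility(i, V[i], torch.cat([V[:i], b.view(1, -1), V[i + 1:]]))
               for b in grid)
    return (best - truthful).clamp(min=0.0)

V = torch.rand(3, 1)                              # 3 bidders, 1 item
grid = torch.linspace(0, 1, 101).view(-1, 1)      # candidate misreports
print([float(expost_regret(i, V, grid)) for i in range(3)])  # ~0: SPA is DSIC
```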

2.2. AUCTION DESIGN AS A LEARNING PROBLEM

As the space of auctions $\mathcal{M}$ may be large, we set up a parametric model. In what follows, we consider the class of auctions $(g^w, p^w)$ encoded by a neural network with parameter $w \in \mathbb{R}^d$; the corresponding utility and regret functions are denoted $u_i^w$ and $r_i^w$. Following Duetting et al. (2019), the formulation (LP) is relaxed: the IC constraint for all $V \in D$ is replaced by the expected constraint $\mathbb{E}_{V \sim D}[r_i^w(V)] = 0$ for all $i \in N$; the justification for this relaxation can be found in Duetting et al. (2019). Replacing expectations with empirical averages over valuation profiles $V^{(1)}, \dots, V^{(L)}$, the learning problem becomes:

$$\min_{w \in \mathbb{R}^d} \; -\frac{1}{L} \sum_{\ell=1}^{L} \sum_{i=1}^{n} p_i^w(V^{(\ell)}) \quad \text{s.t.} \quad \hat{r}_i^w := \frac{1}{L} \sum_{\ell=1}^{L} r_i^w(V^{(\ell)}) = 0, \; \forall i \in N. \qquad (\widehat{\text{LP}})$$

The learning problem $(\widehat{\text{LP}})$ does not ensure (IR). However, this constraint is usually built into the parametrization (architecture) of the model: by design, the only auction mechanisms considered satisfy (IR). Implementation details can be found in Duetting et al. (2019); Rahme et al. (2021), or in Section 4.

3. AUCTION LEARNING AS A TWO-PLAYER GAME

We first present the optimization and training procedures for $(\widehat{\text{LP}})$ proposed by Duetting et al. (2019). We then demonstrate with numerical evidence that this approach presents two limitations: hyperparameter sensitivity and lack of interpretability. Using the concept of ε-truthful-to-truthful reductions, we construct a new loss function that circumvents both limitations. Lastly, we resort to amortized optimization and reframe the auction learning problem as a two-player game.

3.1. THE AUGMENTED LAGRANGIAN METHOD AND ITS SHORTCOMINGS

Optimization and training. We briefly review the training procedure proposed by Duetting et al. (2019) to learn optimal auctions. The authors apply the augmented Lagrangian method to the constrained problem $(\widehat{\text{LP}})$ and consider the loss

$$\mathcal{L}(w; \lambda; \rho) = -\frac{1}{L} \sum_{\ell=1}^{L} \sum_{i \in N} p_i^w(V^{(\ell)}) + \sum_{i \in N} \lambda_i \hat{r}_i^w + \frac{\rho}{2} \sum_{i \in N} \big(\hat{r}_i^w\big)^2,$$

where $\lambda \in \mathbb{R}^n$ is a vector of Lagrange multipliers and $\rho > 0$ is a parameter controlling the weight of the quadratic penalty. More details about the training procedure can be found in Appendix A.

Scheduling consistency problem. The parameters $\lambda$ and $\rho$ are time-varying: their values change according to a pre-defined schedule of the following form: 1) initialize $\lambda$ and $\rho$ with $\lambda_0$ and $\rho_0$ respectively; 2) update $\rho$ every $T_\rho$ iterations, $\rho_{t+1} \leftarrow \rho_t + c$, where $c$ is a pre-defined constant; 3) update $\lambda$ every $T_\lambda$ iterations according to $\lambda_i^{t+1} \leftarrow \lambda_i^t + \rho_t \hat{r}_i^{w_t}$. This schedule therefore requires setting five hyperparameters $(\lambda_0, \rho_0, c, T_\lambda, T_\rho)$.

Some of the experiments in Duetting et al. (2019) learn an optimal mechanism for an $n$-bidder $m$-item auction (denoted $n \times m$) where the valuations are i.i.d. U[0, 1]. Different scheduling parameters were used for different values of $n$ and $m$; we report the hyperparameter values used for the 1 × 2, 3 × 10 and 5 × 10 settings in Table 1(a). A natural question is whether the choice of parameters heavily affects performance. We investigate this question numerically by trying different schedules (columns) on different settings (rows) and report the results in Table 1(b). The auction returned by the network varies dramatically with the choice of scheduling parameters: when applying the parameters of 1 × 2 to 5 × 10, we obtain a revenue that is lower by 30%! The performance of the learning algorithm strongly depends on the specific values of the hyperparameters, and finding an adequate schedule requires an extensive and time-consuming hyperparameter search.

Lack of interpretability. How should one compare two mechanisms with different expected revenue and regret? Is a mechanism $M_1$ with revenue $P_1 = 1.01$ and average total regret $R_1 = 0.02$ better than a mechanism $M_2$ with $P_2 = 1.0$ and $R_2 = 0.01$? The approach in Duetting et al. (2019) cannot answer this question. To see this, notice that when $\lambda_1 = \cdots = \lambda_n = \lambda$ we can rewrite $\mathcal{L}(w; \lambda; \rho) = -P + \lambda R + \frac{\rho}{2} R^2$. Which mechanism is better depends on the values of $\lambda$ and $\rho$: for example, if $\rho = 1$ and $\lambda = 0.1$ we find that $M_1$ is better, but if $\rho = 1$ and $\lambda = 10$ then $M_2$ is better. Since the values of $\lambda$ and $\rho$ change over time, the Lagrangian approach in Duetting et al. (2019) cannot provide a metric to compare two mechanisms.
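To make the schedule concrete, here is a schematic sketch (ours, not the RegretNet implementation) of the augmented Lagrangian objective and its updates. The `revenue` and `regrets` values are stand-ins for the empirical averages defined above, and all hyperparameter values are illustrative.

```python
import torch

n = 3                                  # bidders
lam0, rho0, c = 5.0, 1.0, 1.0          # illustrative values, cf. Table 1(a)
T, T_lam, T_rho = 10, 2, 5

def lagrangian(revenue, regrets, lam, rho):
    """L(w; lam; rho) = -rev + sum_i lam_i * rhat_i + (rho/2) * sum_i rhat_i^2."""
    return -revenue + (lam * regrets).sum() + 0.5 * rho * (regrets ** 2).sum()

lam = torch.full((n,), lam0)
rho = rho0
for t in range(1, T + 1):
    revenue = torch.tensor(1.0)        # stand-in for the empirical revenue
    regrets = torch.full((n,), 0.01)   # stand-in for the rhat_i^w
    loss = lagrangian(revenue, regrets, lam, rho)  # backprop'd into w in practice
    if t % T_rho == 0:                 # schedule 2): rho_{t+1} <- rho_t + c
        rho += c
    if t % T_lam == 0:                 # schedule 3): lam_i <- lam_i + rho * rhat_i
        lam = lam + rho * regrets
```

Five quantities ($\lambda_0$, $\rho_0$, $c$, $T_\lambda$, $T_\rho$) must be tuned, and the loss seen by the network keeps changing as $t$ grows.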

3.2. A TIME-INDEPENDENT AND INTERPRETABLE LOSS FUNCTION FOR AUCTION LEARNING

Our first contribution is a new loss function for auction learning that addresses the first two limitations of Duetting et al. (2019) mentioned in Section 3.1. We first motivate this loss in the one-bidder case and then extend it to auctions with multiple bidders.

3.2.1. MECHANISMS WITH ONE BIDDER

Proposition 1 (Balcan et al. (2005), attributed to Nisan). Let $M$ be an additive auction with 1 bidder and $m$ items. Let $P$ and $R$ denote the expected revenue and regret, $P = \mathbb{E}_{V \sim D}[p(V)]$ and $R = \mathbb{E}_{V \sim D}[r(V)]$. There exists a mechanism $M^*$ with expected revenue $P^* = (\sqrt{P} - \sqrt{R})^2$ and zero regret, $R^* = 0$.

A proof of this proposition can be found in Appendix C. Comparing two mechanisms is straightforward when both have zero regret: the better one achieves the higher revenue. Prop. 1 allows a natural and simple extension of this criterion to non-zero-regret mechanisms with one bidder: we say that $M_1$ is better than $M_2$ if and only if $M_1^*$ is better than $M_2^*$:

$$M_1 \succeq M_2 \iff P^*(M_1) \geq P^*(M_2) \iff \sqrt{P_1} - \sqrt{R_1} \geq \sqrt{P_2} - \sqrt{R_2}.$$

Using this metric, we find that a one-bidder mechanism with revenue 1.00 and regret 0.01 (score $\sqrt{1.00} - \sqrt{0.01} = 0.90$) is "better" than one with revenue 1.01 and regret 0.02 (score $\approx 0.86$).

3.2.2. MECHANISMS WITH MULTIPLE BIDDERS

Let $M_1$ and $M_2$ be two mechanisms with $n$ bidders and $m$ items. Let $P_k$ and $R_k$ denote their total expected revenue and regret, $P_k = \mathbb{E}_{V \sim D}[\sum_{i=1}^n p_i(V)]$ and $R_k = \mathbb{E}_{V \sim D}[\sum_{i=1}^n r_i(V)]$. We extend the metric derived in Section 3.2.1 to the multi-bidder setting as follows:

$$M_1 \text{ is "better" than } M_2 \iff M_1 \succeq M_2 \iff \sqrt{P_1} - \sqrt{R_1} \geq \sqrt{P_2} - \sqrt{R_2}.$$

When $n = 1$ we recover the criterion from Section 3.2.1, which is backed by Prop. 1. When $n > 1$, whether the extension of Prop. 1 still holds is considered a major open problem. Note that a multi-bidder variant of Prop. 1 does hold under a different solution concept termed "Bayesian Incentive Compatible" (Rubinstein & Weinberg, 2018; Cai et al., 2019), supporting the conjecture that Prop. 1 indeed extends.^2 Independently of whether Prop. 1 holds, this reasoning implies a candidate loss function for the multi-bidder setting which we can evaluate empirically. This way of comparing mechanisms motivates the use of the loss function $\mathcal{L}(P, R) = -(\sqrt{P} - \sqrt{R})$ instead of the Lagrangian from Section 3.1, and indeed this loss function works well in practice. We empirically find that the loss function $\mathcal{L}_m(P, R) = -(\sqrt{P} - \sqrt{R}) + R$ further accelerates training, as it (slightly) further biases towards mechanisms with low regret. Both of these loss functions are time-independent and hyperparameter-free; a minimal sketch follows.
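The following minimal sketch (ours; the names `metric` and `loss_m` are our choices) implements both objectives and checks the worked example from Section 3.2.1.

```python
import torch

def metric(P, R):
    """sqrt(P) - sqrt(R): higher is better; equals sqrt(revenue) at zero regret."""
    return P.sqrt() - R.sqrt()

def loss_m(P, R):
    """L_m(P, R) = -(sqrt(P) - sqrt(R)) + R, the variant used for training."""
    # In practice one may add a small epsilon before the square roots, since
    # the gradient of sqrt blows up at R = 0 (an implementation detail we add).
    return -(P.sqrt() - R.sqrt()) + R

# Example from Section 3.2.1: (rev 1.00, rgt 0.01) beats (rev 1.01, rgt 0.02).
m1 = metric(torch.tensor(1.00), torch.tensor(0.01))   # = 0.900
m2 = metric(torch.tensor(1.01), torch.tensor(0.02))   # ~ 0.864
assert m1 > m2
```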

3.3. AMORTIZED MISREPORT OPTIMIZATION

To compute the regret $r_i^w(V)$, one has to solve the optimization problem

$$\max_{v_i' \in \mathbb{R}^m} u_i^w(v_i, (v_i', V_{-i})) - u_i^w(v_i, (v_i, V_{-i})).$$

In Duetting et al. (2019), this problem is solved with an inner optimization loop for each valuation profile: the regret of each valuation profile is computed separately and independently, from scratch. Yet if two valuation profiles are close to each other, one should expect the resulting optimization problems to have similar solutions. We leverage this observation to improve training efficiency by amortizing the inner-loop optimization: instead of solving all these problems independently, we learn a single neural network $M_\phi$ that predicts the solution of all of them. $M_\phi$ takes a valuation profile as input and maps it to the optimal misreports:

$$M_\phi : \mathbb{R}^{n \times m} \to \mathbb{R}^{n \times m}, \qquad V = [v_i]_{i \in N} \mapsto \Big[\operatorname{argmax}_{v' \in D} u_i(v_i, (v', V_{-i}))\Big]_{i \in N}.$$

The loss $\mathcal{L}_r$ that $M_\phi$ minimizes follows naturally from this definition:

$$\mathcal{L}_r(\phi, w) = -\mathbb{E}_{V \sim D}\Big[\sum_{i=1}^n u_i^w(v_i, ([M_\phi(V)]_i, V_{-i}))\Big].$$
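The sketch below (ours) shows how $\mathcal{L}_r$ can be estimated on a single valuation profile, assuming `auction` is a callable implementing $(g^w, p^w)$ and `M_phi` a callable implementing $M_\phi$, as in the modules of Section 4. It is reused in the training-loop sketch of Section 4.3.

```python
import torch

def misreport_loss(auction, M_phi, V):
    """Estimate L_r(phi, w) on one valuation profile V of shape (n, m).

    `auction` maps a bid matrix to (allocations g, payments p); `M_phi` maps
    a valuation profile to a misreport matrix.
    """
    mis = M_phi(V)
    total = 0.0
    for i in range(V.shape[0]):
        B = V.clone()
        B[i] = mis[i]                  # only bidder i deviates from the truth
        g, p = auction(B)
        total = total + (g[i] * V[i]).sum() - p[i]  # u_i^w(v_i, (mis_i, V_-i))
    return -total
```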

3.4. AUCTION LEARNING AS A TWO-PLAYER GAME

In this section, we combine the ideas from Sections 3.2 and 3.3 to obtain a new formulation of the auction learning problem as a two-player game between an Auctioneer with parameter $w$ and a Misreporter with parameter $\phi$. The optimal parameters $(w^*, \phi^*)$ for the auction learning problem are a Nash equilibrium of this game. The Auctioneer tries to design a truthful (IC) and rational (IR) auction that maximizes revenue. The Misreporter tries to maximize the bidders' utility for the current auction $w$ selected by the Auctioneer; this is achieved by minimizing the loss $\mathcal{L}_r(\phi, w)$ with respect to $\phi$ (as discussed in Section 3.3). The Auctioneer in turn maximizes expected revenue for the current misreports chosen by the Misreporter; this is achieved by minimizing $\mathcal{L}_m(w, \phi) = -(\sqrt{P_w} - \sqrt{R_{w,\phi}}) + R_{w,\phi}$ with respect to $w$ (as discussed in Section 3.2). Here, $R_{w,\phi}$ is the estimate of the total regret that the Auctioneer computes for the current Misreporter $\phi$:

$$R_{w,\phi} = \frac{1}{L} \sum_{\ell=1}^{L} \sum_{i \in N} \Big( u_i^w(v_i, ([M_\phi(V^{(\ell)})]_i, V_{-i}^{(\ell)})) - u_i^w(v_i, (v_i, V_{-i}^{(\ell)})) \Big).$$

The game formulation can be summarized as follows:

Misreporter: loss $\mathcal{L}_r(\phi, w)$, parameter $\phi$;
Auctioneer: loss $\mathcal{L}_m(w, \phi)$, parameter $w$. (G)

Remark 1. The game formulation (G) is reminiscent of Generative Adversarial Networks (Goodfellow et al., 2014). Contrary to GANs, however, it is not a zero-sum game.

4. ARCHITECTURE AND TRAINING PROCEDURE

We describe ALGnet, a feed-forward architecture that solves the game formulation (G), and then provide its training procedure. ALGnet consists of two modules, the Auctioneer's module and the Misreporter's module, which take as input a bid matrix $B = (b_{ij}) \in \mathbb{R}^{n \times m}$ and are trained jointly. Their outputs are used to compute the regret and revenue of the auction.

4.1. THE AUCTIONEER'S MODULE

The Auctioneer's module is composed of an allocation network encoding a randomized allocation rule $g^w : \mathbb{R}^{nm} \to [0,1]^{nm}$ and a payment network encoding a payment rule $p^w : \mathbb{R}^{nm} \to \mathbb{R}^n_{\geq 0}$.

Allocation network. It computes the allocation probability of item $j$ to bidder $i$ as $[g^w(B)]_{ij} = [f_1(B)]_j \cdot [f_2(B)]_{ij}$, where $f_1 : \mathbb{R}^{n \times m} \to [0,1]^m$ and $f_2 : \mathbb{R}^{n \times m} \to [0,1]^{n \times m}$ are computed by two feed-forward neural networks.
- $[f_1(B)]_j$ is the probability that item $j \in M$ is allocated at all; it is given by $f_1(B) = \text{sig}(\text{MLP}(nm, n_a, h_a, m))(B)$.
- $[f_2(B)]_{ij}$ is the probability that item $j \in M$ is allocated to bidder $i \in N$, conditioned on item $j$ being allocated. A first MLP computes $l_i := \text{MLP}(nm, n_a, h_a, m)(B^{(i)})$ for all $i \in N$. The network then concatenates these vectors $l_i$ into a matrix $L \in \mathbb{R}^{n \times m}$, and a softmax activation over bidders is applied to $L$ to ensure feasibility, i.e. $\sum_{i \in N} L_{ij} = 1$ for all $j \in M$.

Payment network. It computes the payment $[p^w(B)]_i$ of bidder $i$ as $[p^w(B)]_i = \tilde{p}_i \sum_{j=1}^m B_{ij} [g^w(B)]_{ij}$, where $\tilde{p} : \mathbb{R}^{n \times m} \to [0,1]^n$; $\tilde{p}_i$ is the fraction of bidder $i$'s expected value for the allocation that she has to pay to the mechanism, computed as $\tilde{p}_i = \text{sig}(\text{MLP}(nm, n_p, h_p, 1))(B^{(i)})$. Finally, notice that by construction $[p^w(B)]_i \leq \sum_{j=1}^m B_{ij} [g^w(B)]_{ij}$, which ensures that (IR) is respected. A condensed sketch of this module is given below.
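The following is a condensed PyTorch sketch of this module. It is our simplification, not the paper's code: for brevity all three networks read the flat bid matrix rather than the permuted matrices $B^{(i)}$, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class Auctioneer(nn.Module):
    def __init__(self, n, m, hidden=100):
        super().__init__()
        self.n, self.m = n, m
        self.f1 = nn.Sequential(nn.Linear(n * m, hidden), nn.Tanh(),
                                nn.Linear(hidden, m), nn.Sigmoid())
        self.f2 = nn.Sequential(nn.Linear(n * m, hidden), nn.Tanh(),
                                nn.Linear(hidden, n * m))
        self.pay = nn.Sequential(nn.Linear(n * m, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n), nn.Sigmoid())

    def forward(self, B):                        # B: (n, m) bid matrix
        flat = B.reshape(-1)
        item_prob = self.f1(flat)                         # P[item j allocated]
        cond = self.f2(flat).reshape(self.n, self.m)
        cond = torch.softmax(cond, dim=0)                 # sum_i L_ij = 1
        g = item_prob * cond                              # [g^w(B)]_ij
        p_frac = self.pay(flat)                           # p_tilde_i in [0, 1]
        p = p_frac * (B * g).sum(dim=1)                   # <= bidder's value: IR
        return g, p
```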

4.2. THE MISREPORTER'S MODULE

The Misreporter's module consists of an MLP($nm$, $n_M$, $h_M$, $m$) followed by a projection layer Proj that ensures the output of the network lies in the valuation domain $D$. For example, when the valuations are restricted to $[0, 1]$ we can take Proj = sig; when they are non-negative numbers, we can take Proj = SoftPlus. The predicted optimal misreport of bidder $i$ is then given by Proj ∘ MLP($nm$, $n_M$, $h_M$, $m$)$(B^{(i)}) \in \mathbb{R}^m$. Stacking these vectors gives the misreport matrix $M_\phi(B)$; a matching sketch follows.
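A matching PyTorch sketch of the Misreporter's module (again ours; layer sizes illustrative), using the permuted matrices $B^{(i)}$ and a sigmoid projection for valuations in $[0, 1]$:

```python
import torch
import torch.nn as nn

class Misreporter(nn.Module):
    def __init__(self, n, m, hidden=100):
        super().__init__()
        self.n, self.m = n, m
        self.net = nn.Sequential(nn.Linear(n * m, hidden), nn.Tanh(),
                                 nn.Linear(hidden, m))
        self.proj = nn.Sigmoid()       # or nn.Softplus() for nonneg. valuations

    def forward(self, B):              # B: (n, m) -> misreport matrix (n, m)
        rows = []
        for i in range(self.n):
            # B^(i): bidder i's row first, then the others (notation section)
            perm = torch.cat([B[i:i + 1], B[:i], B[i + 1:]], dim=0)
            rows.append(self.proj(self.net(perm.reshape(-1))))
        return torch.stack(rows)       # M_phi(B)
```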

4.3. TRAINING PROCEDURE AND OPTIMIZATION

We optimize the game (G) over the space of neural network parameters $(w, \phi)$. The algorithm is easy to implement (Alg. 1). At each step $t$, we sample a batch of $B$ valuation profiles. The algorithm performs $\tau$ updates of the Misreporter's network (line 9) and one update of the Auctioneer's network (line 10). Moreover, we reinitialize the Misreporter's network every $T_{init}$ steps during the early phase of training ($t < T_{limit}$). This step is not necessary, but we found empirically that it speeds up training.

Algorithm 1 ALGnet training
1: Input: number of agents $n$, number of objects $m$.
2: Parameters: $\gamma > 0$; $B, T, T_{init}, T_{limit}, \tau \in \mathbb{N}$.
3: Initialize the Misreporter's and Auctioneer's nets.
4: for $t = 1, \dots, T$ do
5:   if $t \equiv 0 \mod T_{init}$ and $t < T_{limit}$ then
6:     Reinitialize the Misreporter's net.
7:   Sample a minibatch $S$ of $B$ valuation profiles from $D$.
8:   for $s = 1, \dots, \tau$ do
9:     $\phi_{s+1} \leftarrow \phi_s - \gamma \nabla_\phi \mathcal{L}_r(\phi_s, w_t)(S)$.
10:  $w_{t+1} \leftarrow w_t - \gamma \nabla_w \mathcal{L}_m(w_t, \phi)(S)$.
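The sketch below (ours) wires together the `Auctioneer`, `Misreporter`, `loss_m` and `misreport_loss` sketches from the previous sections into a minimal version of Algorithm 1. The hyperparameter values, the U[0, 1] valuation sampler, the clamping of the regret estimate at zero, and the small `eps` stabilizer inside the square roots are our illustrative choices.

```python
import torch

n, m = 2, 2
T, T_init, T_limit, tau, batch_size, eps = 1000, 100, 500, 5, 32, 1e-8
auc, mis = Auctioneer(n, m), Misreporter(n, m)
opt_w = torch.optim.AdamW(auc.parameters(), lr=1e-3)
opt_phi = torch.optim.AdamW(mis.parameters(), lr=1e-3)

def revenue_and_regret(batch_V):
    """Empirical P_w and R_{w,phi} over a batch of valuation profiles."""
    P, R = 0.0, 0.0
    for V in batch_V:
        g, p = auc(V)
        P = P + p.sum()
        mis_V = mis(V)
        for i in range(n):
            B = V.clone()
            B[i] = mis_V[i]
            g2, p2 = auc(B)
            u_mis = (g2[i] * V[i]).sum() - p2[i]
            u_true = (g[i] * V[i]).sum() - p[i]
            R = R + torch.clamp(u_mis - u_true, min=0.0)  # clamp keeps sqrt valid
    return P / len(batch_V), R / len(batch_V)

for t in range(1, T + 1):
    if t % T_init == 0 and t < T_limit:        # occasional misreporter restarts
        mis = Misreporter(n, m)
        opt_phi = torch.optim.AdamW(mis.parameters(), lr=1e-3)
    batch_V = torch.rand(batch_size, n, m)     # valuations i.i.d. U[0, 1]
    for _ in range(tau):                       # tau misreporter steps (line 9)
        loss_r = sum(misreport_loss(auc, mis, V) for V in batch_V)
        opt_phi.zero_grad(); loss_r.backward(); opt_phi.step()
    P, R = revenue_and_regret(batch_V)         # one auctioneer step (line 10)
    loss = loss_m(P + eps, R + eps)
    opt_w.zero_grad(); loss.backward(); opt_w.step()
```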

5. EXPERIMENTAL RESULTS

We show that ALGnet can recover near-optimal auctions in settings where the optimal solution is known, and that it can find new auctions in settings where no analytical solution is known. Since RegretNet is already capable of discovering near-optimal auctions, one cannot expect ALGnet to achieve significantly higher revenue than RegretNet. The results obtained are competitive with or better than those of Duetting et al. (2019), while requiring far fewer hyperparameters (Section 3). We also evaluate ALGnet in online auctions and compare it to RegretNet.


For each experiment, we compute the total revenue $rev := \mathbb{E}_{V \sim D}[\sum_{i \in N} p_i^w(V)]$ and the average regret $rgt := \frac{1}{n} \mathbb{E}_{V \sim D}[\sum_{i \in N} r_i^w(V)]$ on a test set of 10,000 valuation profiles. We run each experiment 5 times with different random seeds and report the average and standard deviation of these runs. In our comparisons, we make sure that ALGnet and RegretNet have similar sizes for fairness (Appendix D).

5.1. AUCTIONS WITH KNOWN AND UNKNOWN OPTIMA

Known settings. We show that ALGnet is capable of recovering near-optimal auctions in different well-studied settings that have an analytical solution. These are one-bidder, two-item auctions where the valuations $v_1$ and $v_2$ of the two items are independent. We consider the following settings. (A): $v_1$ and $v_2$ are i.i.d. from U[0, 1]; (B): $v_1 \sim$ U[4, 16] and $v_2 \sim$ U[4, 7]; (C): $v_1$ has density $f_1(x) = 5/(1+x)^6$ and $v_2$ has density $f_2(y) = 6/(1+y)^7$. (A) is the celebrated Manelli-Vincent auction (Manelli & Vincent, 2006); (B) is a non-i.i.d. auction and (C) is a non-i.i.d. heavy-tailed auction, both studied in Daskalakis et al. (2017). We compare our results to the theoretical optimal auctions (Table 2); Duetting et al. (2019) does not evaluate RegretNet on settings (B) and (C). During training, $rgt$ decreases to 0 while $rev$ and $P^*$ converge to the optimal revenue. For (A), we also plot $rev$, $rgt$ and $P^*$ as functions of the number of epochs and compare with RegretNet (Fig. 1). Contrary to ALGnet, we observe that RegretNet overestimates the revenue in the early stages of training at the expense of a higher regret. As a consequence, ALGnet learns the optimal auction faster than RegretNet, while being schedule-free and requiring fewer hyperparameters.

Unknown and large-scale auctions. We now consider settings where the optimal auction is unknown. We look at $n$-bidder $m$-item additive settings where the valuations are sampled i.i.d. from U[0, 1], denoted $n \times m$. In addition to "reasonable"-scale auctions (1 × 10 and 2 × 2), we investigate large-scale auctions (3 × 10 and 5 × 10) that are much more complex; only deep learning methods are able to solve them efficiently. Table 3 shows that ALGnet discovers auctions that yield comparable or better results than RegretNet.

5.2. ONLINE AUCTIONS

ALGnet is an online algorithm with a time-independent loss function, so we expect it to perform well in settings where the underlying distribution of the valuations changes over time. We consider a one-bidder, two-item additive auction with valuations $v_1$ and $v_2$ sampled i.i.d. from U[0, 1 + t], where $t$ is increased from 0 to 1 at a steady rate. The optimal auction at time $t$ has revenue $0.55 \times (1 + t)$. We use ALGnet and two versions of RegretNet, the original offline version (Appendix A) and our own online version (Appendix B), and plot $rev(t)$, $rgt(t)$ and $P^*(t)$ (Fig. 1). The offline version learns from a fixed dataset of valuations sampled at $t = 0$ (i.e. with $V \sim$ U[0, 1]$^{nm}$), while the online versions (like ALGnet) learn from a stream of data at each time $t$. Overall, ALGnet performs better than the other methods: it learns an optimal auction faster initially (especially compared to online RegretNet) and keeps adapting to the distributional shift (contrary to vanilla RegretNet).

6. CONCLUSION

We identified two inefficiencies in previous approaches to deep auction design and proposed solutions, building on recent trends and results from machine learning (amortization) and theoretical auction design (stationary Lagrangian). This resulted in a novel formulation of auction learning as a two-player game between an Auctioneer and a Misreporter, and a new architecture, ALGnet, which requires significantly fewer hyperparameters than previous Lagrangian approaches. We demonstrated the effectiveness of ALGnet on a variety of examples, comparing it to the theoretically optimal auction when it is known and to RegretNet when the optimal solution is not known.

A TRAINING ALGORITHM FOR REGRETNET

We present the training algorithm for RegretNet; more details can be found in Duetting et al. (2019).

Algorithm 2 RegretNet training
1: Input: minibatches $S_1, \dots, S_T$ of size $B$.
2: Parameters: $\gamma > 0$, $\eta > 0$, $c > 0$, $R \in \mathbb{N}$, $T \in \mathbb{N}$, $T_\rho \in \mathbb{N}$, $T_\lambda \in \mathbb{N}$.
3: Initialize parameters: $\rho_0 \in \mathbb{R}$, $w_0 \in \mathbb{R}^d$, $\lambda_0 \in \mathbb{R}^n$.
4: Initialize misreports: $v_i'^{(\ell)} \in D_i$, $\forall \ell \in [B], i \in N$.
5: for $t = 0, \dots, T$ do
6:   Receive minibatch $S_t = \{V^{(1)}, \dots, V^{(B)}\}$.
7:   for $r = 0, \dots, R$ do
8:     $\forall \ell \in [B], i \in N$: $v_i'^{(\ell)} \leftarrow v_i'^{(\ell)} + \gamma \nabla_{v'} u_i^{w_t}(v_i^{(\ell)}; (v_i'^{(\ell)}, V_{-i}^{(\ell)}))$.
9:   Compute the Lagrangian gradient and update $w_t$: $w_{t+1} \leftarrow w_t - \eta \nabla_w \mathcal{L}(w_t; \lambda_t; \rho_t)$.
10:  Update the multipliers every $T_\lambda$ iterations, $\lambda_i^{t+1} \leftarrow \lambda_i^t + \rho_t \hat{r}_i^{w_{t+1}}$, and the penalty every $T_\rho$ iterations, $\rho_{t+1} \leftarrow \rho_t + c$.

B TRAINING ALGORITHM FOR ONLINE REGRETNET

We present an online version of the training algorithm for RegretNet; more details can be found in Duetting et al. (2019). This version is mentioned in the original paper, but the algorithm is not explicitly written there; the following is our own adaptation of the original RegretNet algorithm to online settings.

Algorithm 3 Online RegretNet training
1: Input: valuation distribution $D$.
2: Parameters: $\gamma > 0$, $\eta > 0$, $c > 0$, $R \in \mathbb{N}$, $T \in \mathbb{N}$, $T_\rho \in \mathbb{N}$, $T_\lambda \in \mathbb{N}$, $B \in \mathbb{N}$.
3: Initialize parameters: $\rho_0 \in \mathbb{R}$, $w_0 \in \mathbb{R}^d$, $\lambda_0 \in \mathbb{R}^n$.
4: for $t = 0, \dots, T$ do
5:   Sample minibatch $S_t = \{V^{(1)}, \dots, V^{(B)}\}$ from distribution $D$.
6:   Initialize misreports: $v_i'^{(\ell)} \in D_i$, $\forall \ell \in [B], i \in N$.
7:   for $r = 0, \dots, R$ do
8:     $\forall \ell \in [B], i \in N$: $v_i'^{(\ell)} \leftarrow v_i'^{(\ell)} + \gamma \nabla_{v'} u_i^{w_t}(v_i^{(\ell)}; (v_i'^{(\ell)}, V_{-i}^{(\ell)}))$.
9:   Compute the Lagrangian gradient and update $w_t$: $w_{t+1} \leftarrow w_t - \eta \nabla_w \mathcal{L}(w_t; \lambda_t; \rho_t)$.
10:  Update the multipliers every $T_\lambda$ iterations, $\lambda_i^{t+1} \leftarrow \lambda_i^t + \rho_t \hat{r}_i^{w_{t+1}}$, and the penalty every $T_\rho$ iterations, $\rho_{t+1} \leftarrow \rho_t + c$.

C PROOF OF PROP. 1

Lemma 1. Let $M$ be a one-bidder, $m$-item mechanism with expected revenue $P$ and expected regret $R$. Then for all $\varepsilon > 0$, there exists a mechanism $M'$ with expected revenue $P' \geq (1-\varepsilon)P - \frac{1-\varepsilon}{\varepsilon}R$ and zero expected regret, $R' = 0$.

Proof. For every valuation vector $v \in D$, let $g(v)$ and $p(v)$ denote the allocation vector and price that $M$ assigns to $v$. Consider the mechanism $M'$ defined by

$$g'(v) = g(v^*), \qquad p'(v) = (1-\varepsilon)\, p(v^*), \qquad \text{where } v^* = \operatorname{argmax}_{\tilde{v} \in D} \; \langle v, g(\tilde{v}) \rangle - (1-\varepsilon)\, p(\tilde{v}).$$

By construction, the mechanism $M'$ has zero regret; all we have to do now is bound its revenue. Denote by $R(v)$ the regret of the profile $v$ in the mechanism $M$:

$$R(v) = \max_{\tilde{v} \in D} \; \langle v, g(\tilde{v}) - g(v) \rangle - (p(\tilde{v}) - p(v)).$$

First, we have

$$\langle v, g(v^*) \rangle - p(v^*) = \langle v, g(v) \rangle - p(v) + \langle v, g(v^*) - g(v) \rangle - (p(v^*) - p(v)) \leq \langle v, g(v) \rangle - p(v) + R(v),$$

which we rewrite as $\langle v, g(v) \rangle - p(v) \geq \langle v, g(v^*) \rangle - p(v^*) - R(v)$. Second, by the definition of $v^*$,

$$\langle v, g(v^*) \rangle - (1-\varepsilon)\, p(v^*) \geq \langle v, g(v) \rangle - (1-\varepsilon)\, p(v).$$

Summing these two relations gives $p(v^*) \geq p(v) - R(v)/\varepsilon$. Therefore,

$$p'(v) = (1-\varepsilon)\, p(v^*) \geq (1-\varepsilon)\, p(v) - \frac{1-\varepsilon}{\varepsilon} R(v),$$

and taking expectations yields $P' \geq (1-\varepsilon)P - \frac{1-\varepsilon}{\varepsilon}R$.

Proof of Prop. 1. From Lemma 1, we know that for all $\varepsilon > 0$ we can find a zero-regret mechanism with revenue $P' \geq (1-\varepsilon)P - \frac{1-\varepsilon}{\varepsilon}R$. Optimizing over $\varepsilon$, the best choice is $\varepsilon = \sqrt{R/P}$, and the resulting optimal revenue is

$$P^* = \Big(1 - \sqrt{R/P}\Big)P - \frac{1 - \sqrt{R/P}}{\sqrt{R/P}}\,R = P - 2\sqrt{PR} + R = \big(\sqrt{P} - \sqrt{R}\big)^2.$$



^1 By ε-truthful, we mean that the expected total regret R is bounded by ε. See Prop. 1 for a definition of R.
^2 An auction is Bayesian Incentive Compatible (BIC) if every bidder maximizes their expected utility by reporting truthfully, in expectation over the other bidders' truthful bids. Compare this with Dominant Strategy Incentive Compatible (used in this paper), where every bidder maximizes their expected utility by reporting truthfully for all realizations of the other bidders' bids.



Table 1: (a) Scheduling parameter values used in Duetting et al. (2019) to reach optimal auctions in $n \times m$ settings with $n$ bidders, $m$ items and i.i.d. valuations sampled from U[0, 1]. (b) Revenue $rev := \mathbb{E}_{V \sim D}[\sum_{i=1}^n p_i(V)]$ and average regret per bidder $rgt := \frac{1}{n} \mathbb{E}_{V \sim D}[\sum_{i=1}^n r_i(V)]$ for $n \times m$ settings when using the different parameter values reported in (a).

We use MLP($d_{in}$, $n_l$, $h$, $d_{out}$) to refer to a fully-connected neural network with input dimension $d_{in}$, output dimension $d_{out}$, and $n_l$ hidden layers of width $h$ with tanh activation; sig denotes the sigmoid activation function. Given a matrix $B = [b_1, \dots, b_n] \in \mathbb{R}^{n \times m}$ with rows $b_i$, we define, for a fixed $i \in N$, the matrix $B^{(i)} := [b_i, b_1, \dots, b_{i-1}, b_{i+1}, \dots, b_n]$.
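For concreteness, here is a direct PyTorch rendering of this notation (our sketch):

```python
import torch.nn as nn

def MLP(d_in, n_l, h, d_out):
    """MLP(d_in, n_l, h, d_out): n_l hidden layers of width h, tanh activations."""
    layers, width = [], d_in
    for _ in range(n_l):
        layers += [nn.Linear(width, h), nn.Tanh()]
        width = h
    layers.append(nn.Linear(width, d_out))
    return nn.Sequential(*layers)

# e.g. the payment head sig(MLP(nm, n_p, h_p, 1)) of Section 4.1:
# nn.Sequential(MLP(n * m, 3, 100, 1), nn.Sigmoid())
```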


Figure 1: (a-c) Evolution of the revenue, regret and $P^*$ as a function of the number of epochs for RegretNet and ALGnet in setting (A). (d-f) Revenue, regret and $P^*$ as a function of time for ALGnet and (offline & online) RegretNet in an online auction (Section 5.2).



Table 2: Revenue and regret of ALGnet for settings (A)-(C).

Table 3: Comparison of RegretNet and ALGnet on $n \times m$ settings. The values reported for RegretNet are those found in Duetting et al. (2019); the numerical values of rgt and the standard deviations for RegretNet are not available there.

Setting   RegretNet rev   RegretNet rgt    ALGnet rev           ALGnet rgt
1 × 2     0.554           < 1.0 × 10^-3    0.555 (±0.0019)      0.55 × 10^-3 (±0.14 × 10^-3)
1 × 10    3.461           < 3.0 × 10^-3    3.487 (±0.0135)      1.65 × 10^-3 (±0.57 × 10^-3)
2 × 2     0.878           < 1.0 × 10^-3    0.879 (±0.0024)      0.58 × 10^-3 (±0.23 × 10^-3)
3 × 10    5.541           < 2.0 × 10^-3    5.562 (±0.0308)      1.93 × 10^-3 (±0.33 × 10^-3)
5 × 10    6.778           < 5.0 × 10^-3    6.781 (±0.0504)      3.85 × 10^-3 (±0.43 × 10^-3)

ACKNOWLEDGEMENTS

Jad Rahme would like to thank Ryan P. Adams for helpful discussions and feedback on the manuscript. Samy Jelassi thanks Arthur Mensch for fruitful discussions on the subject and feedback on the manuscript. The work of Jad Rahme was funded by a Princeton SEAS Innovation Grant. The work of Samy Jelassi is supported by the NSF CAREER CIF 1845360. The work of S. Matthew Weinberg was supported by NSF CCF-1717899.

D IMPLEMENTATION AND SETUP

We implemented ALGnet in PyTorch, and all our experiments can be run on Google's Colab platform (with GPU). In Alg. 1, we used batches of valuation profiles of size $B \in \{500\}$ and set $T \in \{160000, 240000\}$, $T_{limit} \in \{40000, 60000\}$, $T_{init} \in \{800, 1600\}$ and $\tau \in \{100\}$. We used the AdamW optimizer (Loshchilov & Hutter, 2017) to train the Auctioneer's and the Misreporter's networks, with learning rate $\gamma \in \{0.0005, 0.001\}$. Typical values for the architecture's parameters are $n_a = n_p = n_M \in [3, 7]$ and $h_a = h_p = h_M \in \{50, 100, 200\}$. These networks are similar in size to the ones used for RegretNet in Duetting et al. (2019). For each experiment, we compute the total revenue $rev := \mathbb{E}_{V \sim D}[\sum_{i \in N} p_i^w(V)]$ and the average regret $rgt := \frac{1}{n} \mathbb{E}_{V \sim D}[\sum_{i \in N} r_i^w(V)]$ using a test set of 10,000 valuation profiles. We run each experiment 5 times with different random seeds and report the average and standard deviation of these runs.

