NO-REGRET LEARNING IN REPEATED FIRST-PRICE AUCTIONS WITH BUDGET CONSTRAINTS

Anonymous

Abstract

Recently, the online advertising market has exhibited a gradual shift from second-price auctions to first-price auctions. Although there is a line of work on online bidding strategies in first-price auctions, how to handle budget constraints in this problem remains open. In the present paper, we initiate the study of how a buyer with a budget learns her online bidding strategy in repeated first-price auctions. We propose an RL-based bidding algorithm that competes against the optimal non-anticipating strategy under stationary competition. Our algorithm obtains $\widetilde{O}(\sqrt{T})$ regret if all bids are revealed at the end of each round, where $\widetilde{O}(\cdot)$ is a variant of big-O notation that hides logarithmic factors. Under the restriction that the buyer only sees the winning bid after each round, our modified algorithm obtains $\widetilde{O}(T^{7/12})$ regret using techniques developed from survival analysis. Our analysis extends to the more general scenario where the buyer has any bounded instantaneous utility function, with regret of the same order. Simulation experiments show that the constant factor inside the regret bound is rather small.

1. INTRODUCTION

There has been extensive growth in the online advertising market in recent years. It was estimated that the volume of online advertising worldwide would reach 500 billion dollars in 2022 (Statista, 2021). In such a market, advertising platforms use auctions to allocate ad opportunities. Typically, each advertiser has a limited amount of capital for an advertising campaign. Therefore, consecutive rounds of competition are interconnected through the budgets of the participating advertisers. Furthermore, each advertiser has very limited knowledge of 1) her valuation of certain keywords and 2) the competitors she is facing. Many works have been devoted to algorithms that learn how to spend the budget optimally in repeated second-price auctions (see Section 1.1).

In practice, on the other hand, we have witnessed numerous switches from second-price auctions to first-price auctions in the online advertising market. A recent notable example is Google AdSense's move at the end of 2021 (LLC, 2021). Earlier examples include AppNexus, Index Exchange, and OpenX (Sluis, 2017). This industry-wide shift is due to various factors, including a fairer transactional process and increased transparency. The shift to first-price auctions therefore lends major importance to the following open question, which is barely considered in previous works:

How should budget-constrained advertisers learn to compete in repeated first-price auctions?

This paper thus initiates the study of learning to bid with budget constraints in repeated first-price auctions. The application of first-price auctions with budgets is not limited to the online advertising mentioned above. Traditional competitive environments such as the mussel trade in the Netherlands (van Schaik et al., 2001), modern price competition, and procurement auctions (e.g., U.S. Treasury securities auctions (Chari and Weber, 1992)) are examples as well.

Challenges and contributions

The challenges in this setting are two-fold. The first challenge relates to the specific information structure of first-price auctions. In practice, it is often the case that only the highest bid is revealed to all participants (Esponda, 2008). This is known in the literature as censored feedback, or an informational version of the winner's curse (Capen et al., 1971). It affects the information structure of learning, since the buyer learns less when she wins. This makes the problem more challenging than standard contextual bandits (cf. Section 1.1).

The second challenge is more fundamental. Strategies in first-price auctions are known to be notoriously complex to analyze, even in the static case (Lebrun, 1996). To get an intuitive feeling for this difficulty compared to repeated second-price auctions, consider the offline case where the opponents' bids are all known. Given the budget, the problem for second-price auctions reduces to a pure knapsack problem, where the budget is regarded as the weight capacity and the prices as the weights. This structure enables mature techniques, including duality theory, to be applied to study the benchmark strategy. Unfortunately, in first-price auctions the payment depends on the buyer's own bid, so the previous approach and benchmark are not directly usable. We provide a concrete example to further illustrate such difficulties.

Example 1.1. Consider a case where the buyer's value $v$ follows a uniform distribution on $[0.4, 1]$ and the highest bid $m$ of her opponents follows a uniform distribution on $[0, 0.5]$. The time horizon is $T$ and the buyer's budget is $B = 0.5T$. The first-best benchmark (an anticipating¹ strategy that knows her values and her opponents' bids) can be viewed as a knapsack problem, which is

$$\mathbb{E}_{\substack{v \sim F^{T} \\ m \sim G^{T}}}\left[\max_{b_1,\dots,b_T} \sum_{t=1}^{T} (v_t - b_t)\,\mathbb{1}_{\{b_t \ge m_t\}}\right] \quad \text{subject to} \quad \sum_{t=1}^{T} \mathbb{1}_{\{b_t \ge m_t\}}\, b_t \le B \quad \forall\, (v_t)_{t=1}^{T};\ (m_t)_{t=1}^{T},$$

where $v_t$ is her value and $m_t$ is her opponents' highest bid at time $t$.
The buyer wants to determine each $b_t$ to maximize her utility. In hindsight, she should pay as close to $m_t$ as possible. Because $m_t \le 0.5$ in every round, winning at price $m_t$ never exhausts the budget $B = 0.5T$, so by the theory of knapsack the utility is $T \cdot \mathbb{E}[\mathbb{1}_{\{v \ge m\}}(v - m)]^{+} \approx 0.45T$. In contrast, the optimal non-anticipating bidding strategy in a first-price auction is to bid $\frac{v}{2}$, and the utility is $T \cdot \mathbb{E}[\mathbb{1}_{\{\frac{v}{2} \ge m\}} \frac{v}{2}] = 0.26T$. There is already an $\Omega(T)$ separation between the first-best benchmark and the ideal case with full information. This example shows that the simple characterization of the optimum in Balseiro and Gur (2019) is not applicable to our problem.

Furthermore, it remains unclear what methodology can be applied in first-price auctions with budgets. The state-of-the-art adaptive pacing strategy degenerates to truthful bidding as the budget increases, so in first-price auctions it may result in near-zero reward and thus cannot have any theoretical guarantee (see (Balseiro and Gur, 2019, §2.4) for further discussion).

The present paper takes the first step toward addressing the challenges mentioned above with a dynamic programming approach. Correspondingly, our contribution is also two-fold:

• We provide an RL-based learning algorithm. Through the characterization of the optimal strategy, we obtain an $\widetilde{O}(\sqrt{T})$ regret guarantee for the algorithm in the full-feedback case².
• In the censored-feedback setting, using techniques developed from survival analysis, we modify our algorithm and obtain a regret of $\widetilde{O}(T^{7/12})$.
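As a sanity check on the two per-round utilities quoted in Example 1.1, the expectations can be evaluated numerically. The sketch below is not part of the paper's analysis: it assumes that $v$ and $m$ are independent, and the helper names `midpoints` and `expectation` and the grid size `n` are illustrative choices. In closed form, $\Pr(m \le v/2) = v$ on the support of $v$, so the second expectation equals $\mathbb{E}[v^2]/2 = 0.26$.

```python
# Numerical check of Example 1.1 (illustrative sketch; grid size n is arbitrary).
# Assumption: v ~ Uniform[0.4, 1] and m ~ Uniform[0, 0.5] are independent.

def midpoints(lo, hi, n):
    """Return the n midpoints of an even partition of [lo, hi]."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)]

def expectation(f, n=400):
    """Midpoint-rule estimate of E[f(v, m)] under the two uniform laws."""
    vs = midpoints(0.4, 1.0, n)
    ms = midpoints(0.0, 0.5, n)
    return sum(f(v, m) for v in vs for m in ms) / (n * n)

# Anticipating benchmark: win at price m whenever v >= m, keep v - m.
first_best = expectation(lambda v, m: (v - m) if v >= m else 0.0)

# Non-anticipating bid v/2: win iff v/2 >= m, pay v/2, keep v - v/2 = v/2.
half_bid = expectation(lambda v, m: v / 2 if v / 2 >= m else 0.0)

print(round(first_best, 2), round(half_bid, 2))
```

The two estimates should land close to 0.45 and 0.26 respectively, which matches the linear-in-$T$ gap described in the example.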

¹ An algorithm is anticipating if bid selection depends on future observations; see Flajolet and Jaillet (2017).
² This is especially practical in public-sector auctions (Chari and Weber, 1992), as regulations mandate that all bids be revealed.

1.1. RELATED WORK

Repeated second-price auctions with budgets. There is a flourishing literature on bidding strategies in repeated auctions with budgets. Through the lens of online learning, Balseiro and Gur (2019) identify asymptotically optimal online bidding strategies known as pacing (a.k.a. bid-shading in the literature) in repeated second-price auctions with budgets. Inspired by the pacing strategy, Flajolet and Jaillet (2017) develop no-regret non-anticipating algorithms for learning with contextual information in repeated second-price auctions. Another line of work that uses techniques similar to those in the present paper includes Amin et al. (2012), Tran-Thanh et al. (2014), and Gummadi et al. (2012). Gummadi et al. (2012) and Amin et al. (2012) study bidding strategies in repeated second-price auctions with budget constraints, but the former does not involve any learning and the latter does not provide any regret analysis (their estimator is biased). Tran-Thanh et al. (2014) derive regret bounds for the same scenario, but the optimization objective is the number of items won instead of value or surplus. Baltaoglu et al. (2017) also use dynamic programming to tackle repeated

