Overview and details of the sessions of this conference.
Session Overview

Session
FA 02: Equilibrium Learning

Presentations
Deep Reinforcement Learning for Equilibrium Computation in Multi-Stage Auctions and Contests
Technical University of Munich, Germany

We compute equilibrium strategies in multi-stage games with continuous signal and action spaces, which are widely used in the management sciences and economics. Examples include sequential sales via auctions, multi-stage elimination contests, and Stackelberg competitions. In sequential auctions, analysts must derive not just single bids but bid functions for all possible signals or values a bidder might have in multiple stages. Due to the continuity of the signal and action spaces, these bid functions come from an infinite-dimensional space. While such models are fundamental to game theory and its applications, equilibrium strategies are rarely known. The resulting system of non-linear differential equations is considered intractable for all but elementary models. This has limited progress in game theory and is a barrier to its adoption in the field. We show that deep reinforcement learning and self-play can learn equilibrium bidding strategies for various multi-stage games without making parametric assumptions on the bid function. We find equilibria in models that have not yet been explored analytically, as well as new asymmetric equilibrium bid functions for established models of sequential auctions. Verifying equilibrium in such games is challenging due to the continuous signal and action spaces. We introduce a verification algorithm and prove that, for Lipschitz continuous strategies, the verifier's error decreases with increasing levels of discretization and sample size.
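The verification idea can be sketched in a toy setting. The snippet below is a hypothetical illustration, not the paper's algorithm or model: it uses a single-stage first-price auction with two bidders and uniform values, where the symmetric equilibrium b(v) = v/2 is known, and estimates the utility loss ("exploitability") of a candidate strategy by brute-force search over a discretized bid grid. All function names and grid sizes are assumptions for illustration.

```python
# Hypothetical sketch: grid-based exploitability check for a candidate
# bidding strategy in a 2-bidder first-price auction with Uniform[0, 1]
# values. The opponent plays the known symmetric equilibrium b(v) = v/2.

def win_prob(bid):
    # Opponent's bid is Uniform[0, 1/2], so P(win | bid) = min(2 * bid, 1).
    return min(2.0 * bid, 1.0)

def expected_utility(value, bid):
    # Risk-neutral first-price payoff: (value - bid) * P(win).
    return (value - bid) * win_prob(bid)

def utility_loss(strategy, n_grid=200):
    """Max over a value grid of (best-response utility - strategy utility)."""
    worst = 0.0
    for i in range(n_grid + 1):
        v = i / n_grid
        u_strategy = expected_utility(v, strategy(v))
        # Brute-force best response over a discretized bid grid.
        u_best = max(expected_utility(v, j / n_grid) for j in range(n_grid + 1))
        worst = max(worst, u_best - u_strategy)
    return worst

equilibrium = lambda v: v / 2   # analytic equilibrium: loss should be ~0
shaded = lambda v: 0.3 * v      # out-of-equilibrium strategy: positive loss

print(utility_loss(equilibrium))
print(utility_loss(shaded))
```

As in the paper's verifier, the estimate improves with finer discretization; here the setting is simple enough that the equilibrium strategy's measured loss is essentially zero, while the over-shaded strategy is clearly exploitable.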
Fast and accurate approximations of traffic equilibria via structured learning pipelines
1 School of Management, Technical University of Munich, Germany; 2 Department of Computing, Imperial College London, United Kingdom; 3 CERMICS, École des Ponts, France; 4 Munich Data Science Institute, Technical University of Munich, Germany

Growing urbanization and increasing traffic volumes create a need for smart traffic control systems that mitigate the negative externalities of traffic. Recently, many learning-based traffic control systems have evolved, which either learn via self-play or require at least many training data points to achieve good performance. Generating such data points or system responses via computationally costly traffic simulations hampers the development of the respective algorithms. Against this background, we present a deep structured learning pipeline that predicts system equilibria for traffic assignment problems with significantly reduced computational effort. The pipeline comprises a machine learning (ML) layer and a combinatorial optimization (CO) layer. The ML-layer receives the system state as input and predicts the parameterization of an equilibrium problem, e.g., latency functions. The CO-layer solves the parameterized equilibrium problem and outputs the system equilibrium. Central to successfully applying this pipeline is training the ML-layer such that the CO-layer outputs good equilibria. To do so, we train the ML-layer on historical data by minimizing a Fenchel-Young loss, which reduces the equilibrium error induced by the ML-layer prediction. We study various algorithmic architectures within our pipeline, e.g., learning the costs of multi-commodity flow problems or learning latency functions of Wardrop equilibrium problems.
In this work, we demonstrate the performance of our pipeline by predicting traffic assignments for stylized experiments and outputs from the traffic simulation software MATSim. Our experiments reveal that our pipeline reduces the prediction error by around 70% on average compared to supervised learning approaches.
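The two-layer structure can be illustrated on a toy network. The sketch below is an assumption-laden simplification, not the authors' pipeline: the "ML-layer" is a stub returning fixed latency parameters (a trained model would predict them from the system state), and the "CO-layer" solves the Wardrop equilibrium in closed form for two parallel links with linear latencies l_i(x) = a_i·x + b_i. All names and numbers are illustrative.

```python
# Illustrative ML-layer / CO-layer split on a two-link toy network.
# At a Wardrop equilibrium, all used links have equal latency.

def ml_layer(features):
    # Stub for a trained model: maps the system state (features) to
    # latency parameters (a1, b1, a2, b2). Here fixed for illustration.
    return 1.0, 0.0, 2.0, 1.0

def co_layer(params, demand):
    """Closed-form Wardrop equilibrium for two parallel links with
    linear latencies: a1*x1 + b1 = a2*x2 + b2 and x1 + x2 = demand."""
    a1, b1, a2, b2 = params
    x1 = (a2 * demand + b2 - b1) / (a1 + a2)
    x1 = min(max(x1, 0.0), demand)  # clip if one link takes all flow
    return x1, demand - x1

params = ml_layer(features=None)
x1, x2 = co_layer(params, demand=3.0)
print(x1, x2)                                            # equilibrium flows
print(params[0] * x1 + params[1],
      params[2] * x2 + params[3])                        # equal latencies
```

Training in the paper then adjusts the ML-layer's prediction (here, the tuple returned by `ml_layer`) so that the CO-layer's equilibrium matches observed assignments, via a Fenchel-Young loss rather than a plain supervised loss on the parameters.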