Conference Agenda
Overview and details of the sessions of this conference. Please select a date or location to show only the sessions held on that day or at that location. Select a single session for a detailed view (with abstracts and downloads, if available).
The location lists the building first, then the room number. Click on "Floor plan" for orientation in the buildings and on the campus.
Session Overview
S 1 (2): Machine Learning
Session Topics: 1. Machine Learning
Presentations
2:00 pm - 2:25 pm
Clustering Experts with Bandit Feedback of their Performance in Multiple Tasks
1: INRAE, Mistea, Institut Agro, Univ Montpellier, Montpellier, France; 2: Institut für Mathematik, Universität Potsdam, Potsdam, Germany
We study the problem of clustering a set of experts from their performances in many tasks. We assume that a set of $N$ experts can be partitioned into two groups, where experts within the same group exhibit identical performance on any given task, over a possibly large number of tasks $d$. We consider a sequential and adaptive setting: at each time step $t$, the learner selects an expert-task pair and receives a noisy observation of the expert’s performance, which depends on both the task and the expert's group. The learner’s objective is to recover the correct partition of the experts with as few observations as possible.
We propose an efficient $\delta$-PAC algorithm that, with probability at least $1-\delta$, recovers the partition exactly. The algorithm leverages the sequential halving method and optimally balances exploration across tasks (to estimate performance gaps between groups) and across experts (to infer the correct partition). We establish an instance-dependent upper bound on the number of observations required for partition recovery, which holds with probability at least $1-\delta$, and provide a matching lower bound, up to poly-logarithmic factors.
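The sequential halving subroutine mentioned in the abstract is a standard fixed-budget best-arm identification method: sample all surviving arms equally, discard the worse half, and repeat. A minimal sketch (the arm means, Gaussian noise model, and budget below are illustrative assumptions, not taken from the paper):

```python
import math
import numpy as np

def sequential_halving(pull, n_arms, budget, rng):
    """Fixed-budget sequential halving: in each round, sample every
    surviving arm equally often, then keep only the better half."""
    arms = list(range(n_arms))
    rounds = math.ceil(math.log2(n_arms))
    for _ in range(rounds):
        pulls = budget // (len(arms) * rounds)  # per-arm budget this round
        means = [np.mean([pull(a, rng) for _ in range(pulls)]) for a in arms]
        order = np.argsort(means)[::-1]         # best empirical mean first
        arms = [arms[i] for i in order[:max(1, len(arms) // 2)]]
    return arms[0]

# Toy instance (assumed values): arm 0 has the largest mean reward.
true_means = [0.9, 0.5, 0.4, 0.1]
pull = lambda a, rng: true_means[a] + rng.normal(scale=1.0)
rng = np.random.default_rng(0)
best = sequential_halving(pull, n_arms=4, budget=4000, rng=rng)
```

With a budget of 4000 pulls the per-arm noise is far smaller than the gaps, so the best arm is identified reliably.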
2:25 pm - 2:50 pm
Permutation Estimation for Crowdsourcing
1: Universität Potsdam, Germany; 2: INRAE, Univ. Montpellier, France
We consider a ranking problem in which a set of experts answers a set of questions. The aim is to rank the experts by competence based on their answers. We assume that, for every pair of experts, one of the two has, on every question, at least as high a probability of answering correctly as the other. Moreover, we suppose that the questions can be ordered by difficulty in the same sense. Storing the probability of a correct answer for each expert and each question yields a matrix, and our assumption means that this matrix is bi-isotonic up to permutations of its rows and columns.
In the general setting of ranking over this class of permuted bi-isotonic matrices, no algorithm is known that is at the same time computationally efficient and optimal in the minimax-sense. In our work, we focus on the special case of bi-isotonic matrices taking only two values, and present a polynomial time method that matches the minimax lower bound up to poly-logarithmic factors.
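As a point of reference (this is a naive baseline, not the paper's polynomial-time estimator), one can rank experts simply by their empirical fraction of correct answers; on a two-valued bi-isotonic instance like the special case studied here, this already recovers the order when the gaps are large. The probability matrix and sample sizes below are illustrative assumptions:

```python
import numpy as np

# Naive baseline: rank experts by empirical accuracy across questions.
# Rows = experts (strong to weak), columns = questions (easy to hard);
# the matrix takes only two values, as in the special case of the abstract.
rng = np.random.default_rng(1)
p_low, p_high = 0.3, 0.8
P = np.array([
    [p_high, p_high, p_high, p_low],
    [p_high, p_high, p_low,  p_low],
    [p_high, p_low,  p_low,  p_low],
])

n_repeats = 500                                  # answers per question
answers = rng.random((n_repeats, *P.shape)) < P  # Bernoulli(P) outcomes
scores = answers.mean(axis=(0, 2))               # per-expert accuracy
ranking = np.argsort(scores)[::-1]               # best expert first
```

The point of the paper is precisely that such crude averaging is not minimax-optimal in general, and that a more careful estimator is needed.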
2:50 pm - 3:15 pm
A random measure approach to reinforcement learning in continuous time
1: Department of Mathematics, Saarland University, Germany; 2: Department of Mathematics, Vinh University, Vietnam
We propose a random measure framework for modeling exploration, i.e. the execution of measure-valued controls, in continuous-time reinforcement learning (RL) with controlled diffusion and jumps. We first address the situation where the randomized control is sampled in continuous time on a discrete-time grid, and we reformulate the resulting stochastic differential equation (SDE) as an equation driven by appropriate random measures. These random measures are constructed using the Brownian motion and Poisson random measure, which represent the sources of randomness in the original model dynamics, along with additional random variables sampled on the grid for control execution. We then establish vague convergence of these random measures as the grid's mesh size tends to zero. This limit theorem suggests a grid-sampling limit SDE driven by both white noise random measures and a Poisson random measure, which models the control problem with randomized controls in continuous time. Moreover, we discuss the grid-sampling limit SDE in comparison with the exploratory SDE (e.g., [4]) and the sample state process (e.g., [2, 3]) used in recent continuous-time RL literature.
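The grid sampling of a randomized control can be illustrated with a plain Euler–Maruyama scheme in which, at each grid point, a fresh control value is drawn from a measure-valued (here Gaussian) policy. The drift, diffusion coefficient, policy, and step size below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def grid_sampled_path(x0, T, n_steps, policy_sample, rng):
    """Euler-Maruyama for dX = b(X, u) dt + sigma dW, where the
    randomized control u is resampled from the policy at every
    grid point (the discrete-time sampling described in the abstract)."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        u = policy_sample(x[k], rng)       # draw the control on the grid
        drift = -x[k] + u                  # illustrative drift b(x, u)
        x[k + 1] = x[k] + drift * dt + 0.2 * np.sqrt(dt) * rng.normal()
    return x

# Gaussian relaxed policy centred at -x (an assumed toy policy).
policy = lambda x, rng: rng.normal(loc=-x, scale=0.5)
rng = np.random.default_rng(2)
path = grid_sampled_path(x0=1.0, T=1.0, n_steps=100,
                         policy_sample=policy, rng=rng)
```

Refining the grid (increasing `n_steps`) corresponds to the mesh-size limit in which the paper identifies the grid-sampling limit SDE.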
References
[1] C. Bender and N.T. Thuan. A random measure approach to reinforcement learning in continuous time. arXiv:2409.17200, preprint (2024).
[2] Y. Jia and X.Y. Zhou. Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. J. Mach. Learn. Res. 23 (2022) 1--50.
[3] Y. Jia and X.Y. Zhou. $q$-Learning in continuous time. J. Mach. Learn. Res. 24 (2023) 1--61.
[4] H. Wang, T. Zariphopoulou and X.Y. Zhou. Reinforcement learning in continuous time and space: A stochastic control approach. J. Mach. Learn. Res. 21 (2020) 1--34.
Contact and Legal Notice · Conference: GPSD 2025