Conference Agenda
Overview and details of the sessions of this conference. Please select a date or location to show only the sessions held on that day or at that location. Select a single session for a detailed view (with abstracts and downloads, if available).
The location lists the building first, then the room number. Click on "Floor plan" for orientation in the buildings and on the campus.
Session Overview
S 1 (2): Machine Learning
Session Topics: 1. Machine Learning
Presentations
2:00 pm - 2:25 pm
Clustering Experts with Bandit Feedback of their Performance in Multiple Tasks
1: INRAE, Mistea, Institut Agro, Univ Montpellier, Montpellier, France; 2: Institut für Mathematik, Universität Potsdam, Potsdam, Germany
We study the problem of clustering a set of experts from their performances in many tasks. We assume that a set of $N$ experts can be partitioned into two groups, where experts within the same group exhibit identical performance on any given task, over a possibly large number of tasks $d$. We consider a sequential and adaptive setting: at each time step $t$, the learner selects an expert-task pair and receives a noisy observation of the expert’s performance, which depends on both the task and the expert's group. The learner’s objective is to recover the correct partition of the experts with as few observations as possible.
We propose an efficient $\delta$-PAC algorithm that, with probability at least $1-\delta$, recovers the partition exactly. The algorithm leverages the sequential halving method and optimally balances exploration across tasks (to estimate performance gaps between groups) and across experts (to infer the correct partition). We establish an instance-dependent upper bound on the number of observations required for partition recovery, which holds with probability at least $1-\delta$, and provide a matching lower bound, up to poly-logarithmic factors.
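The sequential halving subroutine mentioned in the abstract is a standard fixed-budget best-arm identification method: sample all surviving arms equally, discard the worse half, and repeat. A minimal sketch (the arm means, Gaussian noise model, and budget below are illustrative assumptions, not taken from the paper):

```python
import math
import numpy as np

def sequential_halving(pull, n_arms, budget, rng):
    """Fixed-budget sequential halving: in each round, sample every
    surviving arm equally often, then keep only the better half."""
    arms = list(range(n_arms))
    rounds = math.ceil(math.log2(n_arms))
    for _ in range(rounds):
        pulls = budget // (len(arms) * rounds)  # per-arm budget this round
        means = [np.mean([pull(a, rng) for _ in range(pulls)]) for a in arms]
        order = np.argsort(means)[::-1]         # best empirical mean first
        arms = [arms[i] for i in order[:max(1, len(arms) // 2)]]
    return arms[0]

# Toy instance (assumed values): arm 0 has the largest mean reward.
true_means = [0.9, 0.5, 0.4, 0.1]
pull = lambda a, rng: true_means[a] + rng.normal(scale=1.0)
rng = np.random.default_rng(0)
best = sequential_halving(pull, n_arms=4, budget=4000, rng=rng)
```

With a budget of 4000 pulls the per-arm noise is far smaller than the gaps, so the best arm is identified reliably.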
2:25 pm - 2:50 pm
Permutation Estimation for Crowdsourcing
1: Universität Potsdam, Germany; 2: INRAE, Univ. Montpellier, France
We consider a ranking problem in which a set of experts answers a set of questions. The aim is to rank the experts by competence based on their answers. We assume that, for every pair of experts, one of the two has, on every question, at least as high a probability of answering correctly as the other. Moreover, we suppose that the questions can be ordered by difficulty in the same sense. Storing the probability of a correct answer for each expert and each question yields a matrix, and our assumption means that this matrix is bi-isotonic up to permutations of its rows and columns.
In the general setting of ranking over this class of permuted bi-isotonic matrices, no algorithm is known that is at the same time computationally efficient and optimal in the minimax-sense. In our work, we focus on the special case of bi-isotonic matrices taking only two values, and present a polynomial time method that matches the minimax lower bound up to poly-logarithmic factors.
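As a point of reference (this is a naive baseline, not the paper's polynomial-time estimator), one can rank experts simply by their empirical fraction of correct answers; on a two-valued bi-isotonic instance like the special case studied here, this already recovers the order when the gaps are large. The probability matrix and sample sizes below are illustrative assumptions:

```python
import numpy as np

# Naive baseline: rank experts by empirical accuracy across questions.
# Rows = experts (strong to weak), columns = questions (easy to hard);
# the matrix takes only two values, as in the special case of the abstract.
rng = np.random.default_rng(1)
p_low, p_high = 0.3, 0.8
P = np.array([
    [p_high, p_high, p_high, p_low],
    [p_high, p_high, p_low,  p_low],
    [p_high, p_low,  p_low,  p_low],
])

n_repeats = 500                                  # answers per question
answers = rng.random((n_repeats, *P.shape)) < P  # Bernoulli(P) outcomes
scores = answers.mean(axis=(0, 2))               # per-expert accuracy
ranking = np.argsort(scores)[::-1]               # best expert first
```

The point of the paper is precisely that such crude averaging is not minimax-optimal in general, and that a more careful estimator is needed.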
2:50 pm - 3:15 pm
A random measure approach to reinforcement learning in continuous time
1: Department of Mathematics, Saarland University, Germany; 2: Department of Mathematics, Vinh University, Vietnam
We propose a random measure framework for modeling exploration, i.e. the execution of measure-valued controls, in continuous-time reinforcement learning (RL) with controlled diffusion and jumps. We first address the situation where the randomized control is sampled in continuous time on a discrete-time grid, and we reformulate the resulting stochastic differential equation (SDE) as an equation driven by appropriate random measures. These random measures are constructed using the Brownian motion and Poisson random measure, which represent the sources of randomness in the original model dynamics, along with additional random variables sampled on the grid for control execution. We then establish vague convergence of these random measures as the grid's mesh size tends to zero. This limit theorem suggests a grid-sampling limit SDE driven by both white noise random measures and a Poisson random measure, which models the control problem with randomized controls in continuous time. Moreover, we discuss the grid-sampling limit SDE in comparison with the exploratory SDE (e.g., [4]) and the sample state process (e.g., [2, 3]) used in recent continuous-time RL literature.
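The grid sampling of a randomized control can be illustrated with a plain Euler–Maruyama scheme in which, at each grid point, a fresh control value is drawn from a measure-valued (here Gaussian) policy. The drift, diffusion coefficient, policy, and step size below are illustrative assumptions, not the paper's model:

```python
import numpy as np

def grid_sampled_path(x0, T, n_steps, policy_sample, rng):
    """Euler-Maruyama for dX = b(X, u) dt + sigma dW, where the
    randomized control u is resampled from the policy at every
    grid point (the discrete-time sampling described in the abstract)."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        u = policy_sample(x[k], rng)       # draw the control on the grid
        drift = -x[k] + u                  # illustrative drift b(x, u)
        x[k + 1] = x[k] + drift * dt + 0.2 * np.sqrt(dt) * rng.normal()
    return x

# Gaussian relaxed policy centred at -x (an assumed toy policy).
policy = lambda x, rng: rng.normal(loc=-x, scale=0.5)
rng = np.random.default_rng(2)
path = grid_sampled_path(x0=1.0, T=1.0, n_steps=100,
                         policy_sample=policy, rng=rng)
```

Refining the grid (increasing `n_steps`) corresponds to the mesh-size limit in which the paper identifies the grid-sampling limit SDE.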
References
[1] C. Bender and N.T. Thuan. A random measure approach to reinforcement learning in continuous time. arXiv:2409.17200, preprint (2024).
[2] Y. Jia and X.Y. Zhou. Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. J. Mach. Learn. Res. 23 (2022) 1--50.
[3] Y. Jia and X.Y. Zhou. $q$-Learning in continuous time. J. Mach. Learn. Res. 24 (2023) 1--61.
[4] H. Wang, T. Zariphopoulou and X.Y. Zhou. Reinforcement learning in continuous time and space: A stochastic control approach. J. Mach. Learn. Res. 21 (2020) 1--34.
Contact and Legal Notice · Conference: GPSD 2025