Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions held on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads, if available).

Location indicates the building first and then the room number!

Click on "Floor plan" for orientation in the buildings and on the campus.

 
Session Overview
Session: S 1 (3): Machine Learning
Time: Tuesday, 11/Mar/2025, 4:20 pm - 6:00 pm

Session Chair: Merle Behr
Session Chair: Alexandra Carpentier
Location: POT 06 (Potthoff Bau)
Floor plan
Session Topics:
1. Machine Learning

Presentations
4:20 pm - 4:45 pm

Effective fluctuating continuum models for SGD with small learning rate, or in overparameterized limits

Benjamin Gess

MPI MIS Leipzig & Universität Bielefeld, Germany

In this talk we present recent results on the derivation of effective models for the training dynamics of SGD in the limits of small learning rate or large, shallow networks. The focus lies on developing effective limiting models that also capture the fluctuations inherent in SGD. This leads to the novel concepts of stochastic modified flows and distribution-dependent modified flows. The advantage of these limiting models is that they match the SGD dynamics to higher order and recover the correct multi-point distributions.

This is joint work with Vitalii Konarovskyi and Sebastian Kassing.
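
As a rough illustration of the kind of diffusion approximation discussed in this abstract (not the construction from the talk), the following sketch runs SGD on a toy quadratic loss with a small learning rate and, in parallel, an Euler-Maruyama discretisation of a first-order SDE whose drift is the population gradient and whose noise scales with the square root of the learning rate. The loss, noise covariance, and time scaling are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
eta = 0.01                      # small learning rate
steps = 5000
theta_sgd = np.array([2.0, -1.0])
theta_sde = theta_sgd.copy()

def sample_grad(theta, xi):
    # per-sample gradient of the illustrative loss 0.5 * |theta - xi|^2
    return theta - xi

def full_grad(theta):
    # population gradient (data centred at the origin)
    return theta

def noise_cov_sqrt(theta):
    # square root of the gradient covariance (identity in this toy model)
    return np.eye(2)

for _ in range(steps):
    xi = rng.normal(size=2)                      # one random sample per SGD step
    theta_sgd = theta_sgd - eta * sample_grad(theta_sgd, xi)

    # Euler-Maruyama step of the diffusion approximation
    # d theta = -grad L(theta) dt + sqrt(eta) * Sigma^{1/2}(theta) dW,
    # where one SGD step corresponds to a time increment dt = eta
    dW = rng.normal(size=2) * np.sqrt(eta)
    theta_sde = theta_sde - eta * full_grad(theta_sde) \
        + np.sqrt(eta) * noise_cov_sqrt(theta_sde) @ dW

print("SGD iterate:      ", theta_sgd)
print("diffusion iterate:", theta_sde)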


4:45 pm - 5:10 pm

Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization

Michael Kohler1, Adam Krzyżak2, Alisha Sänger1

1TU Darmstadt, Germany; 2Concordia University, Canada

Image classification based on independent and identically distributed random variables is considered. Image classifiers are defined as a linear combination of deep convolutional networks with max-pooling layers, where all weights are learned by stochastic gradient descent.

A general result is presented which shows that these image classifiers are able to approximate the best possible deep convolutional network. In case the a posteriori probability satisfies a suitable hierarchical composition model, it is shown that the corresponding deep convolutional neural network image classifier achieves a rate of convergence that is independent of the dimension of the images.
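
A minimal sketch of the kind of architecture described above: a classifier built as a learned linear combination of convolutional sub-networks with max-pooling layers, with all weights trained by stochastic gradient descent. The choice of PyTorch, the layer sizes, the number of sub-networks, and the binary-label setup are illustrative assumptions, not the paper's construction.

import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """One convolutional sub-network with max-pooling layers (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Linear(16 * 7 * 7, 1)      # scalar output per sub-network

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

class LinearCombination(nn.Module):
    """Classifier defined as a learned linear combination of several sub-networks."""
    def __init__(self, n_nets=5):
        super().__init__()
        self.nets = nn.ModuleList(SmallConvNet() for _ in range(n_nets))
        self.weights = nn.Parameter(torch.ones(n_nets) / n_nets)

    def forward(self, x):
        outputs = torch.stack([net(x).squeeze(-1) for net in self.nets], dim=-1)
        return outputs @ self.weights              # combined score; threshold at 0 to classify

model = LinearCombination()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.BCEWithLogitsLoss()

# one SGD step on a random batch of 28x28 "images" with binary labels
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 2, (32,)).float()
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()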



5:10 pm - 5:35 pm

Optimal Rates for Forward Gradient Descent based on Multiple Queries

Niklas Dexheimer, Johannes Schmidt-Hieber

University of Twente, The Netherlands

We investigate the prediction error of forward gradient descent based on multiple queries in the linear model. It is shown that if the number of queries is chosen suitably large, the minimax optimal rate of convergence can be achieved, matching the performance of stochastic gradient descent. The results also address the case of low-rank training data, which can be beneficial in high-dimensional problems. As forward gradient descent only requires forward passes through a network, which are feasible in the human brain, this shows that rate-optimal convergence is achievable by biologically plausible optimization methods.
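
A small illustrative sketch of forward gradient descent with multiple queries in a linear model; the dimension, number of queries, step size, and noise level are assumptions, not the paper's settings. Each update averages the directional estimates <grad, v_j> v_j over m random directions v_j; in practice the directional derivatives would be obtained from forward passes alone.

import numpy as np

rng = np.random.default_rng(1)
d, n_steps, m = 20, 5000, 10         # dimension, iterations, queries per step
eta = 0.005                          # illustrative step size

theta_star = rng.normal(size=d)      # ground-truth regression vector
theta = np.zeros(d)

for _ in range(n_steps):
    # one fresh observation of the linear model y = <x, theta*> + noise
    x = rng.normal(size=d)
    y = x @ theta_star + 0.1 * rng.normal()
    grad = 2 * (x @ theta - y) * x   # gradient of the squared loss (used only via
                                     # directional derivatives <grad, v_j>)

    # forward gradient with m queries: average of <grad, v_j> v_j over random v_j
    v = rng.normal(size=(m, d))
    fwd_grad = (v @ grad)[:, None] * v
    theta = theta - eta * fwd_grad.mean(axis=0)

print("squared estimation error:", np.sum((theta - theta_star) ** 2))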


5:35 pm - 6:00 pm

Stochastic Modified Flows for Riemannian Stochastic Gradient Descent

Benjamin Gess2,3, Sebastian Kassing1, Nimit Rana4

1Universität Bielefeld, Germany; 2Technische Universität Berlin, Germany; 3Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; 4University of York, UK

We give quantitative estimates for the rate of convergence of Riemannian stochastic gradient descent (RSGD) to Riemannian gradient flow and to a diffusion process, the so-called Riemannian stochastic modified flow (RSMF). Using tools from stochastic differential geometry we show that, in the small learning rate regime, RSGD can be approximated by the solution to the RSMF driven by an infinite-dimensional Wiener process. The RSMF accounts for the random fluctuations of RSGD and, thereby, increases the order of approximation compared to the deterministic Riemannian gradient flow. The RSGD is built using the concept of a retraction map, that is, a cost-efficient approximation of the exponential map, and we prove quantitative bounds for the weak error of the diffusion approximation under assumptions on the retraction map, the geometry of the manifold, and the random estimators of the gradient.
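
As a hedged illustration of RSGD with a retraction map (not the construction analysed in the talk), the following sketch minimises a noisy Rayleigh quotient on the unit sphere, using the projective retraction as a cheap surrogate for the exponential map; the objective, noise model, and step size are illustrative choices.

import numpy as np

rng = np.random.default_rng(2)
d = 5
A = rng.normal(size=(d, d))
A = (A + A.T) / 2                    # symmetric matrix; minimise x^T A x on the sphere

def retraction(x, v):
    """Projective retraction on the sphere: a cost-efficient surrogate for the exponential map."""
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_grad_estimate(x):
    """Noisy Riemannian gradient of f(x) = x^T A x on the unit sphere:
    project the noisy Euclidean gradient onto the tangent space at x."""
    noise = 0.1 * rng.normal(size=(d, d))
    noise = (noise + noise.T) / 2
    euclid = 2 * (A + noise) @ x
    return euclid - (x @ euclid) * x

eta = 0.01                           # small learning rate regime
x = rng.normal(size=d)
x /= np.linalg.norm(x)

for _ in range(5000):
    x = retraction(x, -eta * riemannian_grad_estimate(x))

print("Rayleigh quotient x^T A x:  ", x @ A @ x)
print("smallest eigenvalue of A:   ", np.linalg.eigvalsh(A)[0])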


 
Conference: GPSD 2025