Workshop: Generative Models in Science and Machine Learning
Scientific programme
Monday, September 22
Talk: 9:10 - 10:05
Sebastian Reich: "Generative Modelling using Schrödinger Bridges"
I will consider the generative problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. The general approach is that of plug & play Langevin dynamics, where the required data-driven drift term is approximated using Schrödinger bridges. A key bottleneck of this approach is the exponential dependence of the required training samples on the dimension, d, of the ambient state space. I will discuss a localization strategy which exploits conditional independence of conditional expectation values. Localization thus replaces a single high-dimensional Schrödinger bridge problem by d low-dimensional Schrödinger bridge problems over the available training samples. In this context, a connection to multi-head self-attention transformer architectures is established. As for the original Schrödinger bridge sampling approach, the localized sampler is stable and geometrically ergodic. The sampler also naturally extends to conditional sampling and to Bayesian inference. I will demonstrate the performance of the proposed scheme through experiments on a Gaussian problem with increasing dimensions and several problems involving inferring stochastic processes from given time-series.
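As a rough illustration of the plug & play idea (not the Schrödinger-bridge construction of the talk), the sketch below runs unadjusted Langevin dynamics with a data-driven drift obtained from a simple Gaussian-kernel score estimate built on the training samples; the bandwidth, step size, and toy data are illustrative choices.

```python
# Hedged sketch: plug & play (unadjusted) Langevin sampling where the unknown
# drift grad log pi is replaced by a data-driven estimate. A Gaussian kernel
# score estimator stands in here for the Schrodinger-bridge construction of
# the talk; bandwidth `h`, step size `eps` and the training set `Y` are
# illustrative choices, not the speaker's.
import numpy as np

def kernel_score(x, Y, h):
    """Score of a Gaussian KDE built on training samples Y, evaluated at x."""
    diff = Y - x                                  # (N, d)
    logw = -np.sum(diff**2, axis=1) / (2 * h**2)  # log kernel weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return (w @ diff) / h**2                      # weighted mean of (y_i - x) / h^2

def langevin_sample(Y, n_steps=2000, eps=1e-2, h=0.3, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = Y[rng.integers(len(Y))].copy()            # start from a training point
    for _ in range(n_steps):
        drift = kernel_score(x, Y, h)
        x = x + eps * drift + np.sqrt(2 * eps) * rng.standard_normal(x.shape)
    return x

# toy usage: training samples from a 2-d Gaussian mixture
rng = np.random.default_rng(0)
Y = np.concatenate([rng.normal(-2, 0.5, (500, 2)), rng.normal(2, 0.5, (500, 2))])
print(langevin_sample(Y, rng=rng))
```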
Talk: 10:05 - 11:00
Sören Christensen: "Adaptive Termination in Generative Diffusion: A Time-Reversal Approach"
In this talk, we explore generative diffusion models through the lens of time-reversed stochastic processes. Starting from the stochastic theory of time-reversed Markov processes — a classical concept in probability — we show how random time horizons and Doob's h-transform allow a process to terminate exactly at a desired sampling distribution, at a random time chosen to minimize the expected runtime. This perspective leads to a new class of generative diffusion models that maintain a time-homogeneous structure for both the noising and denoising phases. The resulting dynamics automatically adapt the number of sampling steps to the current noise level, enabling more efficient sampling. For data with low intrinsic dimensionality, the termination condition reduces to a simple first-hitting rule, offering new insights into the manifold hypothesis. The method is also well-suited for natural conditioning, providing the foundation for the subsequent talk by Jan Kallsen. Joint work with Jan Kallsen, Claudia Strauch, and Lukas Trottner (arXiv:2501.19373).
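For orientation, the classical fixed-horizon time-reversal identity that the random-horizon construction of the talk builds on reads as follows (a constant diffusion coefficient is assumed for simplicity):

```latex
% Classical time reversal underlying denoising diffusions: for a forward SDE
% with marginal densities p_t, the reversed process solves
\mathrm{d}X_t = b(X_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,
\qquad
\mathrm{d}\overline{X}_t
  = \bigl[-b(\overline{X}_t) + \sigma^2 \nabla \log p_{T-t}(\overline{X}_t)\bigr]\mathrm{d}t
  + \sigma\,\mathrm{d}\overline{W}_t .
```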
Talk: 11:30 - 12:25
Jan Kallsen: "Model-free filtering in high dimensions via projection and score-based diffusions"
We consider the problem of recovering a latent signal from its noisy observation. The unknown law of the signal and in particular its support are accessible only through a large sample of i.i.d. training data. We further assume the support to be a low-dimensional submanifold of a high-dimensional Euclidean space. As a filter or denoiser we suggest an estimator of the metric projection of the observation on the submanifold. For its computation we study an auxiliary semiparametric model where the observation is obtained by adding isotropic Laplace noise to the signal. Using score matching in a corresponding diffusion model, we obtain an estimator of the Bayesian posterior in this setup. Our main theoretical result shows that, in the limit of high dimensions, this posterior is concentrated near the desired metric projection of the observation on the signal submanifold. Based on joint work with Sören Christensen, Claudia Strauch, and Lukas Trottner.
Short talk: 12:25 - 13:00
Benjamin Dupuis: "Algorithm- and Data-Dependent Generalization Bounds for Diffusion Models"
Score-based generative models (SGMs) have emerged as one of the most popular classes of generative models. A substantial body of work now exists on the analysis of SGMs, focusing either on discretization aspects or on their statistical performance. In the latter case, bounds have been derived, under various metrics, between the true data distribution and the distribution induced by the SGM, often demonstrating polynomial convergence rates with respect to the number of training samples. However, these approaches adopt a largely approximation-theoretic viewpoint, which tends to be overly pessimistic and relatively coarse. In particular, they fail to fully explain the empirical success of SGMs or capture the role of the optimization algorithm used in practice to train the score network. To support this observation, we first present simple experiments illustrating the concrete impact of optimization hyperparameters on the generalization ability of the generated distribution. We then aim to bridge this theoretical gap by providing the first algorithm- and data-dependent generalization analysis for SGMs. In particular, we establish bounds that explicitly account for the optimization dynamics of the learning algorithm, offering new insights into the generalization behavior of SGMs. Our theoretical findings are supported by empirical results on several datasets.
Talk: 14:00 - 14:55
Aleksandar Mijatovic: "Non-asymptotic bounds on the forward process in denoising diffusions"
Denoising diffusion probabilistic models (DDPMs) represent a recent advance in generative modelling that has delivered state-of-the-art results across many application domains. Despite their success, a rigorous theoretical understanding of the error within DDPMs, particularly the non-asymptotic bounds required for comparing their efficiency, remains scarce. Making minimal assumptions on the initial data distribution, allowing for example the manifold hypothesis, I will present in this talk explicit non-asymptotic bounds on the forward diffusion error in total variation (TV), expressed as a function of the terminal time $T$. The key idea is to parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise. Our analysis rigorously proves that, under mild assumptions, the canonical choice of the Ornstein-Uhlenbeck (OU) process cannot be significantly improved in terms of reducing the terminal time $T$ as a function of $R$ and error tolerance $\epsilon>0$. Motivated by data distributions arising in generative modelling, we also establish a cut-off-like phenomenon (as $R\to\infty$) for the convergence in TV of an OU process to its invariant measure, when initialized at a multi-modal distribution with maximal mode distance $R$. This joint work with Miha Brešar (CUHK-Shenzhen) is to appear in the Annals of Applied Probability.
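For reference, the canonical Ornstein-Uhlenbeck forward process discussed in the abstract has an explicit Gaussian transition kernel, which is what makes the dependence of the terminal time $T$ on $R$ and $\epsilon$ quantifiable:

```latex
% The OU noising dynamics and its explicit Gaussian transition kernel
\mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t,
\qquad
X_t \mid X_0 \sim \mathcal{N}\!\bigl(e^{-t}X_0,\,(1-e^{-2t})\,I_d\bigr).
```

In particular, the law of $X_t$ approaches the invariant measure $\mathcal{N}(0, I_d)$ as $t \to \infty$, and the bounds of the talk quantify how large $T$ must be for this approach to hold within TV tolerance $\epsilon$.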
Short talk: 14:55 - 15:30
Lapo Rastrelli: "Toward Physics-Aware Video Prediction: Challenges in Modeling Hamiltonian Dynamics"
The ability to generate realistic and controllable video sequences is a central challenge in generative modelling. While recent advances in video diffusion models have led to tremendous improvements in both visual quality and physical coherence, state-of-the-art models still lack the ability to forecast physically consistent dynamical systems. Inspired by recent advances in sequential diffusion models, we present initial work on an inference scheme that couples diffusion models with latent-variable models, incorporating Hamiltonian priors to better capture system dynamics. In addition to laying out the inference techniques behind this approach, we address the challenges of constructing such physical prior models from the perspective of learning forecastable Hamiltonian dynamical systems and highlight the remaining steps towards developing physics-aware video diffusion models.
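As a minimal sketch of the kind of Hamiltonian prior mentioned above, assuming a separable Hamiltonian H(q, p) = |p|^2 / (2m) + V(q) (the potential, mass, and step size below are illustrative, not the authors' model), latent dynamics can be rolled out with a symplectic leapfrog integrator:

```python
# Hedged sketch: a separable Hamiltonian H(q, p) = |p|^2 / (2m) + V(q) rolled
# out with a symplectic leapfrog integrator, one standard building block for a
# "Hamiltonian prior" over latent dynamics. V, m and dt are illustrative.
import numpy as np

def leapfrog(q, p, grad_V, dt, n_steps, m=1.0):
    """Roll out n_steps of leapfrog; returns the trajectory of (q, p)."""
    traj = [(q.copy(), p.copy())]
    for _ in range(n_steps):
        p = p - 0.5 * dt * grad_V(q)   # half kick
        q = q + dt * p / m             # drift
        p = p - 0.5 * dt * grad_V(q)   # half kick
        traj.append((q.copy(), p.copy()))
    return traj

# toy usage: 2-d harmonic oscillator, V(q) = |q|^2 / 2, grad V(q) = q
traj = leapfrog(q=np.array([1.0, 0.0]), p=np.array([0.0, 1.0]),
                grad_V=lambda q: q, dt=0.05, n_steps=200)
print(traj[-1])
```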
Tuesday, September 23
Talk: 9:10 - 10:05
Nikolay Malkin: "Deep dynamic measure transport: from amortised sampling to Schrödinger bridges and beyond"
Probabilistic models that approximate a distribution using a dynamical system that transports particles from a source distribution to the target have seen rapid development and adoption in recent years. Indeed, diffusion models and continuous normalising flows show success in generative modelling for various domains, but they can also be trained to sample intractable Bayesian posterior distributions, where no samples are available but an unnormalised target density can be queried. In this talk, I will first survey past work (by me and others) on fitting stochastic dynamical systems that sample from a distribution when no target samples are available using deep reinforcement learning methods. However, the underlying dynamical system — not only its end-time marginal distribution — is an interesting object in its own right. I will thus proceed to present recent work that extends the mentioned approaches to the Schrödinger bridge problem — that is, inference of stochastic dynamics minimising a certain transport cost — between a pair of distributions without access to data. Time permitting, I will conclude with ongoing work on learning-based methods for dynamic optimal transport in a multimarginal setting, which allows inference of (deterministic) dynamics from sparse observations. Applications include sampling Boltzmann densities of molecular conformations, inverse problems and conditional generation under pretrained generative model priors, and single-cell RNA sequence modelling.
Talk: 10:05 - 11:00
Giovanni Conforti: "Solving the entropic optimal transport problem"
In this talk, I will survey some recent progress in the field of entropic optimal transport (a.k.a. the Schrödinger problem). More precisely, I will focus on results providing theoretical guarantees of convergence for the most popular algorithms employed to compute approximate solutions in machine learning applications, namely Sinkhorn's algorithm and the iterated Markovian fitting algorithm. In particular, I will develop the connection between the regularity of entropic potentials, the stability of optimal solutions, and the exponential convergence of the above-mentioned algorithms in the number of iterations. This is partially based on joint work with A. Chiarini, A. Durmus, M. Gentiloni, G. Greco, and L. Tamanini.
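For concreteness, the discrete Sinkhorn iteration referenced in the abstract alternates marginal-matching scalings of a Gibbs kernel; the toy implementation below is only meant to fix notation and is not tied to the talk's convergence analysis.

```python
# Hedged sketch: discrete Sinkhorn iterations for entropic optimal transport
# between histograms `a` and `b` with cost matrix `C` and regularisation `eps`.
# Regularisation strength, grid and iteration count are illustrative choices.
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=500):
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # match the first marginal
        v = b / (K.T @ u)                # match the second marginal
    return u[:, None] * K * v[None, :]   # entropic transport plan

# toy usage: two uniform histograms on a line, squared-distance cost
x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :])**2
a = np.full(50, 1 / 50)
b = np.full(50, 1 / 50)
P = sinkhorn(a, b, C)
print(P.sum(axis=0)[:3], P.sum(axis=1)[:3])  # column sums close to b, row sums close to a
```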
Short talk: 11:30 - 12:05
Le-Tuyet-Nhi Pham: "Discrete Markov Probabilistic Models: An Improved Discrete Score-Based Framework with sharp convergence bounds under minimal assumptions"
This paper introduces the Discrete Markov Probabilistic Model (DMPM), a novel algorithm for discrete data generation. The algorithm operates in discrete space, where the noising process is a continuous-time Markov chain that can be sampled exactly via a Poissonian clock that flips labels uniformly at random. The time-reversal process, like the forward noise process, is a jump process, with its intensity governed by a discrete analogue of the classical score function. Crucially, this intensity is proven to be the conditional expectation of a function of the forward process, strengthening its theoretical alignment with score-based generative models while ensuring robustness and efficiency. We further establish convergence bounds for the algorithm under minimal assumptions and demonstrate its effectiveness through experiments on low-dimensional Bernoulli-distributed datasets and high-dimensional binary MNIST data. The results highlight its strong performance in generating discrete structures. This work bridges theoretical foundations and practical applications, advancing the development of effective and theoretically grounded discrete generative modeling.
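A minimal sketch of the forward noising mechanism described above, assuming binary data and interpreting the Poissonian clock as flipping each bit at an illustrative rate `lam` (the exact rates and label alphabet of the paper may differ):

```python
# Hedged sketch: exact simulation of a simple forward noising CTMC for binary
# data. Each coordinate carries an independent Poissonian clock of rate `lam`
# and its bit is flipped at every tick; the probability of an odd number of
# ticks in [0, t] is (1 - exp(-2*lam*t)) / 2, so flipping each bit
# independently with that probability samples X_t exactly.
import numpy as np

def forward_noise(x0, t, lam=1.0, rng=None):
    """Sample X_t of the flipping CTMC started at the binary vector x0."""
    rng = np.random.default_rng() if rng is None else rng
    p_flip = 0.5 * (1.0 - np.exp(-2.0 * lam * t))   # P(odd number of ticks)
    flips = rng.random(x0.shape) < p_flip
    return np.where(flips, 1 - x0, x0)

# toy usage: noise a binary vector; as t grows the marginal tends to uniform
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=16)
print(x0, forward_noise(x0, t=0.2, rng=rng), forward_noise(x0, t=5.0, rng=rng))
```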
Talk: 12:05 - 13:00
Mathias Trabs: "Flow Matching as a forecasting model"
Flow Matching and associated generative models have recently attracted significant interest due to their simulation-free training via a straightforward least squares criterion and their flexible underlying ordinary differential equation framework. The cheap generation of new samples opens the door to efficient distribution estimation, an essential component of forecasting tasks such as weather prediction. In this talk, we first adapt the Flow Matching method to smooth conditional density estimation. We show that the resulting estimator can be related to the classical Nadaraya-Watson estimator. Then, we bridge the gap between proper scoring rules, the established method of evaluating predictions, and the mean integrated squared risk in density estimation. Building on this, we demonstrate an anisotropic rate of convergence for the Flow Matching estimator with respect to a spectral family of proper scoring rules which includes the popular energy score. The talk is based on joint work with Lea Kunkel.
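As a reminder of the least squares criterion mentioned above, one common instance of the (conditional) Flow Matching objective uses the linear interpolation path between noise and data; the specific probability path studied in the talk may differ:

```latex
% One common instance of the conditional Flow Matching least-squares objective,
% using the linear path x_t = (1 - t) x_0 + t x_1 between noise x_0 and data x_1
\mathcal{L}(\theta)
  = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim \mathcal{N}(0,I),\; x_1 \sim p_{\mathrm{data}}}
    \bigl\| v_\theta\bigl(t, (1-t)x_0 + t x_1\bigr) - (x_1 - x_0) \bigr\|^2 .
```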
Short talk: 14:00 - 14:35
Lea Kunkel: "Distribution estimation via Flow Matching"
Recently, flow matching, introduced by Lipman et al. (2022), has attracted increasing interest in generative modelling. Using the solution of an ODE leads to a generation process that is much simpler than that of diffusions, the current state-of-the-art generative models. This idea has been further developed and several adaptations, especially regarding the choice of conditional probability paths, have been presented in the literature. Exploiting the connection to kernel density estimation, we analyze flow matching from a statistical perspective. We derive reasonable conditions for the choice of conditional probability paths and study the rate of convergence.
Short talk: 14:35 - 15:10
Elen Vardanyan: "Statistically Optimal Generative Models that Avoid Replicating Data"
This work explores the problem of generative modeling, aiming to simulate diverse examples from an unknown distribution based on observed examples. While recent studies have focused on quantifying the statistical precision of popular algorithms, there is a lack of mathematical evaluation regarding the non-replication of observed examples and the creativity of the generative model. We present theoretical insights into this aspect, demonstrating that the Wasserstein GAN, constrained to left-invertible push-forward maps, generates distributions that not only avoid replication but also significantly deviate from the empirical distribution. Importantly, we show that left-invertibility achieves this without compromising the statistical optimality of the resulting generator. Our most important contribution provides a finite-sample lower bound on the Wasserstein-1 distance between the generative distribution and the empirical one. We also establish a finite-sample upper bound on the distance between the generative distribution and the true data-generating one. Both bounds are explicit and show the impact of key parameters such as sample size, dimensions of the ambient and latent spaces, noise level, and smoothness measured by the Lipschitz constant.
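For reference, the Wasserstein GAN objective appearing in the abstract can be written in its Kantorovich-Rubinstein dual form, with the generator $g$ pushing latent noise $Z$ forward and trained against 1-Lipschitz critics $f$:

```latex
% Wasserstein GAN objective in Kantorovich-Rubinstein dual form
\min_{g} \; \max_{\|f\|_{\mathrm{Lip}} \le 1}
  \; \mathbb{E}_{X \sim p_{\mathrm{data}}}\bigl[f(X)\bigr] - \mathbb{E}_{Z}\bigl[f(g(Z))\bigr]
  \;=\; \min_{g}\; W_1\bigl(p_{\mathrm{data}},\, g_{\#} p_Z\bigr).
```

The talk's constraint to left-invertible push-forward maps $g$ is imposed on top of this objective.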
Short talk: 15:40 - 16:15
Jonathan Schmidt: "Data meets Dynamics: Towards Integrating Mechanistic Knowledge with Data at Scale"
In many scientific applications — from epidemiology to climate science — there is ample prior knowledge encoded in mechanistic systems that govern partially observed dynamics. This knowledge can be leveraged in an attempt to make time-series inference transcend black-box curve fitting. This talk explores two complementary approaches to effectively combining empirical and mechanistic information and uncertainty in a probabilistic-inference pipeline, and how computational challenges in high-dimensional settings can be tackled.
Talk: 16:15 - 17:10
Johannes Schmidt-Hieber: "Generative Modelling via Quantile Regression"
We link conditional generative modelling to quantile regression. We propose a suitable loss function and derive minimax convergence rates for the associated risk under smoothness assumptions imposed on the conditional distribution. To establish the lower bound, we show that nonparametric regression can be seen as a sub-problem of the considered generative modelling framework. Finally, we discuss extensions of our work to generate data from multivariate distributions. This is joint work with Petr Zamolodtchikov.
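For orientation, the classical quantile-regression (pinball) loss that such an approach builds on characterizes conditional quantiles as population minimizers; the loss actually proposed in the talk may be a variant of this:

```latex
% Pinball loss at level tau in (0,1) and the conditional quantile it recovers
\rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr),
\qquad
q_\tau(x) \in \arg\min_{q}\; \mathbb{E}\bigl[\rho_\tau(Y - q) \mid X = x\bigr].
```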
Wednesday, September 24
Talk: 9:10 - 10:05
Isabel Valera: "Causal deep generative models"
In this talk I will introduce causal generative models, a novel class of deep generative models that not only accurately fit observational data but also provide accurate estimates for interventional and counterfactual queries. Specifically, I will discuss how to use deep learning architectures to design such models so that, with a single generative model, we can provide estimates for a large number of causal queries with theoretical guarantees, while keeping assumptions practical. I will finally discuss the open challenges of designing such causal generative models.
Short talk: 11:30 - 12:05
Arthur Stéphanovitch: "Regularity of the score function in generative models"
We show that the score function naturally adapts to the smoothness of the data distribution. Under minimal assumptions, we establish Lipschitz estimates that directly support convergence and stability analyses in both diffusion and ODE-based generative models. In addition, we derive higher-order regularity bounds, which simplify existing arguments for optimally approximating the score function using neural networks.
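One standard identity behind such regularity estimates is Tweedie's formula for Gaussian smoothing: if $X_\sigma = X_0 + \sigma Z$ with $Z \sim \mathcal{N}(0, I)$ and marginal density $p_\sigma$, then

```latex
% Tweedie's formula: the score of the Gaussian-smoothed density is determined
% by the posterior mean of the clean signal
\nabla \log p_\sigma(x) \;=\; \frac{\mathbb{E}\bigl[X_0 \mid X_\sigma = x\bigr] - x}{\sigma^2},
```

so regularity of the conditional mean $x \mapsto \mathbb{E}[X_0 \mid X_\sigma = x]$ translates directly into regularity of the score.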
Talk: 12:05 - 13:00
Eddie Aamari: "Unified minimax rates for score-based generative models"
Building upon the regularity of the score, we will discuss the convergence rates achievable through denoising score matching. Then, we will discuss the distributional stability of the overall SGM pipeline, and conclude on the minimax estimation rates for the underlying distribution.
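For reference, the denoising score matching criterion underlying these rates can be written as follows, assuming a Gaussian forward noising $x_t = m_t x_0 + \sigma_t z$ with $z \sim \mathcal{N}(0, I)$ (the weighting and noise schedule used in the talk may differ):

```latex
% Denoising score matching objective: the minimiser of this least-squares
% criterion is the score of the noised marginal, nabla log p_t
\mathcal{L}(\theta)
  = \mathbb{E}_{t,\; x_0 \sim p_{\mathrm{data}},\; z \sim \mathcal{N}(0,I)}
    \Bigl\| s_\theta\bigl(t,\, m_t x_0 + \sigma_t z\bigr) + \frac{z}{\sigma_t} \Bigr\|^2 .
```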