CTML OTMLE
Reading Group

Fridays 2-4PM PT, BWW 5101

At UC Berkeley’s Center for Targeted Machine Learning and Causal Inference, our OTMLE (Optimal Transport and Targeted Maximum Likelihood Estimation) reading group explores the intersection of optimal transport theory and TMLE, offering a fresh perspective on how TMLE fluctuations of probability measures can be understood. The group covers key topics, including the history of optimal transport, Wasserstein metrics, geodesics, gradient flows, statistical estimation, and information geometry. Each session focuses on one of these themes, providing participants with a comprehensive foundation to bridge optimal transport with statistical estimation techniques in TMLE.

We invite all enthusiasts, researchers, and practitioners—regardless of affiliation with the CTML—to join our reading group sessions. Your interest and contributions are highly valued, as we believe that a diverse community fosters richer discussions and deeper understanding. Whether you’re new to the field or have extensive experience, we welcome you to be part of our collaborative exploration of optimal transport and TMLE.

To stay informed about our reading group sessions and the latest developments at the CTML, we invite you to subscribe to both our reading group’s mailing list and the CTML newsletter. Joining these mailing lists ensures you receive timely updates on meeting schedules, discussion topics, and upcoming events.

Optimal Transport

Image courtesy of Microsoft Research; used under Microsoft’s terms of use.

Our weekly reading materials will be drawn from the following list, though it is not exhaustive. We have carefully hand-picked these resources to offer not only a comprehensive introduction to optimal transport theories but also to emphasize aspects that are potentially useful in relation to TMLE.

Core References

  • Agueh, M., & Carlier, G. (2011). Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2), 904-924.
  • Ambrosio, L., Gigli, N., & Savaré, G. (2008). Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media.
  • Peyré, G., & Cuturi, M. (2019). Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6), 355-607.
  • Villani, C. (2009). Optimal transport: old and new (Vol. 338). Berlin: Springer.
  • Villani, C. (2021). Topics in optimal transportation (Vol. 58). American Mathematical Society.

Supplementary References

  • Agueh, M., & Carlier, G. (2017). Vers un théorème de la limite centrale dans l’espace de Wasserstein?. Comptes Rendus. Mathématique, 355(7), 812-818.
  • Benamou, J. D., & Brenier, Y. (2000). A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3), 375-393.
  • Blanchet, J., Li, J., Lin, S., & Zhang, X. (2024). Distributionally robust optimization and robust statistics. arXiv preprint arXiv:2401.14655.
  • Chernozhukov, V., Galichon, A., Hallin, M., & Henry, M. (2017). Monge–Kantorovich depth, quantiles, ranks and signs. The Annals of Statistics, 45(1), 223-256.
  • Engquist, B., Froese, B. D., & Yang, Y. (2016). Optimal transport for seismic full waveform inversion. arXiv preprint arXiv:1602.01540.
  • Figalli, A., & Glaudo, F. (2021). An invitation to optimal transport, Wasserstein distances, and gradient flows. EMS Press.
  • Gibbs, A. L., & Su, F. E. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419-435.
  • Jordan, R., Kinderlehrer, D., & Otto, F. (1998). The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1), 1-17.
  • Kuhn, D., Esfahani, P. M., Nguyen, V. A., & Shafieezadeh-Abadeh, S. (2019). Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Operations research & management science in the age of analytics (pp. 130-166). Informs.
  • Panaretos, V. M., & Zemel, Y. (2019). Statistical aspects of Wasserstein distances. Annual Review of Statistics and Its Application, 6(1), 405-431.
  • Panaretos, V. M., & Zemel, Y. (2020). An invitation to statistics in Wasserstein space (p. 147). Springer Nature.
  • Santambrogio, F. (2015). Optimal transport for applied mathematicians. New York: Birkhäuser.
  • Tsybakov, A. B. (2009). Lower bounds on the minimax risk. Introduction to Nonparametric Estimation, 77-135.
  • Van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data (Vol. 4). New York: Springer.
  • Van der Laan, M. J., & Rose, S. (2018). Targeted learning in data science. Cham: Springer International Publishing.
  • Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint (Vol. 48). Cambridge University Press.

Fall 2024: Foundations of Optimal Transport

The Fall 2024 semester introduces participants to the foundational concepts of optimal transport, covering the three primary formulations: Monge, Kantorovich, and Benamou-Brenier. These are explored alongside their respective characterizations of the Wasserstein distance and the optimal transport plans that emerge in each framework. This exploration provides a comprehensive understanding of how optimal transport establishes metrics over probability spaces and how these relate to statistical estimation and hypothesis testing.

A key focus is understanding TMLE as a dynamic path in probability space, where optimal transport provides a spatial and geometric viewpoint. Participants examine how properties of optimal transport plans—such as monotonicity, duality, geodesics, and gradient flows—inform the theoretical underpinnings of TMLE. This semester emphasizes building a strong foundation and connecting the “moving mass” perspective of optimal transport to the iterative updates in TMLE.

[Introduction]

Date: September 25th, 2024

Presenter: Kaiwen Hou

Optional Reading: Villani (2021) Sections 0.1-0.3, 2.1-2.3.1

  • Purpose of the reading group and its role in advancing targeted learning
  • Logistics: meeting times, room assignments, and reading materials for the semester
  • Basic concepts: source measure, target measure, transport map, and pushforward
  • Monge’s formulation, existence, and uniqueness
  • Kantorovich’s relaxation and transport plan
  • Property of the optimal transport map: monotonicity
  • Optimal transport map implied by TMLE
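
For reference, the two formulations above can be written as follows (in standard notation, with cost function c, source measure μ, and target measure ν; this is a summary, not a quote from the readings):

```latex
\text{Monge:} \quad \inf_{T \,:\, T_{\#}\mu = \nu} \; \int c\bigl(x, T(x)\bigr) \, d\mu(x)
\qquad\qquad
\text{Kantorovich:} \quad \inf_{\pi \in \Pi(\mu,\nu)} \; \int c(x, y) \, d\pi(x, y)
```

Here T_{\#}\mu denotes the pushforward of μ by T, and Π(μ, ν) is the set of couplings with marginals μ and ν; Kantorovich's relaxation attains its infimum under mild conditions, while a Monge map need not exist.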

Unresolved Questions:

  • Compactness of the coupling space
  • Weierstrass theorem: existence in Kantorovich’s formulation
  • Existence of suboptimal transport plan in proving monotonicity

[Geometry of Optimal Transport]

Date: October 2nd, 2024

Presenter: Kaiwen Hou

Reading: Villani (2021) Sections 2.2-2.3.2, 1.1.1-1.1.5; Santambrogio (2015) Box 1.1, Theorem 1.4

Optional Reading: Villani (2021) Sections 4.1, 1.1.6-1.2, 2.1.1-2.1.3; Santambrogio (2015) Section 1.2

  • Construction of optimal transport map
  • Cyclical monotonicity and Rockafellar’s theorem
  • Monge–Ampère equation
  • Existence of optimal transport plan in Kantorovich’s formulation
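
For quadratic cost, Brenier's theorem identifies the optimal map as the gradient of a convex potential, T = ∇φ, and the change-of-variables formula then relates the densities f of the source and g of the target through the Monge–Ampère equation (standard notation, stated here for orientation only):

```latex
\det\!\bigl(D^2 \varphi(x)\bigr) \;=\; \frac{f(x)}{g\!\bigl(\nabla \varphi(x)\bigr)}
```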

[Wasserstein Distances]

Date: October 9th, 2024

Presenter: Qiuran Lyu

Reading: Villani (2021) Sections 7.1, 7.4, Exercise 7.11

Optional Reading: Villani (2021) Sections 7.2-7.3; Engquist, Froese & Yang (2016) Theorem 5

  • Wasserstein metric: nonnegativity and symmetry
  • Gluing lemma to prove the triangle inequality
  • Proof of gluing lemma
  • Ordering and interpolation inequalities
  • Topological properties: robustness to oscillations
  • Convexity properties and behavior under rescaled convolution
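
In one dimension the Wasserstein distance admits a closed form via the monotone (quantile) coupling, which makes the metric axioms easy to check numerically. A minimal sketch, illustrative only and not drawn from the readings:

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size empirical measures on R.

    In 1D the optimal coupling is the monotone (quantile) coupling, so the
    distance reduces to comparing order statistics of the two samples.
    """
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape, "equal sample sizes assumed for simplicity"
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 3.0  # a pure location shift: W_p equals the shift size exactly
print(wasserstein_1d(x, y))  # 3.0 (shift equivariance)
print(wasserstein_1d(x, x))  # 0.0 (identity of indiscernibles)
```

Unequal sample sizes require comparing the quantile functions directly; the equal-size case above keeps the sketch short.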

[Statistical Inference Based on Wasserstein Distances]

Date: October 16th, 2024

Presenter: Wenxin Zhang

Reading: Villani (2021) Sections 2.1.5, 5.1.3; Agueh & Carlier (2011) Sections 1-3, 6; Panaretos & Zemel (2019) Sections 2.1, 3.1

Optional Reading: Villani (2021) Sections 5.2.1-5.2.2; Santambrogio (2015) Lemma 5.29, Proposition 5.32; Agueh & Carlier (2017); Panaretos & Zemel (2020)

  • Properties of Wasserstein distances under shifts, scaling, and product measures
  • Subadditivity of Wasserstein distances w.r.t. convolutions
  • Wasserstein test statistics for empirical measures and/or two samples
  • Asymptotic distributions of Wasserstein test statistics under univariate measures
  • Wasserstein Fréchet mean of univariate location family: sufficient condition
  • Wasserstein Fréchet mean of two measures: displacement interpolation
  • Wasserstein Fréchet mean of Gaussian distributions is Gaussian
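
The Gaussian case is fully explicit in one dimension, which makes the last two bullets easy to verify by hand. A small sketch using the univariate closed forms (an illustration, not code from the readings):

```python
import numpy as np

def w2_gaussian_1d(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) in 1D:
    W2^2 = (m1 - m2)^2 + (s1 - s2)^2 (closed form for univariate Gaussians)."""
    return np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

def frechet_mean_gaussian_1d(params):
    """Equal-weight Wasserstein barycenter of univariate Gaussians: it is
    again Gaussian, with averaged mean and averaged standard deviation."""
    means, sds = zip(*params)
    return float(np.mean(means)), float(np.mean(sds))

m, s = frechet_mean_gaussian_1d([(0.0, 1.0), (4.0, 3.0)])
print(m, s)  # 2.0 2.0 -- the displacement interpolant at t = 1/2
print(w2_gaussian_1d(0.0, 1.0, 4.0, 3.0))  # sqrt(20)
```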

[Ten Metrics on Probability Measures]

Date: October 23rd, 2024

Presenter: Qiuran Lyu

Reading: Gibbs & Su (2002) Figure 1, Sections 2-3

Optional Reading: Peyré & Cuturi (2019) Sections 8.1-8.4; Tsybakov (2009) Section 2.4; Wainwright (2019) Chapter 15

  • Definitions
  • f-divergence
  • Metric inequalities and proof
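
Inequalities of the kind catalogued by Gibbs & Su can be checked numerically on discrete distributions. A minimal sketch using total variation, KL divergence, and Pinsker's inequality TV ≤ √(KL/2) (illustrative only):

```python
import numpy as np

def tv(p, q):
    """Total variation distance between discrete distributions."""
    return 0.5 * np.abs(np.asarray(p, float) - np.asarray(q, float)).sum()

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), the f-divergence with
    f(t) = t log t; terms with p_i = 0 contribute zero."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(tv(p, q))                       # 0.1
print(tv(p, q) <= np.sqrt(kl(p, q) / 2))  # True (Pinsker's inequality)
```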

[Monge–Kantorovich Depth]

Date: October 30th, 2024

Presenter: Yilong Hou

Reading: Villani (2021) Proposition 2.4, Theorem 2.9; Chernozhukov et al. (2017) Paragraphs “Notation, conventions and preliminaries”, “MK depth is halfspace depth in dimension 1”, Sections 2.3, 3.2-3.3, A, B3-4

Optional Reading: Duality and Double Convexification (scribed by Qiuran Lyu)

[Euler Equation and Geodesics]

Date: November 6th, 2024

Presenter: Mingxun Wang

Reading: Figalli & Glaudo (2021) Sections 1.3, 2.5.4

Optional Reading: Villani (2021) Theorem 3.8, Sections 3.1-3.3; Notes

  • Basics of Riemannian geometry: tangent space, gradient, arc length parameterization, Riemannian distance, and geodesic
  • Incompressible Euler equation
  • Arnold’s geodesic interpretation: measure-preserving orientation-preserving diffeomorphism
  • Brenier’s approximate geodesics: midpoint projection onto closure
  • Polar factorization theorem
  • Helmholtz decomposition of differentiable vector fields into irrotational and solenoidal vector fields

[Benamou-Brenier Formulation (1)]

Date: November 13th, 2024

Presenter: Yi Li

Reading: Villani (2021) Sections 8.1-8.2

Optional Reading: Villani (2021) Sections 5.1, 8.3

  • Continuity equation: velocity field and Lagrangian specification of flow field
  • Benamou-Brenier formulation of Wasserstein distance: kinetic energy and action functional
  • Otto’s calculus and interpretation

[Variational Formulation of Fokker-Planck Equation]

Date: November 20th, 2024

Presenter: Yi Li

Reading: Jordan, Kinderlehrer & Otto (1998) Sections 1-2, 4-5

Optional Reading: Villani (2021) Sections 8.4-8.5; Ambrosio, Gigli & Savaré (2008) Definition 3.1.1

  • Fokker-Planck equation: unique stationary solution as the steepest descending direction
  • Gradient flows, JKO scheme, and minimizing movement
  • L1-weak convergence of interpolated JKO process to the solution of Fokker-Planck equation
  • Connections between Wasserstein gradient flows and Benamou-Brenier formulation
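
The JKO (Jordan–Kinderlehrer–Otto) step discussed above can be summarized as follows (standard notation; F is the free-energy functional with potential Ψ):

```latex
\rho_{k+1}^{\tau} \;\in\; \operatorname*{arg\,min}_{\rho}
\Bigl\{ \tfrac{1}{2\tau} \, W_2^2\bigl(\rho, \rho_k^{\tau}\bigr) + F(\rho) \Bigr\},
\qquad
F(\rho) \;=\; \int \rho \log \rho \, dx \;+\; \int \Psi \, \rho \, dx,
```

with the time-interpolation of the sequence \{\rho_k^{\tau}\} converging to the solution of the Fokker-Planck equation as the step size τ → 0.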

[Continuity Equation in the Sense of Distributions (1)]

Date: November 27th, 2024

Presenter: Mingxun Wang

Reading: Ambrosio, Gigli & Savaré (2008) Section 8.1

Optional Reading: Ambrosio, Gigli & Savaré (2008) Sections 1.1, 10.0-10.1; Continuity Equation and Benamou-Brenier Formulation; Divergence Theorem; Gradient Flows; Brenier ODE

  • Divergence theorem
  • Bounded variation, rectifiable curve, geodesic, metric derivative, and arc-length reparameterization
  • Distribution: integration by parts, test function, local integrability
  • Weak derivative and Sobolev space
  • Continuity equation and weak solution

[Gradient Flows]

Date: December 4th, 2024

Presenter: Kaiwen Hou

Reading: Ambrosio, Gigli & Savaré (2008) Sections 8.3-8.4, 11.1

Optional Reading: Ambrosio, Gigli & Savaré (2008) Example 11.1.10, Definitions 5.1.11, 10.1.1, Theorem 8.3.1, Lemma 10.4.1

  • Quantum drift-diffusion equation as gradient flow of the Fisher information
  • Four approaches to Wasserstein gradient flows: variational approximation scheme, curves of maximal slope, pointwise differential formulation, and systems of evolution variational inequalities
  • Duality map: Fréchet differential of Lp norm, compatibility with norm, and compatibility with inner product
  • Tangent bundle and smooth cylindrical test functions
  • Gradient flow equation: Fréchet subdifferential in Wasserstein space and differential inclusion
  • Variational integral lemma: strong subdifferential is the gradient of first variation
  • Gradient flow example: evolutionary parabolic PDEs of diffusion type

Join us on Zoom if you can’t attend in person in 2024, and don’t forget to subscribe to this channel for access to the recordings.

Spring 2025: Geometry of Probability Space Optimization

In Spring 2025, participants further investigate the geometry of probability spaces and the implications for TMLE’s structure and behavior. Topics include deeper explorations of how optimal transport’s spatial and dynamic properties provide insights into likelihood-based optimization and its role in semiparametric models. Rather than diving into specific optimization techniques like natural gradient descent or Newton’s method, this semester focuses on laying the theoretical groundwork for understanding such methods in probability spaces. Participants refine their understanding of how probability space-based optimization differs fundamentally from traditional parameter space approaches. This exploration highlights the theoretical richness of TMLE’s operations in probability space and prepares participants to extend these ideas to advanced methods and practical implementations in their future work.

[Benamou-Brenier Formulation (2)]

Date: January 24th, 2025

Presenter: Qiuran Lyu

Computational Reading: Peyré & Cuturi (2019) Sections 7.1, 7.6, Remark 2.30

Optional Reading: Benamou & Brenier (2000)

  • Convex formulation using momentum
  • Connections with displacement interpolation
  • Dynamic formulation over path space: displacement interpolation and entropic interpolation

[Continuity Equation in the Sense of Distributions (2)]

Date: January 31st, 2025

Presenter: Kaiwen Hou

Theoretical Reading: Ambrosio, Gigli & Savaré (2008) Theorem 8.3.1, Lemma 8.3.2, Proposition 8.3.3

  • Distributions in duality with smooth cylindrical test functions
  • Tangent vector field as the velocity field with smallest Lp norm and equal to the metric derivative

[Tangent Space (1)]

Date: February 7th, 2025

Presenter: Kaiwen Hou

Theoretical Reading: Ambrosio, Gigli & Savaré (2008) Section 8.0, Equations 0.20-0.26, Definition 8.4.1, Lemma 8.4.2, Propositions 8.4.3, 8.4.5, 8.4.6

  • Tangent bundle of 2-Wasserstein space
  • General definition of tangent bundle of Wasserstein space
  • Variational selection of tangent vectors
  • Variational characterization of divergence-free vector fields
  • Tangent vector to absolutely continuous curves
  • Optimal transport plans along absolutely continuous curves

[Tangent Space (2)]

Date: February 14th, 2025

Presenter: Mingxun Wang

Theoretical Reading: Ambrosio, Gigli & Savaré (2008) Remark 8.4.4, Theorem 8.5.1

  • Cotangent space and duality
  • Tangent space constructed from optimal maps
  • Optimal displacement maps are tangent
  • Reproducing kernel Hilbert space

[Distributionally Robust Optimization (1)]

Date: February 28th, 2025

Presenter: Kaiwen Hou

Theoretical Reading: Kuhn et al. (2019) Section 1

  • Nominal distributions: empirical and elliptical models
  • Optimizer’s curse
  • Wasserstein distance and dual Kantorovich problem
  • Kantorovich-Rubinstein theorem for W1 distance
  • Worst-case optimal risk based on ambiguity sets
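
A minimal numerical sketch (illustrative, not drawn from Kuhn et al.) of how Kantorovich-Rubinstein duality turns a W1 ambiguity set into a Lipschitz penalty, here for a linear loss where the worst case is explicit:

```python
import numpy as np

def worst_case_risk(sample, slope, eps):
    """sup over Q with W1(Q, P_n) <= eps of E_Q[slope * X].

    By Kantorovich-Rubinstein, for an L-Lipschitz loss the supremum is
    E_{P_n}[loss] + eps * L; a linear loss has L = |slope|.
    """
    return slope * np.mean(sample) + eps * abs(slope)

rng = np.random.default_rng(1)
x = rng.normal(size=500)
eps, slope = 0.3, 2.0
# The bound is attained by shifting every atom eps in the ascent direction,
# a perturbation whose W1 distance from the empirical measure is exactly eps.
shifted = x + eps * np.sign(slope)
w1_shift = np.mean(np.abs(shifted - x))  # = eps by the 1D coupling formula
print(w1_shift, slope * np.mean(shifted), worst_case_risk(x, slope, eps))
```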

[Distributionally Robust Optimization (2)]

Date: March 14th, 2025

Presenter: Kaiwen Hou

Theoretical Reading: Kuhn et al. (2019) Theorems 5-7

Optional Reading: Ambrosio, Gigli & Savaré (2008) Definition 3.1.1

  • Lipschitz regularization
  • Robust lower bound based on empirical perturbations
  • Strong duality of worst-case risk: Moreau envelope
  • inf-convolution: (\bar R, +, min)-algebra, basic properties, and Legendre–Fenchel transform
  • Moreau-Yosida regularization is the inf-convolution between objective and norm
  • Moreau-Yosida regularization defines gradient flows
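
A small numerical illustration (not from the readings) of the inf-convolution above: the Moreau envelope of the absolute value is the Huber function, which a brute-force minimization over a grid recovers:

```python
import numpy as np

def moreau_envelope(f, x, lam, grid):
    """Moreau-Yosida regularization e_lam f(x) = inf_y { f(y) + |x - y|^2 / (2 lam) },
    the inf-convolution of f with a scaled squared norm, evaluated by
    brute-force minimization over a grid of candidate points y."""
    return float(np.min(f(grid) + (x - grid) ** 2 / (2.0 * lam)))

lam = 1.0
grid = np.linspace(-5.0, 5.0, 200_001)
# For f = |.|, the envelope is the Huber function:
# x^2 / (2 lam) for |x| <= lam, and |x| - lam / 2 beyond.
for x in (0.3, 2.0):
    huber = x ** 2 / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2
    print(moreau_envelope(np.abs, x, lam, grid), huber)
```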

[Distributionally Robust Optimization (3)]

Date: March 21st, 2025

Presenter: Zhongming Xie

Theoretical Reading: Blanchet et al. (2024) Section 2

  • f-divergence
  • Variance regularization
  • Optimal transport discrepancy and square-root LASSO

[Distributionally Robust Optimization (4)]

Date: April 4th, 2025

Presenter: Zhongming Xie

Theoretical Reading: