CTML OTMLE
Reading Group

Fridays 2-4PM PT, BWW 5101

At UC Berkeley’s Center for Targeted Machine Learning and Causal Inference, our OTMLE (Optimal Transport and Targeted Maximum Likelihood Estimation) reading group explores the intersection of optimal transport theory and TMLE, offering a fresh perspective on how TMLE fluctuations of probability measures can be understood. The group covers key topics, including the history of optimal transport, Wasserstein metrics, geodesics, gradient flows, statistical estimation, and information geometry. Each session focuses on one of these themes, providing participants with a comprehensive foundation to bridge optimal transport with statistical estimation techniques in TMLE.

We invite all enthusiasts, researchers, and practitioners—regardless of affiliation with the CTML—to join our reading group sessions. Your interest and contributions are highly valued, as we believe that a diverse community fosters richer discussions and deeper understanding. Whether you’re new to the field or have extensive experience, we welcome you to be part of our collaborative exploration of optimal transport and TMLE.

To stay informed about our reading group sessions and the latest developments at the CTML, we invite you to subscribe to both our reading group’s mailing list and the CTML newsletter. Joining these mailing lists ensures you receive timely updates on meeting schedules, discussion topics, and upcoming events.

Optimal Transport

Image courtesy of Microsoft Research; used under Microsoft’s terms of use.

Our weekly reading materials will be drawn from the following list, though it is not exhaustive. We have carefully hand-picked these resources to offer not only a comprehensive introduction to optimal transport theories but also to emphasize aspects that are potentially useful in relation to TMLE.

Core References

  • Agueh, M., & Carlier, G. (2011). Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2), 904-924.
  • Ambrosio, L., Gigli, N., & Savaré, G. (2008). Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media.
  • Peyré, G., & Cuturi, M. (2019). Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6), 355-607.
  • Villani, C. (2009). Optimal transport: old and new (Vol. 338). Berlin: Springer.
  • Villani, C. (2021). Topics in optimal transportation (Vol. 58). American Mathematical Society.

Supplementary References

  • Agueh, M., & Carlier, G. (2017). Vers un théorème de la limite centrale dans l’espace de Wasserstein?. Comptes Rendus. Mathématique, 355(7), 812-818.
  • Benamou, J. D., & Brenier, Y. (2000). A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 84(3), 375-393.
  • Blanchet, J., Li, J., Lin, S., & Zhang, X. (2024). Distributionally robust optimization and robust statistics. arXiv preprint arXiv:2401.14655.
  • Chernozhukov, V., Galichon, A., Hallin, M., & Henry, M. (2017). Monge–Kantorovich depth, quantiles, ranks and signs. The Annals of Statistics, 45(1), 223-256.
  • Engquist, B., Froese, B. D., & Yang, Y. (2016). Optimal transport for seismic full waveform inversion. arXiv preprint arXiv:1602.01540.
  • Figalli, A., & Glaudo, F. (2021). An invitation to optimal transport, Wasserstein distances, and gradient flows. EMS Press.
  • Gibbs, A. L., & Su, F. E. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419-435.
  • Jordan, R., Kinderlehrer, D., & Otto, F. (1998). The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1), 1-17.
  • Kuhn, D., Esfahani, P. M., Nguyen, V. A., & Shafieezadeh-Abadeh, S. (2019). Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Operations research & management science in the age of analytics (pp. 130-166). Informs.
  • Panaretos, V. M., & Zemel, Y. (2019). Statistical aspects of Wasserstein distances. Annual Review of Statistics and Its Application, 6(1), 405-431.
  • Panaretos, V. M., & Zemel, Y. (2020). An invitation to statistics in Wasserstein space (p. 147). Springer Nature.
  • Santambrogio, F. (2015). Optimal transport for applied mathematicians. New York: Birkhäuser.
  • Tsybakov, A. B. (2009). Lower bounds on the minimax risk. Introduction to Nonparametric Estimation, 77-135.
  • Van der Laan, M. J., & Rose, S. (2011). Targeted learning: causal inference for observational and experimental data (Vol. 4). New York: Springer.
  • Van der Laan, M. J., & Rose, S. (2018). Targeted learning in data science. Cham: Springer International Publishing.
  • Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint (Vol. 48). Cambridge University Press.

Fall 2024: Foundations of Optimal Transport

The Fall 2024 semester introduces participants to the foundational concepts of optimal transport, covering the three primary formulations: Monge, Kantorovich, and Benamou-Brenier. These are explored alongside their respective characterizations of the Wasserstein distance and the optimal transport plans that emerge in each framework. This exploration provides a comprehensive understanding of how optimal transport establishes metrics over probability spaces and how these relate to statistical estimation and hypothesis testing.

A key focus is understanding TMLE as a dynamic path in probability space, where optimal transport provides a spatial and geometric viewpoint. Participants examine how properties of optimal transport plans—such as monotonicity, duality, geodesics, and gradient flows—inform the theoretical underpinnings of TMLE. This semester emphasizes building a strong foundation and connecting the “moving mass” perspective of optimal transport to the iterative updates in TMLE.

[Introduction]

Date: September 25th, 2024

Presenter: Kaiwen Hou

Optional Reading: Villani (2021) Sections 0.1-0.3, 2.1-2.3.1

  • Purpose of the reading group and its role in advancing targeted learning
  • Logistics: meeting times, room assignments, and reading materials for the semester
  • Basic concepts: source measure, target measure, transport map, and pushforward
  • Monge’s formulation, existence, and uniqueness
  • Kantorovich’s relaxation and transport plan
  • Property of the optimal transport map: monotonicity
  • Optimal transport map implied by TMLE
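
For reference, the two formulations above can be written as follows (in standard notation, with cost function c, source measure μ, and target measure ν; this is a summary, not a quote from the readings):

```latex
\text{Monge:} \quad \inf_{T \,:\, T_{\#}\mu = \nu} \; \int c\bigl(x, T(x)\bigr) \, d\mu(x)
\qquad\qquad
\text{Kantorovich:} \quad \inf_{\pi \in \Pi(\mu,\nu)} \; \int c(x, y) \, d\pi(x, y)
```

Here T_{\#}\mu denotes the pushforward of μ by T, and Π(μ, ν) is the set of couplings with marginals μ and ν; Kantorovich's relaxation attains its infimum under mild conditions, while a Monge map need not exist.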

Unresolved Questions:

  • Compactness of the coupling space
  • Weierstrass theorem: existence in Kantorovich’s formulation
  • Existence of suboptimal transport plan in proving monotonicity

[Geometry of Optimal Transport]

Date: October 2nd, 2024

Presenter: Kaiwen Hou

Reading: Villani (2021) Sections 2.2-2.3.2, 1.1.1-1.1.5; Santambrogio (2015) Box 1.1, Theorem 1.4

Optional Reading: Villani (2021) Sections 4.1, 1.1.6-1.2, 2.1.1-2.1.3; Santambrogio (2015) Section 1.2

  • Construction of optimal transport map
  • Cyclical monotonicity and Rockafellar’s theorem
  • Monge–Ampère equation
  • Existence of optimal transport plan in Kantorovich’s formulation
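
For quadratic cost, Brenier's theorem identifies the optimal map as the gradient of a convex potential, T = ∇φ, and the change-of-variables formula then relates the densities f of the source and g of the target through the Monge–Ampère equation (standard notation, stated here for orientation only):

```latex
\det\!\bigl(D^2 \varphi(x)\bigr) \;=\; \frac{f(x)}{g\!\bigl(\nabla \varphi(x)\bigr)}
```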

[Wasserstein Distances]

Date: October 9th, 2024

Presenter: Qiuran Lyu

Reading: Villani (2021) Sections 7.1, 7.4, Exercise 7.11

Optional Reading: Villani (2021) Sections 7.2-7.3; Engquist, Froese & Yang (2016) Theorem 5

  • Wasserstein metric: nonnegativity and symmetry
  • Gluing lemma to prove the triangle inequality
  • Proof of gluing lemma
  • Ordering and interpolation inequalities
  • Topological properties: robustness to oscillations
  • Convexity properties and behavior under rescaled convolution
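
In one dimension the Wasserstein distance admits a closed form via the monotone (quantile) coupling, which makes the metric axioms easy to check numerically. A minimal sketch, illustrative only and not drawn from the readings:

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size empirical measures on R.

    In 1D the optimal coupling is the monotone (quantile) coupling, so the
    distance reduces to comparing order statistics of the two samples.
    """
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape, "equal sample sizes assumed for simplicity"
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 3.0  # a pure location shift: W_p equals the shift size exactly
print(wasserstein_1d(x, y))  # 3.0 (shift equivariance)
print(wasserstein_1d(x, x))  # 0.0 (identity of indiscernibles)
```

Unequal sample sizes require comparing the quantile functions directly; the equal-size case above keeps the sketch short.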

[Statistical Inference Based on Wasserstein Distances]

Date: October 16th, 2024

Presenter: Wenxin Zhang

Reading: Villani (2021) Sections 2.1.5, 5.1.3; Agueh & Carlier (2011) Sections 1-3, 6; Panaretos & Zemel (2019) Sections 2.1, 3.1

Optional Reading: Villani (2021) Sections 5.2.1-5.2.2; Santambrogio (2015) Lemma 5.29, Proposition 5.32; Agueh & Carlier (2017); Panaretos & Zemel (2020)

  • Properties of Wasserstein distances under shifts, scaling, and product measures
  • Subadditivity of Wasserstein distances w.r.t. convolutions
  • Wasserstein test statistics for empirical measures and/or two samples
  • Asymptotic distributions of Wasserstein test statistics under univariate measures
  • Wasserstein Fréchet mean of univariate location family: sufficient condition
  • Wasserstein Fréchet mean of two measures: displacement interpolation
  • Wasserstein Fréchet mean of Gaussian distributions is Gaussian
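
The Gaussian case is fully explicit in one dimension, which makes the last two bullets easy to verify by hand. A small sketch using the univariate closed forms (an illustration, not code from the readings):

```python
import numpy as np

def w2_gaussian_1d(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) in 1D:
    W2^2 = (m1 - m2)^2 + (s1 - s2)^2 (closed form for univariate Gaussians)."""
    return np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

def frechet_mean_gaussian_1d(params):
    """Equal-weight Wasserstein barycenter of univariate Gaussians: it is
    again Gaussian, with averaged mean and averaged standard deviation."""
    means, sds = zip(*params)
    return float(np.mean(means)), float(np.mean(sds))

m, s = frechet_mean_gaussian_1d([(0.0, 1.0), (4.0, 3.0)])
print(m, s)  # 2.0 2.0 -- the displacement interpolant at t = 1/2
print(w2_gaussian_1d(0.0, 1.0, 4.0, 3.0))  # sqrt(20)
```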

[Ten Metrics on Probability Measures]

Date: October 23rd, 2024

Presenter: Qiuran Lyu

Reading: Gibbs & Su (2002) Figure 1, Sections 2-3

Optional Reading: Peyré & Cuturi (2019) Sections 8.1-8.4; Tsybakov (2009) Section 2.4; Wainwright (2019) Chapter 15

  • Definitions
  • f-divergence
  • Metric inequalities and proof
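
Inequalities of the kind catalogued by Gibbs & Su can be checked numerically on discrete distributions. A minimal sketch using total variation, KL divergence, and Pinsker's inequality TV ≤ √(KL/2) (illustrative only):

```python
import numpy as np

def tv(p, q):
    """Total variation distance between discrete distributions."""
    return 0.5 * np.abs(np.asarray(p, float) - np.asarray(q, float)).sum()

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), the f-divergence with
    f(t) = t log t; terms with p_i = 0 contribute zero."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(tv(p, q))                       # 0.1
print(tv(p, q) <= np.sqrt(kl(p, q) / 2))  # True (Pinsker's inequality)
```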

[Monge–Kantorovich Depth]

Date: October 30th, 2024

Presenter: Yilong Hou

Reading: Villani (2021) Proposition 2.4, Theorem 2.9; Chernozhukov et al. (2017) Paragraphs “Notation, conventions and preliminaries”, “MK depth is halfspace depth in dimension 1”, Sections 2.3, 3.2-3.3, A, B3-4

Optional Reading: Duality and Double Convexification (scribed by Qiuran Lyu)

[Euler Equation and Geodesics]

Date: November 6th, 2024

Presenter: Mingxun Wang

Reading: Figalli & Glaudo (2021) Sections 1.3, 2.5.4

Optional Reading: Villani (2021) Theorem 3.8, Sections 3.1-3.3; Notes

  • Basics of Riemannian geometry: tangent space, gradient, arc length parameterization, Riemannian distance, and geodesic
  • Incompressible Euler equation
  • Arnold’s geodesic interpretation: measure-preserving orientation-preserving diffeomorphism
  • Brenier’s approximate geodesics: midpoint projection onto closure
  • Polar factorization theorem
  • Helmholtz decomposition of differentiable vector fields into irrotational and solenoidal vector fields

[Benamou-Brenier Formulation (1)]

Date: November 13th, 2024

Presenter: Yi Li

Reading: Villani (2021) Sections 8.1-8.2

Optional Reading: Villani (2021) Sections 5.1, 8.3

  • Continuity equation: velocity field and Lagrangian specification of flow field
  • Benamou-Brenier formulation of Wasserstein distance: kinetic energy and action functional
  • Otto’s calculus and interpretation

[Variational Formulation of Fokker-Planck Equation]

Date: November 20th, 2024

Presenter: Yi Li

Reading: Jordan, Kinderlehrer & Otto (1998) Sections 1-2, 4-5

Optional Reading: Villani (2021) Sections 8.4-8.5; Ambrosio, Gigli & Savaré (2008) Definition 3.1.1

  • Fokker-Planck equation: unique stationary solution as the steepest descending direction
  • Gradient flows, JKO scheme, and minimizing movement
  • L1-weak convergence of interpolated JKO process to the solution of Fokker-Planck equation
  • Connections between Wasserstein gradient flows and Benamou-Brenier formulation
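
The JKO (Jordan–Kinderlehrer–Otto) step discussed above can be summarized as follows (standard notation; F is the free-energy functional with potential Ψ):

```latex
\rho_{k+1}^{\tau} \;\in\; \operatorname*{arg\,min}_{\rho}
\Bigl\{ \tfrac{1}{2\tau} \, W_2^2\bigl(\rho, \rho_k^{\tau}\bigr) + F(\rho) \Bigr\},
\qquad
F(\rho) \;=\; \int \rho \log \rho \, dx \;+\; \int \Psi \, \rho \, dx,
```

with the time-interpolation of the sequence \{\rho_k^{\tau}\} converging to the solution of the Fokker-Planck equation as the step size τ → 0.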

[Continuity Equation in the Sense of Distributions (1)]

Date: November 27th, 2024

Presenter: Mingxun Wang

Reading: Ambrosio, Gigli & Savaré (2008) Section 8.1

Optional Reading: Ambrosio, Gigli & Savaré (2008) Sections 1.1, 10.0-10.1; Continuity Equation and Benamou-Brenier Formulation; Divergence Theorem; Gradient Flows; Brenier ODE

  • Divergence theorem
  • Bounded variation, rectifiable curve, geodesic, metric derivative, and arc-length reparameterization
  • Distribution: integration by parts, test function, local integrability
  • Weak derivative and Sobolev space
  • Continuity equation and weak solution

[Gradient Flows]

Date: December 4th, 2024

Presenter: Kaiwen Hou

Reading: Ambrosio, Gigli & Savaré (2008) Sections 8.3-8.4, 11.1

Optional Reading: Ambrosio, Gigli & Savaré (2008) Example 11.1.10, Definitions 5.1.11, 10.1.1, Theorem 8.3.1, Lemma 10.4.1

  • Quantum drift-diffusion equation as gradient flow of the Fisher information
  • Four approaches to Wasserstein gradient flows: variational approximation scheme, curves of maximal slope, pointwise differential formulation, and systems of evolution variational inequalities
  • Duality map: Fréchet differential of Lp norm, compatibility with norm, and compatibility with inner product
  • Tangent bundle and smooth cylindrical test functions
  • Gradient flow equation: Fréchet subdifferential in Wasserstein space and differential inclusion
  • Variational integral lemma: strong subdifferential is the gradient of first variation
  • Gradient flow example: evolutionary parabolic PDEs of diffusion type

Join us on Zoom if you can’t attend in person in 2024, and don’t forget to subscribe to this channel for access to the recordings.

Spring 2025: Geometry of Probability Space Optimization

In Spring 2025, participants further investigate the geometry of probability spaces and the implications for TMLE’s structure and behavior. Topics include deeper explorations of how optimal transport’s spatial and dynamic properties provide insights into likelihood-based optimization and its role in semiparametric models. Rather than diving into specific optimization techniques like natural gradient descent or Newton’s method, this semester focuses on laying the theoretical groundwork for understanding such methods in probability spaces. Participants refine their understanding of how probability space-based optimization differs fundamentally from traditional parameter space approaches. This exploration highlights the theoretical richness of TMLE’s operations in probability space and prepares participants to extend these ideas to advanced methods and practical implementations in their future work.

[Benamou-Brenier Formulation (2)]

Date: January 24th, 2025

Presenter: Qiuran Lyu

Computational Reading: Peyré & Cuturi (2019) Sections 7.1, 7.6, Remark 2.30

Optional Reading: Benamou & Brenier (2000)

  • Convex formulation using momentum
  • Connections with displacement interpolation
  • Dynamic formulation over path space: displacement interpolation and entropic interpolation

[Continuity Equation in the Sense of Distributions (2)]

Date: January 31st, 2025

Presenter: Kaiwen Hou

Theoretical Reading: Ambrosio, Gigli & Savaré (2008) Theorem 8.3.1, Lemma 8.3.2, Proposition 8.3.3

  • Distributions in duality with smooth cylindrical test functions
  • Tangent vector field as the velocity field with smallest Lp norm and equal to the metric derivative

[Tangent Space (1)]

Date: February 7th, 2025

Presenter: Kaiwen Hou

Theoretical Reading: Ambrosio, Gigli & Savaré (2008) Section 8.0, Equations 0.20-0.26, Definition 8.4.1, Lemma 8.4.2, Propositions 8.4.3, 8.4.5, 8.4.6

  • Tangent bundle of 2-Wasserstein space
  • General definition of tangent bundle of Wasserstein space
  • Variational selection of tangent vectors
  • Variational characterization of divergence-free vector fields
  • Tangent vector to absolutely continuous curves
  • Optimal transport plans along absolutely continuous curves

[Tangent Space (2)]

Date: February 14th, 2025

Presenter: Mingxun Wang

Theoretical Reading: Ambrosio, Gigli & Savaré (2008) Remark 8.4.4, Theorem 8.5.1

  • Cotangent space and duality
  • Tangent space constructed from optimal maps
  • Optimal displacement maps are tangent
  • Reproducing kernel Hilbert space

[Distributionally Robust Optimization (1)]

Date: February 28th, 2025

Presenter: Kaiwen Hou

Theoretical Reading: Kuhn et al. (2019) Section 1

  • Nominal distributions: empirical and elliptical models
  • Optimizer’s curse
  • Wasserstein distance and dual Kantorovich problem
  • Kantorovich-Rubinstein theorem for W1 distance
  • Worst-case optimal risk based on ambiguity sets
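
A minimal numerical sketch (illustrative, not drawn from Kuhn et al.) of how Kantorovich-Rubinstein duality turns a W1 ambiguity set into a Lipschitz penalty, here for a linear loss where the worst case is explicit:

```python
import numpy as np

def worst_case_risk(sample, slope, eps):
    """sup over Q with W1(Q, P_n) <= eps of E_Q[slope * X].

    By Kantorovich-Rubinstein, for an L-Lipschitz loss the supremum is
    E_{P_n}[loss] + eps * L; a linear loss has L = |slope|.
    """
    return slope * np.mean(sample) + eps * abs(slope)

rng = np.random.default_rng(1)
x = rng.normal(size=500)
eps, slope = 0.3, 2.0
# The bound is attained by shifting every atom eps in the ascent direction,
# a perturbation whose W1 distance from the empirical measure is exactly eps.
shifted = x + eps * np.sign(slope)
w1_shift = np.mean(np.abs(shifted - x))  # = eps by the 1D coupling formula
print(w1_shift, slope * np.mean(shifted), worst_case_risk(x, slope, eps))
```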

[Distributionally Robust Optimization (2)]

Date: March 14th, 2025

Presenter: Kaiwen Hou

Theoretical Reading: Kuhn et al. (2019) Theorems 5-7

Optional Reading: Ambrosio, Gigli & Savaré (2008) Definition 3.1.1

  • Lipschitz regularization
  • Robust lower bound based on empirical perturbations
  • Strong duality of worst-case risk: Moreau envelope
  • inf-convolution: (\bar R, +, min)-algebra, basic properties, and Legendre–Fenchel transform
  • Moreau-Yosida regularization is the inf-convolution between objective and norm
  • Moreau-Yosida regularization defines gradient flows
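
A small numerical illustration (not from the readings) of the inf-convolution above: the Moreau envelope of the absolute value is the Huber function, which a brute-force minimization over a grid recovers:

```python
import numpy as np

def moreau_envelope(f, x, lam, grid):
    """Moreau-Yosida regularization e_lam f(x) = inf_y { f(y) + |x - y|^2 / (2 lam) },
    the inf-convolution of f with a scaled squared norm, evaluated by
    brute-force minimization over a grid of candidate points y."""
    return float(np.min(f(grid) + (x - grid) ** 2 / (2.0 * lam)))

lam = 1.0
grid = np.linspace(-5.0, 5.0, 200_001)
# For f = |.|, the envelope is the Huber function:
# x^2 / (2 lam) for |x| <= lam, and |x| - lam / 2 beyond.
for x in (0.3, 2.0):
    huber = x ** 2 / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2
    print(moreau_envelope(np.abs, x, lam, grid), huber)
```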

[Distributionally Robust Optimization (3)]

Date: March 21st, 2025

Presenter: Zhongming Xie

Theoretical Reading: Blanchet et al. (2024) Section 2

  • f-divergence
  • Variance regularization
  • Optimal transport discrepancy and square-root LASSO

[Distributionally Robust Optimization (4)]

Date: April 4th, 2025

Presenter: Zhongming Xie

Theoretical Reading: