Flow map matching

June 28, 2025

by Leonardo

1. Flow map matching (FMM)

The central object in our method is the flow map, which takes a point at one time on a solution trajectory of an ordinary differential equation (ODE) to the point at another time on the same trajectory.

1.1. Stochastic interpolants and probability flows

Stochastic interpolant: $𝐼_{𝑡} = 𝛼_{𝑡} 𝑥_{0} + 𝛽_{𝑡} 𝑥_{1} + 𝛾_{𝑡} 𝑧$, where $𝛼_{0} = 𝛽_{1} = 1$, $𝛼_{1} = 𝛽_{0} = 0$, $𝛾_{0} = 𝛾_{1} = 0$, and $𝑧 \sim 𝑁 (0, 𝐼_{𝑑})$ is Gaussian noise drawn independently of $(𝑥_{0}, 𝑥_{1})$.
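As a concrete illustration, the sketch below samples $𝐼_{𝑡}$ and its time derivative for scalar data. The coefficient choice $𝛼_{𝑡} = 1 - 𝑡$, $𝛽_{𝑡} = 𝑡$, $𝛾_{𝑡} = 𝑡 (1 - 𝑡)$ is an assumption made here for illustration; the text only requires the endpoint conditions.

```python
import random

# Assumed coefficients (not prescribed by the text): they only need to satisfy
# alpha_0 = beta_1 = 1, alpha_1 = beta_0 = 0, gamma_0 = gamma_1 = 0.
def alpha(t): return 1.0 - t
def beta(t): return t
def gamma(t): return t * (1.0 - t)

# Time derivatives of the coefficients above.
def d_alpha(t): return -1.0
def d_beta(t): return 1.0
def d_gamma(t): return 1.0 - 2.0 * t

def interpolant(t, x0, x1, z):
    """Return (I_t, dI_t/dt) for scalar x0, x1 and noise z."""
    i_t = alpha(t) * x0 + beta(t) * x1 + gamma(t) * z
    di_t = d_alpha(t) * x0 + d_beta(t) * x1 + d_gamma(t) * z
    return i_t, di_t

rng = random.Random(0)
x0, x1, z = rng.gauss(0, 1), rng.gauss(3, 2), rng.gauss(0, 1)
start, _ = interpolant(0.0, x0, x1, z)   # equals x0
end, _ = interpolant(1.0, x0, x1, z)     # equals x1
```

The endpoint conditions guarantee that the interpolant starts at $𝑥_{0}$ and ends at $𝑥_{1}$ regardless of the noise draw.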

Probability flow: The probability density of $𝐼_{𝑡}$ coincides at every $𝑡 \in [0, 1]$ with the density of the solution $𝑥_{𝑡}$ to the probability flow ODE

{\dot{𝑥}}_{𝑡} = 𝑏_{𝑡} (𝑥_{𝑡}), 𝑥_{𝑡 = 0} = 𝑥_{0} \sim 𝜌_{0}

where $𝑏_{𝑡} (𝑥) = 𝐸 [{\dot{𝐼}}_{𝑡} | 𝐼_{𝑡} = 𝑥]$ . The drift $𝑏$ can be learned efficiently in practice by solving a square loss regression problem

𝑏 = \underset{\hat{𝑏}}{argmin} \int_{0}^{1} 𝐸 [{| {\hat{𝑏}}_{𝑡} (𝐼_{𝑡}) - {\dot{𝐼}}_{𝑡} |}^{2}] 𝑑 𝑡
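To make the regression concrete, here is a minimal sketch in a setting where the answer is known in closed form. It assumes scalar Gaussian endpoints $𝑥_{0} \sim 𝑁 (0, 1)$ and $𝑥_{1} \sim 𝑁 (𝑚, 𝜎^{2})$ drawn independently, $𝛾_{𝑡} = 0$, and a linear interpolant; none of these choices come from the text. In that case the conditional expectation is linear in $𝑥$, so an ordinary least-squares fit at a fixed time should recover it.

```python
import random

def closed_form_drift(t, x, m=3.0, sigma=2.0):
    """b_t(x) for x0 ~ N(0,1), x1 ~ N(m, sigma^2), I_t = (1-t) x0 + t x1."""
    var_i = (1 - t) ** 2 + (t * sigma) ** 2   # Var(I_t)
    cov = t * sigma ** 2 - (1 - t)            # Cov(dI_t/dt, I_t)
    return m + (cov / var_i) * (x - t * m)

def fit_linear_drift(t, n=100_000, m=3.0, sigma=2.0, seed=0):
    """Least-squares fit of dI_t/dt on I_t at a fixed t; returns (slope, intercept)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x0, x1 = rng.gauss(0, 1), rng.gauss(m, sigma)
        xs.append((1 - t) * x0 + t * x1)   # sample of I_t
        ys.append(x1 - x0)                 # sample of dI_t/dt
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_linear_drift(0.5)
```

In practice the linear model is replaced by a neural network and the fit is done over all $𝑡$ at once, but the objective is the same square loss.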

1.2. Flow map: definition and characterizations

Flow map: The flow map $𝑋_{𝑠, 𝑡} : 𝑅^{𝑑} \to 𝑅^{𝑑}$ is the unique map such that

𝑋_{𝑠, 𝑡} (𝑥_{𝑠}) = 𝑥_{𝑡} for all (𝑠, 𝑡) \in {[0, 1]}^{2},

where ${(𝑥_{𝑡})}_{𝑡 \in [0, 1]}$ is any solution to the ODE.

Tangent condition: Let $𝑋_{𝑠, 𝑡}$ denote the flow map. Then

\lim_{𝑠 \to 𝑡} \partial_{𝑡} 𝑋_{𝑠, 𝑡} (𝑥) = 𝑏_{𝑡} (𝑥) \forall 𝑡 \in [0, 1], \forall 𝑥 \in 𝑅^{𝑑} .

We define $𝑣_{𝑠, 𝑡}$ as the exact remainder obtained by truncating a Taylor expansion in $𝑡 - 𝑠$ of $𝑋_{𝑠, 𝑡} (𝑥)$ at first order

𝑋_{𝑠, 𝑡} (𝑥) = 𝑥 + (𝑡 - 𝑠) 𝑣_{𝑠, 𝑡} (𝑥), 𝑣_{𝑡, 𝑡} (𝑥) = 𝑏_{𝑡} (𝑥)

Geometrically, $𝑣_{𝑠, 𝑡}$ describes the “slope” of the line drawn between $𝑥_{𝑠}$ and $𝑥_{𝑡}$ on a single ODE trajectory.
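The "slope" picture can be checked on an ODE whose flow map is known exactly. The sketch below uses the drift $𝑏_{𝑡} (𝑥) = 2 𝑡 𝑥$, an illustrative choice that is not from the text, for which $𝑋_{𝑠, 𝑡} (𝑥) = 𝑥 e^{𝑡^{2} - 𝑠^{2}}$.

```python
import math

def drift(t, x):
    """Illustrative drift b_t(x) = 2 t x (an assumption for this example)."""
    return 2.0 * t * x

def flow_map(s, t, x):
    """Exact flow map of dx/dt = 2 t x."""
    return x * math.exp(t * t - s * s)

def v(s, t, x):
    """Average velocity v_{s,t}(x) = (X_{s,t}(x) - x) / (t - s); b_t(x) on the diagonal."""
    if t == s:
        return drift(t, x)
    return (flow_map(s, t, x) - x) / (t - s)

x, s = 1.5, 0.3
# Slopes of the secant line as t approaches s; they approach drift(s, x).
slopes = [v(s, s + h, x) for h in (1e-1, 1e-2, 1e-3)]
```

As $𝑡 \to 𝑠$ the secant slope approaches the tangent slope $𝑏_{𝑠} (𝑥)$, which is the tangent condition restated in terms of $𝑣$.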

The flow map has several useful properties. First, $𝑋_{𝑠, 𝑡} (𝑥)$ is the unique solution to the Lagrangian equation

\partial_{𝑡} 𝑋_{𝑠, 𝑡} (𝑥) = 𝑏_{𝑡} (𝑋_{𝑠, 𝑡} (𝑥)), 𝑋_{𝑠, 𝑠} (𝑥) = 𝑥,

for all $(𝑠, 𝑡, 𝑥) \in {[0, 1]}^{2} \times ℝ^{𝑑}$ . In addition, it satisfies

𝑋_{𝑡, 𝜏} (𝑋_{𝑠, 𝑡} (𝑥)) = 𝑋_{𝑠, 𝜏} (𝑥)

for all $(𝑠, 𝑡, 𝜏, 𝑥) \in {[0, 1]}^{3} \times ℝ^{𝑑}$. In particular, taking $𝜏 = 𝑠$ gives $𝑋_{𝑡, 𝑠} (𝑋_{𝑠, 𝑡} (𝑥)) = 𝑥$ for all $(𝑠, 𝑡, 𝑥) \in {[0, 1]}^{2} \times ℝ^{𝑑}$, i.e., the flow map is invertible with inverse $𝑋_{𝑡, 𝑠}$.
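These properties are easy to verify numerically on a toy problem. The sketch below assumes the illustrative drift $𝑏_{𝑡} (𝑥) = 2 𝑡 𝑥$ (not from the text), uses its exact flow map, and approximates the time derivative with a central finite difference.

```python
import math

def drift(t, x):
    """Illustrative drift b_t(x) = 2 t x (an assumption for this example)."""
    return 2.0 * t * x

def flow_map(s, t, x):
    """Exact flow map of dx/dt = 2 t x."""
    return x * math.exp(t * t - s * s)

def d_dt(s, t, x, h=1e-5):
    """Central finite difference of X_{s,t}(x) in the second time index t."""
    return (flow_map(s, t + h, x) - flow_map(s, t - h, x)) / (2 * h)

s, t, tau, x = 0.2, 0.7, 0.9, -1.3
# Lagrangian equation: d/dt X_{s,t}(x) = b_t(X_{s,t}(x)).
lagrangian_residual = d_dt(s, t, x) - drift(t, flow_map(s, t, x))
# Composition: X_{t,tau}(X_{s,t}(x)) = X_{s,tau}(x).
semigroup_gap = flow_map(t, tau, flow_map(s, t, x)) - flow_map(s, tau, x)
# Invertibility: X_{t,s} undoes X_{s,t}.
inverse_gap = flow_map(t, s, flow_map(s, t, x)) - x
```

All three quantities vanish up to finite-difference and rounding error.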

The flow map $𝑋_{𝑠, 𝑡}$ is the unique solution of the Eulerian equation,

\partial_{𝑠} 𝑋_{𝑠, 𝑡} (𝑥) + 𝑏_{𝑠} (𝑥) \cdot \nabla 𝑋_{𝑠, 𝑡} (𝑥) = 0, 𝑋_{𝑠, 𝑠} (𝑥) = 𝑥,

for all $(𝑠, 𝑡, 𝑥) \in {[0, 1]}^{2} \times ℝ^{𝑑}$ .
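The Eulerian equation differentiates the flow map in its first time index $𝑠$ and in space, with the drift evaluated at the starting point. A minimal numerical check, again assuming the illustrative drift $𝑏_{𝑡} (𝑥) = 2 𝑡 𝑥$ with its exact flow map:

```python
import math

def drift(t, x):
    """Illustrative drift b_t(x) = 2 t x (an assumption for this example)."""
    return 2.0 * t * x

def flow_map(s, t, x):
    """Exact flow map of dx/dt = 2 t x."""
    return x * math.exp(t * t - s * s)

def d_ds(s, t, x, h=1e-5):
    """Central finite difference of X_{s,t}(x) in the first time index s."""
    return (flow_map(s + h, t, x) - flow_map(s - h, t, x)) / (2 * h)

def d_dx(s, t, x, h=1e-5):
    """Central finite difference of X_{s,t}(x) in space (the 1-D gradient)."""
    return (flow_map(s, t, x + h) - flow_map(s, t, x - h)) / (2 * h)

s, t, x = 0.2, 0.7, -1.3
# d/ds X_{s,t}(x) + b_s(x) * d/dx X_{s,t}(x) should vanish.
eulerian_residual = d_ds(s, t, x) + drift(s, x) * d_dx(s, t, x)
```

Intuitively, moving the starting point $(𝑥, 𝑠)$ a short way along its own trajectory does not change where the trajectory ends up at time $𝑡$.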

1.3. Flow map training

1.3.1. Distillation of a known velocity field

Lagrangian map distillation: Let $𝑤_{𝑠, 𝑡} \in 𝐿^{1} ({[0, 1]}^{2})$ be a weight function satisfying $𝑤_{𝑠, 𝑡} > 0$ and let $𝐼_{𝑠}$ be the stochastic interpolant. Then the flow map is the global minimizer over $\hat{𝑋}$ of the loss

ℒ_{LMD} (\hat{𝑋}) = \int_{{[0, 1]}^{2}} 𝑤_{𝑠, 𝑡} 𝐸 [{| \partial_{𝑡} {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) - 𝑏_{𝑡} ({\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠})) |}^{2}] 𝑑 𝑠 𝑑 𝑡,

subject to the boundary condition that ${\hat{𝑋}}_{𝑠, 𝑠} (𝑥) = 𝑥$ for all $𝑥 \in 𝑅^{𝑑}$ and $𝑠 \in [0, 1]$. Here $𝐸$ denotes an expectation over the coupling $(𝑥_{0}, 𝑥_{1}) \sim 𝜌 (𝑥_{0}, 𝑥_{1})$ and $𝑧 \sim 𝑁 (0, 𝐼_{𝑑})$.

Eulerian map distillation: With $𝑤_{𝑠, 𝑡}$ and $𝐼_{𝑠}$ as above, the flow map is the global minimizer over $\hat{𝑋}$ of the loss

ℒ_{EMD} (\hat{𝑋}) = \int_{{[0, 1]}^{2}} 𝑤_{𝑠, 𝑡} 𝐸 [{| \partial_{𝑠} {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) + 𝑏_{𝑠} (𝐼_{𝑠}) \cdot \nabla {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) |}^{2}] 𝑑 𝑠 𝑑 𝑡,

subject to the same boundary condition ${\hat{𝑋}}_{𝑠, 𝑠} (𝑥) = 𝑥$.
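Both distillation losses can be estimated by Monte Carlo once the drift is known. The sketch below assumes a scalar Gaussian toy problem ($𝑥_{0} \sim 𝑁 (0, 1)$, $𝑥_{1} \sim 𝑁 (0, 𝜎^{2})$, independent, linear interpolant without noise), for which the drift and the flow map are available in closed form. It takes $𝑤_{𝑠, 𝑡} = 1$ and uses finite differences in place of automatic differentiation. All of these choices are assumptions made for illustration.

```python
import math
import random

SIGMA = 2.0  # assumption: x0 ~ N(0, 1), x1 ~ N(0, SIGMA^2), independent

def c(t):
    """Standard deviation of I_t = (1 - t) x0 + t x1."""
    return math.sqrt((1 - t) ** 2 + (t * SIGMA) ** 2)

def drift(t, x):
    """b_t(x) = (c'(t) / c(t)) x for this Gaussian example."""
    return (t * SIGMA ** 2 - (1 - t)) / c(t) ** 2 * x

def exact_map(s, t, x):
    """Exact flow map of the example: rescale by the ratio of standard deviations."""
    return x * c(t) / c(s)

def identity_map(s, t, x):
    """A wrong candidate that still satisfies the boundary condition."""
    return x

def distillation_losses(xhat, n=2000, h=1e-5, seed=0):
    """Monte Carlo estimates of (L_LMD, L_EMD) with w = 1."""
    rng = random.Random(seed)
    lmd = emd = 0.0
    for _ in range(n):
        s, t = rng.random(), rng.random()
        x0, x1 = rng.gauss(0, 1), rng.gauss(0, SIGMA)
        i_s = (1 - s) * x0 + s * x1
        dt = (xhat(s, t + h, i_s) - xhat(s, t - h, i_s)) / (2 * h)
        ds = (xhat(s + h, t, i_s) - xhat(s - h, t, i_s)) / (2 * h)
        dx = (xhat(s, t, i_s + h) - xhat(s, t, i_s - h)) / (2 * h)
        lmd += (dt - drift(t, xhat(s, t, i_s))) ** 2   # Lagrangian residual
        emd += (ds + drift(s, i_s) * dx) ** 2          # Eulerian residual
    return lmd / n, emd / n

lmd_exact, emd_exact = distillation_losses(exact_map)
lmd_id, emd_id = distillation_losses(identity_map)
```

Both losses are numerically zero at the exact flow map and strictly positive for the identity map, even though the identity map satisfies the boundary condition.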

1.3.1.1. From distillation to direct training: the need for `stopgrad`

The distillation losses $ℒ_{LMD}$ and $ℒ_{EMD}$ assume that we have access to the true, smooth drift field $𝑏_{𝑡}$ . A natural question arises: what if $𝑏_{𝑡}$ is unknown and we only have access to samples from the stochastic interpolant, including the noisy velocity ${\dot{𝐼}}_{𝑡}$ ?

A naive approach might be to simply replace the true drift $𝑏_{𝑠}$ with its single-sample, noisy estimate ${\dot{𝐼}}_{𝑠}$ in the loss function. For example, the Eulerian loss would become:

ℒ_{Naive} (\hat{𝑋}) = \int_{{[0, 1]}^{2}} 𝑤_{𝑠, 𝑡} 𝐸 [{| \partial_{𝑠} {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) + {\dot{𝐼}}_{𝑠} \cdot \nabla {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) |}^{2}] 𝑑 𝑠 𝑑 𝑡,

However, this naive objective is flawed and will not converge to the correct flow map $𝑋$ .

The issue lies in the relationship $𝑏_{𝑠} (𝑥) = 𝐸 [{\dot{𝐼}}_{𝑠} | 𝐼_{𝑠} = 𝑥]$. The term ${\dot{𝐼}}_{𝑠}$ is a random variable, while $𝑏_{𝑠} (𝐼_{𝑠})$ is its conditional mean given $𝐼_{𝑠}$. Applying the identity $𝐸 [𝑌^{2}] = {(𝐸 [𝑌])}^{2} + Var (𝑌)$ conditionally on $𝐼_{𝑠}$, and suppressing the arguments of $\hat{𝑋}$, shows that the naive loss implicitly contains an extra variance term:

𝐸 [{| \partial_{𝑠} \hat{𝑋} + {\dot{𝐼}}_{𝑠} \cdot \nabla \hat{𝑋} |}^{2}] = 𝐸 [{| \partial_{𝑠} \hat{𝑋} + 𝑏_{𝑠} \cdot \nabla \hat{𝑋} |}^{2}] + 𝐸 [Var ({\dot{𝐼}}_{𝑠} \cdot \nabla \hat{𝑋} | 𝐼_{𝑠})]

This extra variance term acts as a penalty that depends on $\nabla \hat{𝑋}$ . To minimize the total loss, the optimizer is incentivized to find a solution $\hat{𝑋}$ with an artificially small gradient $\nabla \hat{𝑋}$ , leading to a biased and incorrect result.
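The size of the variance term can be measured directly. The sketch below assumes the same kind of scalar Gaussian toy problem ($𝑥_{0} \sim 𝑁 (0, 1)$, $𝑥_{1} \sim 𝑁 (0, 𝜎^{2})$, independent, linear interpolant without noise), which is not from the text. It evaluates the naive integrand at the exact flow map for one pair $(𝑠, 𝑡)$ and compares it with the closed-form conditional variance.

```python
import math
import random

SIGMA = 2.0  # assumption: x0 ~ N(0, 1), x1 ~ N(0, SIGMA^2), independent

def c(t):
    """Standard deviation of I_t = (1 - t) x0 + t x1."""
    return math.sqrt((1 - t) ** 2 + (t * SIGMA) ** 2)

def drift(t, x):
    """b_t(x) = (c'(t) / c(t)) x for this Gaussian example."""
    return (t * SIGMA ** 2 - (1 - t)) / c(t) ** 2 * x

def naive_and_clean(s, t, n=50_000, seed=0):
    """Integrands at the exact map X_{s,t}(x) = x c(t)/c(s): with dI_s/ds vs. with b_s."""
    rng = random.Random(seed)
    grad = c(t) / c(s)                      # spatial gradient of the exact map
    naive = clean = 0.0
    for _ in range(n):
        x0, x1 = rng.gauss(0, 1), rng.gauss(0, SIGMA)
        i_s = (1 - s) * x0 + s * x1
        di_s = x1 - x0                      # noisy velocity dI_s/ds
        ds_map = -drift(s, i_s) * grad      # d/ds X_{s,t}(x) evaluated at x = I_s
        naive += (ds_map + di_s * grad) ** 2
        clean += (ds_map + drift(s, i_s) * grad) ** 2
    return naive / n, clean / n

def predicted_variance_term(s, t):
    """Var(dI_s/ds | I_s) times the squared gradient of the exact map."""
    dc = (s * SIGMA ** 2 - (1 - s)) / c(s)  # c'(s)
    return (1 + SIGMA ** 2 - dc ** 2) * (c(t) / c(s)) ** 2

naive, clean = naive_and_clean(0.3, 0.8)
predicted = predicted_variance_term(0.3, 0.8)
```

The integrand with the true drift is zero at the exact flow map, while the naive one equals the variance term, which scales with the squared gradient of the map. That scaling is what rewards candidates with a smaller gradient.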

To counteract this, a common technique is to use a stop-gradient operator. The operator, stopgrad(z), allows z to pass through during the forward pass but blocks gradients from flowing back through it during optimization. A corrected Eulerian loss would look like:

ℒ_{EE} = \int 𝑤_{𝑠, 𝑡} 𝐸 [{| \partial_{𝑠} {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) + stopgrad ({\dot{𝐼}}_{𝑠} \cdot \nabla {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠})) |}^{2}] 𝑑 𝑠 𝑑 𝑡

By blocking the gradient from the noisy term, we can ensure that the expected gradient of the loss is zero at the true solution, making it a valid objective.
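The reason the expected gradient vanishes can be sketched in two lines. Here $\theta$ denotes the parameters of a model $\hat{X}^{\theta}$, notation introduced only for this sketch. Because the stop-gradient term is treated as a constant, differentiation only hits $\partial_{s} \hat{X}^{\theta}$, and that factor depends on the randomness only through $I_{s}$, so the tower property replaces $\dot{I}_{s}$ by $b_{s} (I_{s})$:

```latex
\begin{aligned}
\nabla_{\theta} \mathcal{L}_{EE}
&= 2 \int w_{s,t}\, E\Big[ \big( \nabla_{\theta} \partial_{s} \hat{X}^{\theta}_{s,t}(I_{s}) \big)^{T}
   \big( \partial_{s} \hat{X}^{\theta}_{s,t}(I_{s}) + \dot{I}_{s} \cdot \nabla \hat{X}^{\theta}_{s,t}(I_{s}) \big) \Big]\, ds\, dt \\
&= 2 \int w_{s,t}\, E\Big[ \big( \nabla_{\theta} \partial_{s} \hat{X}^{\theta}_{s,t}(I_{s}) \big)^{T}
   \big( \partial_{s} \hat{X}^{\theta}_{s,t}(I_{s}) + b_{s}(I_{s}) \cdot \nabla \hat{X}^{\theta}_{s,t}(I_{s}) \big) \Big]\, ds\, dt .
\end{aligned}
```

At $\hat{X}^{\theta} = X$ the second factor is the Eulerian residual, which is zero, so the true flow map is a stationary point. This shows stationarity only; it does not by itself show that the stop-gradient objective is the gradient of a well-defined loss.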

This challenge of handling noisy velocities directly is a primary motivation for developing more sophisticated objectives like Flow Map Matching (FMM), which we introduce next. FMM provides an alternative, well-posed loss function for direct training.

1.3.2. Direct training with flow map matching (FMM)

Flow map matching: The flow map is the global minimizer over $\hat{𝑋}$ of the loss

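ℒ_{FMM} (\hat{𝑋}) = \int_{{[0, 1]}^{2}} 𝑤_{𝑠, 𝑡} (𝐸 [{| \partial_{𝑡} {\hat{𝑋}}_{𝑠, 𝑡} ({\hat{𝑋}}_{𝑡, 𝑠} (𝐼_{𝑡})) - {\dot{𝐼}}_{𝑡} |}^{2}] + 𝐸 [{| {\hat{𝑋}}_{𝑠, 𝑡} ({\hat{𝑋}}_{𝑡, 𝑠} (𝐼_{𝑡})) - 𝐼_{𝑡} |}^{2}]) 𝑑 𝑠 𝑑 𝑡,

subject to the boundary condition ${\hat{𝑋}}_{𝑠, 𝑠} (𝑥) = 𝑥$, with $𝑤_{𝑠, 𝑡}$ and $𝐸$ as in Lagrangian map distillation.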
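Unlike the distillation losses, the FMM loss needs only samples of $𝐼_{𝑡}$ and ${\dot{𝐼}}_{𝑡}$. The sketch below evaluates it on an assumed scalar Gaussian toy problem ($𝑥_{0} \sim 𝑁 (0, 1)$, $𝑥_{1} \sim 𝑁 (0, 𝜎^{2})$, independent, no noise term), with $𝑤_{𝑠, 𝑡} = 1$ and finite differences. The one-parameter family of candidate maps is hypothetical and chosen so that $𝑝 = 1$ is the exact flow map.

```python
import math
import random

SIGMA = 2.0  # assumption: x0 ~ N(0, 1), x1 ~ N(0, SIGMA^2), independent

def c(t):
    """Standard deviation of I_t = (1 - t) x0 + t x1."""
    return math.sqrt((1 - t) ** 2 + (t * SIGMA) ** 2)

def power_map(p):
    """Candidate X_{s,t}(x) = x (c(t)/c(s))^p; p = 1 is the exact flow map here."""
    def xhat(s, t, x):
        return x * (c(t) / c(s)) ** p
    return xhat

def fmm_loss(xhat, n=20_000, h=1e-5, seed=0):
    """Monte Carlo estimate of L_FMM with w = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s, t = rng.random(), rng.random()
        x0, x1 = rng.gauss(0, 1), rng.gauss(0, SIGMA)
        i_t = (1 - t) * x0 + t * x1
        di_t = x1 - x0
        y = xhat(t, s, i_t)   # pull I_t back to time s
        # d/dt X_{s,t}(y) with the pulled-back point y held fixed
        dt = (xhat(s, t + h, y) - xhat(s, t - h, y)) / (2 * h)
        total += (dt - di_t) ** 2 + (xhat(s, t, y) - i_t) ** 2
    return total / n

losses = {p: fmm_loss(power_map(p)) for p in (0.5, 1.0, 1.5)}
```

The loss at the exact map is not zero: what remains is the conditional variance of ${\dot{𝐼}}_{𝑡}$ given $𝐼_{𝑡}$, which does not depend on $\hat{𝑋}$ and therefore does not move the minimizer.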

1.3.3. Progressive distillation

Progressive flow map matching: Let $\hat{𝑋}$ be a given two-time flow map (the teacher) and let $\check{𝑋}$ denote the map being trained (the student). Given $𝐾 \in ℕ$ with $𝐾 \geq 2$, let $𝑡_{𝑘} = 𝑠 + \frac{𝑘 - 1}{𝐾 - 1} (𝑡 - 𝑠)$ for $𝑘 = 1, \dots, 𝐾$, so that $𝑡_{1} = 𝑠$ and $𝑡_{𝐾} = 𝑡$. Then the global minimizer over $\check{𝑋}$ of the objective

ℒ_{PFMM} (\check{𝑋}) = \int_{{[0, 1]}^{2}} 𝑤_{𝑠, 𝑡} 𝐸 [{| {\check{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) - ({\hat{𝑋}}_{𝑡_{𝐾 - 1}, 𝑡_{𝐾}} \circ {\hat{𝑋}}_{𝑡_{𝐾 - 2}, 𝑡_{𝐾 - 1}} \circ \dots \circ {\hat{𝑋}}_{𝑡_{1}, 𝑡_{2}}) (𝐼_{𝑠}) |}^{2}] 𝑑 𝑠 𝑑 𝑡,

produces the same output in one step as the $𝐾$-step iterated map $\hat{𝑋}$.
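The regression target in this objective is just the teacher applied repeatedly over the grid $𝑡_{1}, \dots, 𝑡_{𝐾}$. The sketch below builds that target for a deliberately crude teacher (a single explicit Euler step) on the illustrative drift $𝑏_{𝑡} (𝑥) = 2 𝑡 𝑥$. Both the teacher and the drift are assumptions chosen so the exact answer is known.

```python
import math

def drift(t, x):
    """Illustrative drift b_t(x) = 2 t x (an assumption for this example)."""
    return 2.0 * t * x

def exact_map(s, t, x):
    """Exact flow map of dx/dt = 2 t x."""
    return x * math.exp(t * t - s * s)

def euler_teacher(s, t, x):
    """A crude two-time map: one explicit Euler step from time s to time t."""
    return x + (t - s) * drift(s, x)

def progressive_target(teacher, s, t, x, K):
    """Compose the teacher over the grid t_k = s + (k-1)/(K-1) (t-s), k = 1..K."""
    grid = [s + (k - 1) / (K - 1) * (t - s) for k in range(1, K + 1)]
    for t_from, t_to in zip(grid[:-1], grid[1:]):
        x = teacher(t_from, t_to, x)
    return x

s, t, x = 0.0, 1.0, 1.0
# Distance between the iterated teacher and the exact flow map, per grid size K.
errors = {K: abs(progressive_target(euler_teacher, s, t, x, K) - exact_map(s, t, x))
          for K in (2, 5, 17, 65)}
# With an exact teacher, the composition equals the one-step map.
exact_gap = abs(progressive_target(exact_map, s, t, x, 9) - exact_map(s, t, x))
```

A student trained on this target inherits the accuracy of the iterated teacher while needing only one evaluation at sampling time.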

1.3.4. Self-distillation

Self-distillation: The flow map $𝑋_{𝑠, 𝑡}$ is given for all $0 \leq 𝑠 \leq 𝑡 \leq 1$ by $𝑋_{𝑠, 𝑡} (𝑥) = 𝑥 + (𝑡 - 𝑠) 𝑣_{𝑠, 𝑡} (𝑥)$, where $𝑣_{𝑠, 𝑡} (𝑥)$ is the unique minimizer over $\hat{𝑣}$ of

ℒ_{𝑆 𝐷} (\hat{𝑣}) = ℒ_{𝑏} (\hat{𝑣}) + ℒ_{𝐷} (\hat{𝑣}),

where $ℒ_{𝑏} (\hat{𝑣})$ is given by

ℒ_{𝑏} (\hat{𝑣}) = \int_{0}^{1} 𝐸_{𝑥_{0}, 𝑥_{1}} [{| {\hat{𝑣}}_{𝑡, 𝑡} (𝐼_{𝑡}) - {\dot{𝐼}}_{𝑡} |}^{2}] 𝑑 𝑡,

and where $ℒ_{𝐷} (\hat{𝑣})$ is any linear combination of the following three objectives:

(i) The Lagrangian self-distillation (LSD) objective,

ℒ_{𝐷}^{LSD} (\hat{𝑣}) = \int_{0}^{1} \int_{0}^{𝑡} 𝐸_{𝑥_{0}, 𝑥_{1}} [{| \partial_{𝑡} {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) - {\hat{𝑣}}_{𝑡, 𝑡} ({\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠})) |}^{2}] 𝑑 𝑠 𝑑 𝑡,

(ii) The Eulerian self-distillation (ESD) objective,

ℒ_{𝐷}^{ESD} (\hat{𝑣}) = \int_{0}^{1} \int_{0}^{𝑡} 𝐸_{𝑥_{0}, 𝑥_{1}} [{| \partial_{𝑠} {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) + \nabla {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) {\hat{𝑣}}_{𝑠, 𝑠} (𝐼_{𝑠}) |}^{2}] 𝑑 𝑠 𝑑 𝑡,

(iii) The progressive self-distillation (PSD) objective,

ℒ_{𝐷}^{PSD} (\hat{𝑣}) = \int_{0}^{1} \int_{0}^{𝑡} \int_{𝑠}^{𝑡} 𝐸_{𝑥_{0}, 𝑥_{1}} [{| {\hat{𝑋}}_{𝑠, 𝑡} (𝐼_{𝑠}) - {\hat{𝑋}}_{𝑢, 𝑡} ({\hat{𝑋}}_{𝑠, 𝑢} (𝐼_{𝑠})) |}^{2}] 𝑑 𝑢 𝑑 𝑠 𝑑 𝑡 .

Above, ${\hat{𝑋}}_{𝑠, 𝑡} (𝑥) = 𝑥 + (𝑡 - 𝑠) {\hat{𝑣}}_{𝑠, 𝑡} (𝑥)$ and $𝐸_{𝑥_{0}, 𝑥_{1}}$ denotes an expectation over the random draw of $(𝑥_{0}, 𝑥_{1})$ .
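Two consequences of this parametrization hold for any $\hat{𝑣}$, trained or not: the boundary condition ${\hat{𝑋}}_{𝑠, 𝑠} (𝑥) = 𝑥$ is satisfied automatically, and the time derivative of ${\hat{𝑋}}_{𝑠, 𝑡}$ at $𝑡 = 𝑠$ equals the diagonal ${\hat{𝑣}}_{𝑠, 𝑠}$. This is why a single function can play the role of both the drift and the map. The sketch below checks both facts for an arbitrary stand-in function (hypothetical, not a trained model).

```python
import math

def v_hat(s, t, x):
    """An arbitrary smooth stand-in for a network (hypothetical, untrained)."""
    return math.sin(x) + s * t

def x_hat(s, t, x):
    """Flow map parametrization X_{s,t}(x) = x + (t - s) v_{s,t}(x)."""
    return x + (t - s) * v_hat(s, t, x)

def d_dt_x_hat(s, t, x, h=1e-5):
    """Central finite difference of x_hat in the second time index t."""
    return (x_hat(s, t + h, x) - x_hat(s, t - h, x)) / (2 * h)

s, x = 0.4, 0.7
boundary_gap = x_hat(s, s, x) - x                      # zero by construction
tangent_gap = d_dt_x_hat(s, s, x) - v_hat(s, s, x)     # zero up to rounding
```

The term $ℒ_{𝑏}$ therefore trains the diagonal ${\hat{𝑣}}_{𝑡, 𝑡}$ to match the drift, and the distillation term propagates that information off the diagonal.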

1.3.5. Align your flow (AYF)

The first training objective aims to ensure that, for a fixed $𝑠$, the output of the flow map remains constant as we move $(𝑥_{𝑡}, 𝑡)$ along the probability flow ODE (PF-ODE).

AYF-Eulerian Map Distillation (AYF-EMD): Let $𝑓_{𝜃} (𝑥_{𝑡}, 𝑡, 𝑠)$ be the flow map model, which transports $𝑥_{𝑡}$ from time $𝑡$ to time $𝑠$. Consider the loss function defined between two adjacent starting timesteps $𝑡$ and $𝑡^{'} = 𝑡 + 𝜀 (𝑠 - 𝑡)$ for a small $𝜀 > 0$,

𝐸_{𝑥_{𝑡}, 𝑡, 𝑠} [𝑤 (𝑡, 𝑠) {‖ 𝑓_{𝜃} (𝑥_{𝑡}, 𝑡, 𝑠) - 𝑓_{𝜃^{-}} (𝑥_{𝑡^{'}}, 𝑡^{'}, 𝑠) ‖}_{2}^{2}],

where $𝑥_{𝑡^{'}}$ is obtained by applying a 1-step Euler solver to the PF-ODE from $𝑡$ to $𝑡^{'}$, and $𝜃^{-}$ denotes a copy of $𝜃$ through which no gradient flows. In the limit as $𝜀 \to 0$, the gradient of this loss function with respect to $𝜃$ gives

\nabla_{𝜃} 𝐸_{𝑥_{𝑡}, 𝑡, 𝑠} [𝑤^{'} (𝑡, 𝑠) sign (𝑡 - 𝑠) \cdot 𝑓_{𝜃}^{𝑇} (𝑥_{𝑡}, 𝑡, 𝑠) \cdot \frac{𝑑 𝑓_{𝜃^{-}} (𝑥_{𝑡}, 𝑡, 𝑠)}{𝑑 𝑡}]

where $𝑤^{'} (𝑡, 𝑠) = 𝑤 (𝑡, 𝑠) \times | 𝑡 - 𝑠 |$ . The AYF-EMD loss naturally generalizes the loss used to train continuous-time consistency models, as it reduces to the same objective when $𝑠 = 0$ .
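The discrete loss rests on the fact that an exact flow map gives the same output from $(𝑥_{𝑡}, 𝑡)$ and from the Euler-stepped point $(𝑥_{𝑡^{'}}, 𝑡^{'})$, up to the error of the Euler step. The sketch below measures that mismatch for an illustrative PF-ODE velocity $2 𝑡 𝑥$ (an assumption) with its exact flow map standing in for $𝑓_{𝜃}$.

```python
import math

def velocity(t, x):
    """Illustrative PF-ODE velocity 2 t x (an assumption for this example)."""
    return 2.0 * t * x

def f(x_t, t, s):
    """Exact flow map from time t to time s for dx/dt = 2 t x."""
    return x_t * math.exp(s * s - t * t)

def emd_discrepancy(x_t, t, s, eps):
    """|f(x_t, t, s) - f(x_t', t', s)| with x_t' from one Euler step along the PF-ODE."""
    t_next = t + eps * (s - t)
    x_next = x_t + (t_next - t) * velocity(t, x_t)
    return abs(f(x_t, t, s) - f(x_next, t_next, s))

x_t, t, s = 0.8, 0.9, 0.1
gaps = [emd_discrepancy(x_t, t, s, eps) for eps in (0.1, 0.05, 0.025)]
```

Halving $𝜀$ cuts the mismatch by roughly a factor of four, consistent with the second-order local error of one Euler step. For a model that is not yet a flow map, the mismatch has a first-order part, and that part is what the limiting gradient penalizes.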

The second approach ensures consistency at timestep $𝑠$ instead. This method tries to ensure that, for a fixed $(𝑥_{𝑡}, 𝑡)$, the trajectory $𝑓_{𝜃} (𝑥_{𝑡}, 𝑡, \cdot)$ is aligned with that point's PF-ODE.

AYF-Lagrangian Map Distillation (AYF-LMD): Let $𝑓_{𝜃} (𝑥_{𝑡}, 𝑡, 𝑠)$ be the flow map. Consider the loss function defined between two adjacent ending timesteps $𝑠$ and $𝑠^{'} = 𝑠 + 𝜀 (𝑡 - 𝑠)$ for a small $𝜀 > 0$ ,

𝐸_{𝑥_{𝑡}, 𝑡, 𝑠} [𝑤 (𝑡, 𝑠) {‖ 𝑓_{𝜃} (𝑥_{𝑡}, 𝑡, 𝑠) - {ODE}_{𝑠^{'} \to 𝑠} (𝑓_{𝜃^{-}} (𝑥_{𝑡}, 𝑡, 𝑠^{'})) ‖}_{2}^{2}],

where ${ODE}_{𝑡 \to 𝑠} (𝑥)$ refers to running a 1-step Euler solver on the PF-ODE starting from $𝑥$ at timestep $𝑡$ to timestep $𝑠$ . In the limit as $𝜀 \to 0$ , the gradient of this objective with respect to $𝜃$ converges to:

\nabla_{𝜃} 𝐸_{𝑥_{𝑡}, 𝑡, 𝑠} [𝑤^{'} (𝑡, 𝑠) sign (𝑠 - 𝑡) \cdot 𝑓_{𝜃}^{𝑇} (𝑥_{𝑡}, 𝑡, 𝑠) \cdot (\frac{𝑑 𝑓_{𝜃^{-}} (𝑥_{𝑡}, 𝑡, 𝑠)}{𝑑 𝑠} - 𝑣_{𝜑} (𝑓_{𝜃^{-}} (𝑥_{𝑡}, 𝑡, 𝑠), 𝑠))],

where $𝑤^{'} (𝑡, 𝑠) = 𝑤 (𝑡, 𝑠) \times | 𝑡 - 𝑠 |$ and $𝑣_{𝜑}$ denotes the velocity field that defines the PF-ODE.
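The Lagrangian variant compares outputs at two adjacent ending times instead: the output at $𝑠^{'}$, pushed to $𝑠$ by one Euler step, should match the output at $𝑠$. A minimal sketch of that comparison, again assuming the illustrative velocity $2 𝑡 𝑥$ with its exact flow map in place of $𝑓_{𝜃}$:

```python
import math

def velocity(t, x):
    """Illustrative PF-ODE velocity 2 t x (an assumption for this example)."""
    return 2.0 * t * x

def f(x_t, t, s):
    """Exact flow map from time t to time s for dx/dt = 2 t x."""
    return x_t * math.exp(s * s - t * t)

def lmd_discrepancy(x_t, t, s, eps):
    """|f(x_t, t, s) - ODE_{s' -> s}(f(x_t, t, s'))| with a one-step Euler solver."""
    s_prime = s + eps * (t - s)
    y = f(x_t, t, s_prime)                               # output at the adjacent end time s'
    y_stepped = y + (s - s_prime) * velocity(s_prime, y)  # one Euler step from s' to s
    return abs(f(x_t, t, s) - y_stepped)

x_t, t, s = 0.8, 0.9, 0.1
gaps = [lmd_discrepancy(x_t, t, s, eps) for eps in (0.1, 0.05, 0.025)]
```

As with the Eulerian variant, the mismatch for an exact flow map shrinks quadratically in $𝜀$, so only a model whose trajectory in $𝑠$ deviates from the PF-ODE contributes at first order.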
