Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

June 25, 2025

by Leonardo

1. Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

Notation: We denote probability density functions as $𝜌_{0} (𝑥)$ , $𝜌_{1} (𝑥)$ , and $𝜌 (𝑡, 𝑥)$ , with $𝑡 \in [0, 1]$ and $𝑥 \in 𝑅^{𝑑}$ , omitting the function arguments when clear from the context. $𝐶^{1} ([0, 1])$ is the space of continuously differentiable functions from $[0, 1]$ to $𝑅$ , ${(𝐶^{2} (𝑅^{𝑑}))}^{𝑑}$ is the space of twice continuously differentiable functions from $𝑅^{𝑑}$ to $𝑅^{𝑑}$ , and $𝐶_{0}^{𝑝} (𝑅^{𝑑})$ is the space of compactly supported functions from $𝑅^{𝑑}$ to $𝑅$ that are continuously differentiable $𝑝$ times.

1.1. Stochastic Interpolants

Stochastic interpolant: Given two probability density functions $𝜌_{0}, 𝜌_{1} : 𝑅^{𝑑} \to 𝑅_{\geq 0}$ , a stochastic interpolant between $𝜌_{0}$ and $𝜌_{1}$ is a stochastic process $𝑥_{𝑡}$ defined as

$𝑥_{𝑡} = 𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) + 𝛾 (𝑡) 𝑧, 𝑡 \in [0, 1],$

where

$𝐼 \in 𝐶^{2} ([0, 1], {(𝐶^{2} (𝑅^{𝑑} \times 𝑅^{𝑑}))}^{𝑑})$ satisfies the boundary conditions $𝐼 (0, 𝑥_{0}, 𝑥_{1}) = 𝑥_{0}$ and $𝐼 (1, 𝑥_{0}, 𝑥_{1}) = 𝑥_{1}$ , as well as

$\exists 𝐶_{1} < \infty : | \partial_{𝑡} 𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) | \leq 𝐶_{1} | 𝑥_{0} - 𝑥_{1} | \forall (𝑡, 𝑥_{0}, 𝑥_{1}) \in [0, 1] \times 𝑅^{𝑑} \times 𝑅^{𝑑}$

We can think of $𝐼$ as a smooth planned path from $𝑥_{0}$ to $𝑥_{1}$ . The bound above says that $𝐼$ does not move too fast on the way from $𝑥_{0}$ at $𝑡 = 0$ to $𝑥_{1}$ at $𝑡 = 1$ , and therefore does not wander too far from either endpoint. This assumption is made for convenience and is not necessary for most of the arguments below.
$𝛾 : [0, 1] \to 𝑅$ satisfies $𝛾 (0) = 𝛾 (1) = 0, 𝛾 (𝑡) > 0$ for all $𝑡 \in (0, 1)$ , and $𝛾^{2} \in 𝐶^{2} ([0, 1])$
The pair $(𝑥_{0}, 𝑥_{1})$ is drawn from a probability measure $𝜈$ that marginalizes on $𝜌_{0}$ and $𝜌_{1}$ , i.e. $𝜈 (𝑑 𝑥_{0}, 𝑅^{𝑑}) = 𝜌_{0} (𝑥_{0}) 𝑑 𝑥_{0}$ , $𝜈 (𝑅^{𝑑}, 𝑑 𝑥_{1}) = 𝜌_{1} (𝑥_{1}) 𝑑 𝑥_{1}$ . The measure $𝜈$ allows for a coupling between the two densities $𝜌_{0}$ and $𝜌_{1}$ , which affects the properties of the stochastic interpolant, but a simple choice is to take the product measure $𝜈 (𝑑 𝑥_{0}, 𝑑 𝑥_{1}) = 𝜌_{0} (𝑥_{0}) 𝜌_{1} (𝑥_{1}) 𝑑 𝑥_{0} 𝑑 𝑥_{1}$ , in which case $𝑥_{0}$ and $𝑥_{1}$ are independent.
$𝑧$ is a Gaussian random variable independent of $(𝑥_{0}, 𝑥_{1})$ , i.e. $𝑧 \sim 𝑁 (0, \mathrm{Id})$ , where $\mathrm{Id}$ is the $𝑑 \times 𝑑$ identity matrix, and $𝑧 ⟂ (𝑥_{0}, 𝑥_{1})$ .
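The definition above can be turned into a sampler in a few lines. The following is a minimal sketch under assumptions that are not part of the definition: the linear path $𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) = (1 - 𝑡) 𝑥_{0} + 𝑡 𝑥_{1}$ (which satisfies the bound with $𝐶_{1} = 1$ ), the choice $𝛾 (𝑡) = \sqrt{2 𝑎 𝑡 (1 - 𝑡)}$ with a hypothetical constant `a`, and fixed endpoints in place of draws from $𝜈$ . The helper names are illustrative only.

```python
import math
import random

def gamma(t, a=1.0):
    """Noise amplitude gamma(t) = sqrt(2 a t (1 - t)); it vanishes at t = 0 and t = 1."""
    return math.sqrt(2.0 * a * t * (1.0 - t))

def linear_path(t, x0, x1):
    """Assumed path I(t, x0, x1) = (1 - t) x0 + t x1, applied coordinate-wise."""
    return [(1.0 - t) * u + t * v for u, v in zip(x0, x1)]

def sample_interpolant(t, x0, x1, rng, a=1.0):
    """Draw x_t = I(t, x0, x1) + gamma(t) z with z ~ N(0, Id) independent of (x0, x1)."""
    g = gamma(t, a)
    return [p + g * rng.gauss(0.0, 1.0) for p in linear_path(t, x0, x1)]

rng = random.Random(0)
x0, x1 = [0.0, 0.0], [1.0, -2.0]
# Five snapshots at t = 0, 0.25, 0.5, 0.75, 1 for one fixed pair (x0, x1).
path = [sample_interpolant(k / 4, x0, x1, rng) for k in range(5)]
```

Because $𝛾 (0) = 𝛾 (1) = 0$ , the first and last snapshots reproduce `x0` and `x1` exactly, while the intermediate ones are noisy points around the straight line between them.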

Given the above definition, we want to characterize the properties of the time-dependent probability distribution $𝜇 (𝑡, 𝑑 𝑥)$ ¹ such that

$\forall 𝑡 \in [0, 1] : \int_{𝑅^{𝑑}} 𝜑 (𝑥) 𝜇 (𝑡, 𝑑 𝑥) = 𝐸 [𝜑 (𝑥_{𝑡})]$ for any test function $𝜑 \in 𝐶_{𝑐}^{\infty} (𝑅^{𝑑})$

and, for any integrable function $𝑓$ , we have the following identity (the law of total expectation):

$\int_{𝑅^{𝑑}} 𝐸 [𝑓 (𝑡, 𝑥_{0}, 𝑥_{1}, 𝑧) | 𝑥_{𝑡} = 𝑥] 𝜇 (𝑡, 𝑑 𝑥) = 𝐸 [𝑓 (𝑡, 𝑥_{0}, 𝑥_{1}, 𝑧)]$

1.2. Stochastic Interpolant Properties

The most important property of the probability distribution of the stochastic interpolant $𝑥_{𝑡}$ is:

Stochastic interpolant properties: The probability distribution of the stochastic interpolant $𝑥_{𝑡}$ is absolutely continuous with respect to the Lebesgue measure at all times $𝑡 \in [0, 1]$ and its time-dependent density $𝜌 (𝑡)$ satisfies $𝜌 (0) = 𝜌_{0}$ and $𝜌 (1) = 𝜌_{1}$ , $𝜌 \in 𝐶^{1} ([0, 1]; 𝐶^{𝑝} (𝑅^{𝑑}))$ for any $𝑝 \in 𝑁$ , and $𝜌 (𝑡, 𝑥) > 0$ for all $(𝑡, 𝑥) \in [0, 1] \times 𝑅^{𝑑}$ . In addition, $𝜌$ solves the transport equation (TE)

$\partial_{𝑡} 𝜌 + \nabla \cdot (𝑏 𝜌) = 0,$

where we defined the velocity

$𝑏 (𝑡, 𝑥) = 𝐸 [{\dot{𝑥}}_{𝑡} | 𝑥_{𝑡} = 𝑥] = 𝐸 [\partial_{𝑡} 𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) + \dot{𝛾} (𝑡) 𝑧 | 𝑥_{𝑡} = 𝑥] .$

This velocity is in $𝐶^{0} ([0, 1]; {(𝐶^{𝑝} (𝑅^{𝑑}))}^{𝑑})$ for any $𝑝 \in 𝑁$ , and such that

$\forall 𝑡 \in [0, 1] : \int_{𝑅^{𝑑}} {| 𝑏 (𝑡, 𝑥) |}^{2} 𝜌 (𝑡, 𝑥) 𝑑 𝑥 < \infty .$
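As a sanity check, here is a one-dimensional Gaussian example worked out by hand. The specific choices are illustrative assumptions rather than part of the statement above: $𝜌_{0} = 𝑁 (0, 1)$ , $𝜌_{1} = 𝑁 (𝑚, 1)$ for a hypothetical mean shift $𝑚$ , the product coupling, $𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) = (1 - 𝑡) 𝑥_{0} + 𝑡 𝑥_{1}$ , and $𝛾 (𝑡) = \sqrt{2 𝑡 (1 - 𝑡)}$ . Then $𝑥_{𝑡}$ and ${\dot{𝑥}}_{𝑡} = 𝑥_{1} - 𝑥_{0} + \dot{𝛾} (𝑡) 𝑧$ are jointly Gaussian with

$𝐸 [𝑥_{𝑡}] = 𝑡 𝑚, \quad \mathrm{Var} (𝑥_{𝑡}) = {(1 - 𝑡)}^{2} + 𝑡^{2} + 2 𝑡 (1 - 𝑡) = 1,$

$\mathrm{Cov} ({\dot{𝑥}}_{𝑡}, 𝑥_{𝑡}) = - (1 - 𝑡) + 𝑡 + 𝛾 (𝑡) \dot{𝛾} (𝑡) = (2 𝑡 - 1) + (1 - 2 𝑡) = 0,$

so $𝜌 (𝑡) = 𝑁 (𝑡 𝑚, 1)$ , and ${\dot{𝑥}}_{𝑡}$ is independent of $𝑥_{𝑡}$ , which gives

$𝑏 (𝑡, 𝑥) = 𝐸 [{\dot{𝑥}}_{𝑡}] = 𝑚 .$

Indeed, $𝜌 (𝑡, 𝑥) = 𝜌_{0} (𝑥 - 𝑡 𝑚)$ satisfies $\partial_{𝑡} 𝜌 + 𝑚 \partial_{𝑥} 𝜌 = 0$ , which is the TE with this constant velocity, and $\int_{𝑅} {| 𝑏 |}^{2} 𝜌 𝑑 𝑥 = 𝑚^{2} < \infty$ .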

For flow-based models (Objective), the velocity $𝑏$ is the unique minimizer over $\hat{𝑏}$ of the objective

$ℒ_{𝑏} [\hat{𝑏}] = \int_{0}^{1} 𝔼 (\frac{1}{2} {| \hat{𝑏} (𝑡, 𝑥_{𝑡}) |}^{2} - (\partial_{𝑡} 𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) + \dot{𝛾} (𝑡) 𝑧) \cdot \hat{𝑏} (𝑡, 𝑥_{𝑡})) 𝑑 𝑡$
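The objective only needs samples of $(𝑡, 𝑥_{0}, 𝑥_{1}, 𝑧)$ , so it can be estimated by Monte Carlo. The sketch below is one possible estimator, not a reference implementation. It assumes a one-dimensional toy problem that is not in the text: $𝜌_{0} = 𝑁 (0, 1)$ , $𝜌_{1} = 𝑁 (𝑚, 1)$ , independent endpoints, linear $𝐼$ , and $𝛾 (𝑡) = \sqrt{2 𝑡 (1 - 𝑡)}$ . The function name `flow_loss` and the clipping of $𝑡$ to $[0.01, 0.99]$ are likewise assumptions; the clipping keeps the variance of the estimator finite, since $\dot{𝛾}$ diverges at the endpoints.

```python
import math
import random

def flow_loss(b_hat, m=2.0, n=20000, seed=0):
    """Monte Carlo estimate of L_b, averaged over t uniform on [0.01, 0.99].

    Toy setup (assumed): x0 ~ N(0,1), x1 ~ N(m,1) independent,
    I(t, x0, x1) = (1-t) x0 + t x1, gamma(t) = sqrt(2 t (1-t)).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = rng.uniform(0.01, 0.99)           # stay off the endpoints, where gamma_dot blows up
        x0, x1, z = rng.gauss(0, 1), rng.gauss(m, 1), rng.gauss(0, 1)
        g = math.sqrt(2 * t * (1 - t))
        g_dot = (1 - 2 * t) / g                # d/dt sqrt(2 t (1-t))
        x_t = (1 - t) * x0 + t * x1 + g * z
        target = (x1 - x0) + g_dot * z         # dI/dt + gamma_dot * z
        v = b_hat(t, x_t)
        total += 0.5 * v * v - target * v
    return total / n

# Compare three constant candidate velocities.
for c in (0.0, 2.0, 4.0):
    print(c, flow_loss(lambda t, x, c=c: c))
```

For a constant candidate $\hat{𝑏} = 𝑐$ in this toy setup, the exact value of the integrand is $𝑐^{2} / 2 - 𝑐 𝑚$ , which is smallest at $𝑐 = 𝑚$ , so the middle candidate should give the lowest estimate.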

For score-based/diffusion models (Score), the score is given by

$𝑠 (𝑡, 𝑥) = \nabla \log 𝜌 (𝑡, 𝑥) = - 𝛾^{- 1} (𝑡) 𝐸 (𝑧 | 𝑥_{𝑡} = 𝑥) \forall (𝑡, 𝑥) \in (0, 1) \times 𝑅^{𝑑}$

and the objective is

$ℒ_{𝑠} [\hat{𝑠}] = \int_{0}^{1} 𝔼 (\frac{1}{2} {| \hat{𝑠} (𝑡, 𝑥_{𝑡}) |}^{2} + 𝛾^{- 1} (𝑡) 𝑧 \cdot \hat{𝑠} (𝑡, 𝑥_{𝑡})) 𝑑 𝑡$
For energy-based models (Energy), if we model $\hat{𝑠} (𝑡, 𝑥) = - \nabla \hat{𝐸} (𝑡, 𝑥)$ , substituting into $ℒ_{𝑠}$ gives the objective

$ℒ_{𝐸} [\hat{𝐸}] = \int_{0}^{1} 𝔼 (\frac{1}{2} {| \nabla \hat{𝐸} (𝑡, 𝑥_{𝑡}) |}^{2} - 𝛾^{- 1} (𝑡) 𝑧 \cdot \nabla \hat{𝐸} (𝑡, 𝑥_{𝑡})) 𝑑 𝑡$
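The score identity $𝑠 (𝑡, 𝑥) = - 𝛾^{- 1} (𝑡) 𝐸 (𝑧 | 𝑥_{𝑡} = 𝑥)$ can be checked numerically in a toy case. The sketch below assumes a one-dimensional setup that is not taken from the text: $𝜌_{0} = 𝑁 (0, 1)$ , $𝜌_{1} = 𝑁 (𝑚, 1)$ , independent endpoints, linear $𝐼$ , and $𝛾 (𝑡) = \sqrt{2 𝑡 (1 - 𝑡)}$ . In that case $𝜌 (𝑡) = 𝑁 (𝑡 𝑚, 1)$ , so the exact score is $- (𝑥 - 𝑡 𝑚)$ , with slope $- 1$ in $𝑥$ . The hypothetical helper `score_slope` recovers that slope from samples of $(𝑥_{𝑡}, 𝑧)$ alone.

```python
import math
import random

def score_slope(t, m=2.0, n=50000, seed=0):
    """Estimate the slope in x of s(t, x) = -E[z | x_t = x] / gamma(t).

    Toy setup (assumed): x0 ~ N(0,1), x1 ~ N(m,1) independent, linear I,
    gamma(t) = sqrt(2 t (1-t)). Everything is jointly Gaussian, so
    E[z | x_t] is linear in x_t and least squares recovers its slope.
    """
    rng = random.Random(seed)
    g = math.sqrt(2 * t * (1 - t))
    sxz = sxx = 0.0
    for _ in range(n):
        x0, x1, z = rng.gauss(0, 1), rng.gauss(m, 1), rng.gauss(0, 1)
        d = (1 - t) * x0 + t * x1 + g * z - t * m   # x_t minus its mean t*m
        sxz += d * z
        sxx += d * d
    regression_slope = sxz / sxx                    # slope of E[z | x_t] in x_t
    return -regression_slope / g

print(score_slope(0.3), score_slope(0.7))
```

Both printed values should be close to $- 1$ , matching $\nabla \log 𝜌 (𝑡, 𝑥) = - (𝑥 - 𝑡 𝑚)$ for the assumed Gaussian density.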

Having access to the score immediately allows us to rewrite the TE as forward and backward Fokker-Planck equations, which we state as:

Fokker-Planck equations (FPE): For any $𝜀 \in 𝐶^{0} ([0, 1])$ with $𝜀 (𝑡) \geq 0$ for all $𝑡 \in [0, 1]$ , the probability density $𝜌$ satisfies:

The forward Fokker-Planck equation

$\partial_{𝑡} 𝜌 + \nabla \cdot (𝑏_{𝐹} 𝜌) = 𝜀 (𝑡) Δ 𝜌, 𝜌 (0) = 𝜌_{0},$

where we defined the forward drift

$𝑏_{𝐹} (𝑡, 𝑥) = 𝑏 (𝑡, 𝑥) + 𝜀 (𝑡) 𝑠 (𝑡, 𝑥) .$

The forward Fokker-Planck equation is well-posed when solved forward in time from $𝑡 = 0$ to $𝑡 = 1$ , and its solution for the initial condition $𝜌 (𝑡 = 0) = 𝜌_{0}$ satisfies $𝜌 (𝑡 = 1) = 𝜌_{1}$ .
The backward Fokker-Planck equation

$\partial_{𝑡} 𝜌 + \nabla \cdot (𝑏_{𝐵} 𝜌) = - 𝜀 (𝑡) Δ 𝜌, 𝜌 (1) = 𝜌_{1},$

where we defined the backward drift

$𝑏_{𝐵} (𝑡, 𝑥) = 𝑏 (𝑡, 𝑥) - 𝜀 (𝑡) 𝑠 (𝑡, 𝑥) .$

The backward Fokker-Planck equation is well-posed when solved backward in time from $𝑡 = 1$ to $𝑡 = 0$ , and its solution for the final condition $𝜌 (1) = 𝜌_{1}$ satisfies $𝜌 (0) = 𝜌_{0}$ .
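The step from the TE to the two FPEs is short enough to spell out. Since $𝑠 = \nabla \log 𝜌$ , we have $𝑠 𝜌 = \nabla 𝜌$ and hence $\nabla \cdot (𝑠 𝜌) = Δ 𝜌$ . Adding and subtracting $𝜀 (𝑡) Δ 𝜌$ in the TE then gives

$0 = \partial_{𝑡} 𝜌 + \nabla \cdot (𝑏 𝜌) = \partial_{𝑡} 𝜌 + \nabla \cdot ((𝑏 + 𝜀 𝑠) 𝜌) - 𝜀 Δ 𝜌 = \partial_{𝑡} 𝜌 + \nabla \cdot ((𝑏 - 𝜀 𝑠) 𝜌) + 𝜀 Δ 𝜌 .$

Moving the Laplacian term to the right-hand side yields the forward equation with $𝑏_{𝐹} = 𝑏 + 𝜀 𝑠$ from the second expression and the backward equation with $𝑏_{𝐵} = 𝑏 - 𝜀 𝑠$ from the third.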

We design generative models using the stochastic processes associated with the TE, the forward FPE, and the backward FPE:

At any time $𝑡 \in [0, 1]$ , the law of the stochastic interpolant $𝑥_{𝑡}$ coincides with the law of the three processes $𝑋_{𝑡}$ , $𝑋_{𝑡}^{𝐹}$ , and $𝑋_{𝑡}^{𝐵}$ , respectively defined as:

The solutions of the probability flow associated with the transport equation

$\frac{𝑑}{𝑑 𝑡} 𝑋_{𝑡} = 𝑏 (𝑡, 𝑋_{𝑡}),$

solved either forward in time from the initial data $𝑋_{𝑡 = 0} \sim 𝜌_{0}$ or backward in time from the final data $𝑋_{𝑡 = 1} = 𝑥_{1} \sim 𝜌_{1}$ .
The solutions of the forward SDE associated with the forward FPE

$𝑑 𝑋_{𝑡}^{𝐹} = 𝑏_{𝐹} (𝑡, 𝑋_{𝑡}^{𝐹}) 𝑑 𝑡 + \sqrt{2 𝜀 (𝑡)} 𝑑 𝑊_{𝑡},$

solved forward in time from the initial data $𝑋_{𝑡 = 0}^{𝐹} \sim 𝜌_{0}$ independent of $𝑊$ .
The solutions of the backward SDE associated with the backward FPE

$𝑑 𝑋_{𝑡}^{𝐵} = 𝑏_{𝐵} (𝑡, 𝑋_{𝑡}^{𝐵}) 𝑑 𝑡 + \sqrt{2 𝜀 (𝑡)} 𝑑 𝑊_{𝑡}^{𝐵}, 𝑊_{𝑡}^{𝐵} = - 𝑊_{1 - 𝑡},$

solved backward in time from the final data $𝑋_{𝑡 = 1}^{𝐵} \sim 𝜌_{1}$ independent of $𝑊^{𝐵}$ ; the solution is by definition $𝑋_{𝑡}^{𝐵} = 𝑍_{1 - 𝑡}^{𝐹}$ where $𝑍_{𝑡}^{𝐹}$ satisfies

$𝑑 𝑍_{𝑡}^{𝐹} = - 𝑏_{𝐵} (1 - 𝑡, 𝑍_{𝑡}^{𝐹}) 𝑑 𝑡 + \sqrt{2 𝜀 (1 - 𝑡)} 𝑑 𝑊_{𝑡},$

solved forward in time from the initial data $𝑍_{𝑡 = 0}^{𝐹} \sim 𝜌_{1}$ independent of $𝑊$ .²
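To see the forward SDE and the probability flow act as generative models, one can integrate them numerically. The sketch below uses the Euler-Maruyama scheme on a toy problem that is an assumption of this example, not part of the text: $𝜌 (𝑡) = 𝑁 (𝑡 𝑚, 1)$ in one dimension, for which $𝑏 (𝑡, 𝑥) = 𝑚$ and $𝑠 (𝑡, 𝑥) = - (𝑥 - 𝑡 𝑚)$ are known in closed form, with a constant $𝜀$ . In practice $𝑏$ and $𝑠$ would be replaced by learned estimates. The function name and its parameters are hypothetical.

```python
import math
import random

def simulate_forward_sde(m=2.0, eps=1.0, n_paths=2000, n_steps=200, seed=0):
    """Euler-Maruyama for dX = b_F(t, X) dt + sqrt(2 eps) dW with X_0 ~ N(0, 1).

    Toy setup (assumed): rho(t) = N(t*m, 1), so b(t, x) = m,
    s(t, x) = -(x - t*m) and b_F = b + eps * s = m - eps * (x - t*m).
    Setting eps = 0 recovers the probability flow ODE dX/dt = b(t, X).
    Returns the sample mean and variance of X at t = 1.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    noise_scale = math.sqrt(2.0 * eps * dt)
    finals = []
    for _ in range(n_paths):
        x = rng.gauss(0.0, 1.0)                 # initial data X_0 ~ rho_0
        for k in range(n_steps):
            t = k * dt
            drift = m - eps * (x - t * m)       # forward drift b_F(t, x)
            x += drift * dt + noise_scale * rng.gauss(0.0, 1.0)
        finals.append(x)
    mean = sum(finals) / n_paths
    var = sum((x - mean) ** 2 for x in finals) / (n_paths - 1)
    return mean, var

print("SDE:", simulate_forward_sde(eps=1.0))
print("ODE:", simulate_forward_sde(eps=0.0))
```

In both cases the samples at $𝑡 = 1$ should have mean close to $𝑚$ and variance close to $1$ , up to Monte Carlo and discretization error, which illustrates that the deterministic and stochastic processes share the same endpoint law.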

1.3. Instantiation

We connect the diffusion bridge perspective to the stochastic interpolant perspective by setting $𝑥_{𝑡}^{𝑑} = 𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) + \sqrt{2 𝑎 (𝑡)} 𝐵_{𝑡}$ , where $𝐵_{𝑡}$ is a standard Brownian bridge process³, independent of $𝑥_{0}$ and $𝑥_{1}$ . Since $𝐵_{𝑡}$ is Gaussian with mean zero and variance $𝑡 (1 - 𝑡)$ , at each time $𝑡$ it has the same law as $\sqrt{𝑡 (1 - 𝑡)} 𝑧$ . Identifying $𝑎 = 𝜀$ , this corresponds to the choice $𝛾 (𝑡) = \sqrt{2 𝑎 (𝑡) 𝑡 (1 - 𝑡)}$ , i.e.

$𝑥_{𝑡} = 𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) + \sqrt{2 𝑎 (𝑡) 𝑡 (1 - 𝑡)} 𝑧 .$

Using Itô calculus, we can derive the drift $𝑢 (𝑡, 𝑥)$ directly, but this requires many tedious calculations.

Given the stochastic interpolant perspective, and taking $𝑎$ constant so that $\dot{𝛾} (𝑡) = \sqrt{𝑎} (1 - 2 𝑡) / \sqrt{2 𝑡 (1 - 𝑡)}$ , we can write out

$\begin{matrix} 𝑏 (𝑡, 𝑥) = 𝐸 (\partial_{𝑡} 𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) + \frac{\sqrt{𝑎} (1 - 2 𝑡) 𝑧}{\sqrt{2 𝑡 (1 - 𝑡)}} | 𝑥_{𝑡} = 𝑥), \\ 𝑠 (𝑡, 𝑥) = \nabla \log 𝜌 (𝑡, 𝑥) = - \frac{1}{\sqrt{2 𝑎 𝑡 (1 - 𝑡)}} 𝐸 (𝑧 | 𝑥_{𝑡} = 𝑥) . \end{matrix}$

And using $𝑢 = 𝑏 + 𝑎 𝑠$ , we have

$𝑢 (𝑡, 𝑥) = 𝐸 (\partial_{𝑡} 𝐼 (𝑡, 𝑥_{0}, 𝑥_{1}) - \frac{\sqrt{2 𝑎 𝑡} 𝑧}{\sqrt{1 - 𝑡}} | 𝑥_{𝑡} = 𝑥) .$
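The coefficient of $𝑧$ in $𝑢$ follows from a short computation, written out here for constant $𝑎$ (an assumption of this worked step). With $𝛾 (𝑡) = \sqrt{2 𝑎 𝑡 (1 - 𝑡)}$ ,

$\dot{𝛾} (𝑡) = \frac{\sqrt{𝑎} (1 - 2 𝑡)}{\sqrt{2 𝑡 (1 - 𝑡)}}, \quad \frac{𝑎}{𝛾 (𝑡)} = \frac{\sqrt{𝑎}}{\sqrt{2 𝑡 (1 - 𝑡)}},$

so the $𝑧$ terms of $𝑏$ and $𝑎 𝑠$ combine to

$\dot{𝛾} (𝑡) - \frac{𝑎}{𝛾 (𝑡)} = \frac{\sqrt{𝑎} ((1 - 2 𝑡) - 1)}{\sqrt{2 𝑡 (1 - 𝑡)}} = - \frac{2 𝑡 \sqrt{𝑎}}{\sqrt{2 𝑡 (1 - 𝑡)}} = - \frac{\sqrt{2 𝑎 𝑡}}{\sqrt{1 - 𝑡}} .$

The singularity of $\dot{𝛾}$ at $𝑡 = 0$ cancels in this combination, and only the one at $𝑡 = 1$ remains.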

¹ We use the probability measure $𝜇 (𝑡, 𝑑 𝑥)$ instead of the density function $𝜌 (𝑡, 𝑥)$ because the latter is not well defined when the distribution has no density (for example, a Dirac delta). In most cases, you can think of $𝜇 (𝑡, 𝑑 𝑥)$ as $𝜌 (𝑡, 𝑥) 𝑑 𝑥$ .
² To avoid repeated applications of the transformation $𝑡 \to 1 - 𝑡$ , it is convenient to directly use the reversed Itô calculus rules stated in the following lemma:

Reverse Itô Calculus: If $𝑋_{𝑡}^{𝐵}$ solves the backward SDE:
1. For any $𝑓 \in 𝐶^{1} ([0, 1]; 𝐶_{0}^{2} (𝑅^{𝑑}))$ and $𝑡 \in [0, 1]$ , the backward Itô formula holds
  
  $𝑑 𝑓 (𝑡, 𝑋_{𝑡}^{𝐵}) = \partial_{𝑡} 𝑓 (𝑡, 𝑋_{𝑡}^{𝐵}) 𝑑 𝑡 + \nabla 𝑓 (𝑡, 𝑋_{𝑡}^{𝐵}) \cdot 𝑑 𝑋_{𝑡}^{𝐵} - 𝜀 (𝑡) Δ 𝑓 (𝑡, 𝑋_{𝑡}^{𝐵}) 𝑑 𝑡 .$
2. For any $𝑔 \in 𝐶^{0} ([0, 1]; {(𝐶_{0} (𝑅^{𝑑}))}^{𝑑})$ and $𝑡 \in [0, 1]$ , the backward Itô isometries hold:
  
  $𝐸_{𝐵}^{𝑥} [\int_{𝑡}^{1} 𝑔 (𝜏, 𝑋_{𝜏}^{𝐵}) \cdot 𝑑 𝑊_{𝜏}^{𝐵}] = 0; 𝐸_{𝐵}^{𝑥} [{| \int_{𝑡}^{1} 𝑔 (𝜏, 𝑋_{𝜏}^{𝐵}) \cdot 𝑑 𝑊_{𝜏}^{𝐵} |}^{2}] = \int_{𝑡}^{1} 𝐸_{𝐵}^{𝑥} [{| 𝑔 (𝜏, 𝑋_{𝜏}^{𝐵}) |}^{2}] 𝑑 𝜏,$
  
  where $𝐸_{𝐵}^{𝑥}$ denotes expectation conditioned on the event $𝑋_{𝑡 = 1}^{𝐵} = 𝑥$ .
³ A Brownian bridge is a stochastic process that describes a random path, similar to a standard Brownian motion, but with the crucial constraint that it is "pinned" to a specific value (usually zero) at both its start and end times. The standard Brownian bridge can be written as $𝐵_{𝑡} = 𝑊_{𝑡} - 𝑡 𝑊_{1}$ . Consequently, its variance, $𝑡 (1 - 𝑡)$ , is zero at the beginning and end and reaches its maximum at $𝑡 = 1 / 2$ .
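The construction $𝐵_{𝑡} = 𝑊_{𝑡} - 𝑡 𝑊_{1}$ is easy to simulate, which gives a quick check of the variance $𝑡 (1 - 𝑡)$ used to identify $𝛾$ . The following is a minimal sketch; the function name `brownian_bridge`, the grid size, and the number of paths are arbitrary choices made for illustration.

```python
import math
import random

def brownian_bridge(n_steps, rng):
    """Return B at times k / n_steps, k = 0..n_steps, via B_t = W_t - t * W_1."""
    dt = 1.0 / n_steps
    w = [0.0]
    for _ in range(n_steps):
        # Brownian increments are independent N(0, dt).
        w.append(w[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return [w[k] - (k / n_steps) * w[-1] for k in range(n_steps + 1)]

rng = random.Random(0)
paths = [brownian_bridge(10, rng) for _ in range(20000)]

# Empirical variance at t = 0.5 (index 5 on a 10-step grid); the mean is zero.
var_mid = sum(p[5] ** 2 for p in paths) / len(paths)
print(var_mid)  # theory: t (1 - t) = 0.25 at t = 0.5
```

Every path starts and ends at zero by construction, and the empirical variance at the midpoint should be close to the theoretical value $1 / 4$ .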

References