Mean Flow

June 19, 2025

by Leonardo

1. MeanFlow: One-step Generative Modeling via Average Velocity

MeanFlow is a framework for one-step generative modeling that reconsiders which velocity field the network should model. Instead of learning instantaneous velocities, as in standard Flow Matching, MeanFlow directly models average velocities: the displacement over a time interval divided by the length of that interval. This change yields a well-defined identity relating the average and instantaneous velocities, which serves as the training target. According to the paper, it enables stable training without a curriculum and achieves state-of-the-art results for one-step generation.

1.1. The Core Insight: Average Velocity

1.1.1. Intuition

The key insight of MeanFlow is that instead of modeling the instantaneous velocity $𝑣 (𝑧_{𝑡}, 𝑡)$ that describes the tangent direction at each point, we can directly model the average velocity $𝑢 (𝑧_{𝑡}, 𝑟, 𝑡)$ that represents the total displacement from time $𝑟$ to time $𝑡$ divided by the time interval.

1.1.2. Mathematical Definition

The average velocity is defined as:

𝑢 (𝑧_{𝑡}, 𝑟, 𝑡) = \frac{1}{𝑡 - 𝑟} \int_{𝑟}^{𝑡} 𝑣 (𝑧_{𝜏}, 𝜏) 𝑑 𝜏

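Because the integral of the instantaneous velocity along the trajectory equals the displacement $𝑧_{𝑡} - 𝑧_{𝑟}$ , the average velocity is that displacement divided by the time interval $𝑡 - 𝑟$ . Note that: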

$𝑢$ is a functional of $𝑣$ : $𝑢 = ℱ [𝑣]$
As $𝑟 \to 𝑡$ : $\lim_{𝑟 \to 𝑡} 𝑢 (𝑧_{𝑡}, 𝑟, 𝑡) = 𝑣 (𝑧_{𝑡}, 𝑡)$
The field $𝑢 (𝑧_{𝑡}, 𝑟, 𝑡)$ depends on both time points $𝑟$ and $𝑡$
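
To make the definition concrete, the sketch below checks it numerically on a toy one-dimensional field. The field $𝑣 (𝑧, 𝜏) = 𝑧$ , its closed-form trajectory, and all helper names are assumptions chosen for illustration; none of them come from the paper. The script approximates the integral with a midpoint rule, confirms that $(𝑡 - 𝑟) 𝑢$ equals the displacement $𝑧_{𝑡} - 𝑧_{𝑟}$ , and shows $𝑢$ approaching $𝑣$ as $𝑟 \to 𝑡$ .

```python
import math

def v(z, tau):
    # Toy instantaneous velocity field (illustrative assumption): dz/dtau = z.
    return z

def trajectory(z_t, t, tau):
    # Exact solution of dz/dtau = z that passes through z_t at time t.
    return z_t * math.exp(tau - t)

def average_velocity(z_t, r, t, n=10_000):
    # Midpoint-rule approximation of (1 / (t - r)) * integral_r^t v(z_tau, tau) dtau.
    h = (t - r) / n
    total = 0.0
    for i in range(n):
        tau = r + (i + 0.5) * h
        total += v(trajectory(z_t, t, tau), tau)
    return total * h / (t - r)

z_t, r, t = 2.0, 0.25, 1.0

u_avg = average_velocity(z_t, r, t)

# One step with the average velocity lands on the trajectory at time r.
z_r_exact = trajectory(z_t, t, r)
z_r_one_step = z_t - (t - r) * u_avg

# As r approaches t, the average velocity approaches the instantaneous one.
u_near = average_velocity(z_t, t - 1e-4, t)

print("u(z_t, r, t)      =", u_avg)
print("z_r exact         =", z_r_exact)
print("z_r from one step =", z_r_one_step)
print("u with r near t   =", u_near, "vs v =", v(z_t, t))
```

The one-step relation `z_r = z_t - (t - r) * u` is what makes an accurate average velocity sufficient for sampling with a single evaluation.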

1.2. The MeanFlow Identity

1.2.1. Derivation

The fundamental relationship between average and instantaneous velocities emerges from differentiating the definition. Starting with:

(𝑡 - 𝑟) 𝑢 (𝑧_{𝑡}, 𝑟, 𝑡) = \int_{𝑟}^{𝑡} 𝑣 (𝑧_{𝜏}, 𝜏) 𝑑 𝜏

Differentiating both sides with respect to $𝑡$ (treating $𝑟$ as independent of $𝑡$ ):

\frac{𝑑}{𝑑 𝑡} [(𝑡 - 𝑟) 𝑢 (𝑧_{𝑡}, 𝑟, 𝑡)] = \frac{𝑑}{𝑑 𝑡} \int_{𝑟}^{𝑡} 𝑣 (𝑧_{𝜏}, 𝜏) 𝑑 𝜏

Using the product rule on the left and the fundamental theorem of calculus on the right:

𝑢 (𝑧_{𝑡}, 𝑟, 𝑡) + (𝑡 - 𝑟) \frac{𝑑 𝑢}{𝑑 𝑡} = 𝑣 (𝑧_{𝑡}, 𝑡)

Rearranging gives the MeanFlow Identity:

𝑢 (𝑧_{𝑡}, 𝑟, 𝑡) = 𝑣 (𝑧_{𝑡}, 𝑡) - (𝑡 - 𝑟) \frac{𝑑 𝑢}{𝑑 𝑡}

1.2.2. Computing the Total Derivative

The total derivative $\frac{𝑑 𝑢}{𝑑 𝑡}$ expands as:

\frac{𝑑 𝑢}{𝑑 𝑡} = \frac{𝑑 𝑧_{𝑡}}{𝑑 𝑡} \partial_{𝑧} 𝑢 + \frac{𝑑 𝑟}{𝑑 𝑡} \partial_{𝑟} 𝑢 + \frac{𝑑 𝑡}{𝑑 𝑡} \partial_{𝑡} 𝑢

Since $\frac{𝑑 𝑧_{𝑡}}{𝑑 𝑡} = 𝑣 (𝑧_{𝑡}, 𝑡)$ , $\frac{𝑑 𝑟}{𝑑 𝑡} = 0$ , and $\frac{𝑑 𝑡}{𝑑 𝑡} = 1$ :

\frac{𝑑 𝑢}{𝑑 𝑡} = 𝑣 (𝑧_{𝑡}, 𝑡) \partial_{𝑧} 𝑢 + \partial_{𝑡} 𝑢

This can be computed efficiently as a single Jacobian-vector product (JVP) of $𝑢$ with respect to its inputs $(𝑧, 𝑟, 𝑡)$ , taken along the tangent vector $[𝑣, 0, 1]$ .
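
The JVP can be illustrated without any framework by using dual numbers, the simplest form of forward-mode differentiation. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the `Dual` class, the `jvp` helper, and the toy field $𝑣 (𝑧, 𝑡) = 𝑧$ (whose average velocity has the closed form used in `u`) are all hypothetical. In practice a framework routine such as `jax.jvp` or `torch.func.jvp` would play the role of `jvp`.

```python
import math

class Dual:
    """A value paired with a tangent, propagated by forward-mode rules."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.tan + o.tan)
    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.tan * o.val + self.val * o.tan)
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual.lift(o)
        return Dual(self.val / o.val,
                    (self.tan * o.val - self.val * o.tan) / (o.val * o.val))

def dexp(x):
    # exp with its derivative applied to the tangent.
    e = math.exp(x.val)
    return Dual(e, e * x.tan)

def jvp(f, primals, tangents):
    # Evaluate f once; return its value and its directional derivative.
    out = f(*[Dual(p, t) for p, t in zip(primals, tangents)])
    return out.val, out.tan

def u(z, r, t):
    # Closed-form average velocity for the toy field v(z, t) = z (assumption).
    return z * (1.0 - dexp(r - t)) / (t - r)

z_t, r, t = 2.0, 0.25, 1.0
v_t = z_t  # instantaneous velocity of the toy field at (z_t, t)

# Tangent [v, 0, 1]: dz/dt = v, dr/dt = 0, dt/dt = 1.
u_val, du_dt = jvp(u, (z_t, r, t), (v_t, 0.0, 1.0))

# MeanFlow Identity: u + (t - r) * du/dt should equal v.
residual = u_val + (t - r) * du_dt - v_t
print("u =", u_val, " du/dt =", du_dt, " residual =", residual)
```

A single call evaluates the function once and returns both $𝑢$ and its total derivative, which is why no full Jacobian is ever formed.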

1.3. Training Procedure

1.3.1. Loss Function

We parameterize the average velocity with a neural network $𝑢_{𝜃} (𝑧_{𝑡}, 𝑟, 𝑡)$ and encourage it to satisfy the MeanFlow Identity:

ℒ (𝜃) = 𝔼 [{‖ 𝑢_{𝜃} (𝑧_{𝑡}, 𝑟, 𝑡) - sg (𝑢^{tgt}) ‖}^{2}]

where the target is:

𝑢^{tgt} = 𝑣_{𝑡} - (𝑡 - 𝑟) (𝑣_{𝑡} \partial_{𝑧} 𝑢_{𝜃} + \partial_{𝑡} 𝑢_{𝜃})

Here:

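$𝑣_{𝑡} = 𝜀 - 𝑥$ is the conditional velocity from Flow Matching, for a data sample $𝑥$ , a noise sample $𝜀$ , and $𝑧_{𝑡} = (1 - 𝑡) 𝑥 + 𝑡 𝜀$
$sg (\cdot)$ denotes stop-gradient: the target is treated as a constant, so no gradient flows back through the JVP and no higher-order derivatives are needed
The term $(𝑣_{𝑡} \partial_{𝑧} 𝑢_{𝜃} + \partial_{𝑡} 𝑢_{𝜃})$ is a single JVP of $𝑢_{𝜃}$ along the tangent $[𝑣_{𝑡}, 0, 1]$ , so it never requires forming a full Jacobian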
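
The loss can be sketched for a single training sample with a toy scalar model. Everything named here is an assumption for illustration: the linear model `u_theta`, its hand-derived partial derivatives (a real network would obtain them from a JVP), and the function names. The interpolation $𝑧_{𝑡} = (1 - 𝑡) 𝑥 + 𝑡 𝜀$ is the one consistent with $𝑣_{𝑡} = 𝜀 - 𝑥$ . The point of the sketch is where the stop-gradient acts: the gradient is taken through the prediction only.

```python
def u_theta(theta, z, r, t):
    # Hypothetical toy model: u = a * z + b * (t - r).
    a, b = theta
    return a * z + b * (t - r)

def meanflow_target(theta, r, t, v_t):
    # For this toy model the partials are known in closed form:
    # d u / d z = a and d u / d t = b.
    a, b = theta
    total_derivative = v_t * a + b          # v_t * d_z u + d_t u
    return v_t - (t - r) * total_derivative

def loss_and_grad(theta, x, eps, r, t):
    z_t = (1.0 - t) * x + t * eps           # interpolate data and noise
    v_t = eps - x                           # conditional velocity
    pred = u_theta(theta, z_t, r, t)
    tgt = meanflow_target(theta, r, t, v_t) # treated as a constant (stop-gradient)
    err = pred - tgt
    # Gradient of err**2 w.r.t. (a, b) through `pred` only, not through `tgt`.
    grad = (2.0 * err * z_t, 2.0 * err * (t - r))
    return err * err, grad

theta = (0.5, 0.1)
loss, grad = loss_and_grad(theta, x=1.0, eps=-1.0, r=0.2, t=0.8)
print("loss =", loss, " grad =", grad)

# With r == t the target reduces to the plain Flow Matching target v_t.
tgt_same_time = meanflow_target(theta, r=0.5, t=0.5, v_t=-2.0)
print("target at r == t:", tgt_same_time)
```

Because the target depends on the current parameters but is held fixed during differentiation, each update regresses the prediction toward a value computed from the model's own derivatives.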

1.4. Classifier-Free Guidance Extension

1.4.1. Motivation

Standard CFG requires two function evaluations per sampling step, one conditional and one unconditional: $𝑣^{cfg} = 𝜔 𝑣 (\cdot | 𝑐) + (1 - 𝜔) 𝑣 (\cdot)$ . For one-step generation, this doubles the number of function evaluations (NFE), defeating the purpose.

1.4.2. Solution: CFG in the Ground Truth

MeanFlow instead builds CFG into the ground-truth field itself, defining a guided instantaneous velocity:

𝑣^{cfg} (𝑧_{𝑡}, 𝑡 | 𝑐) = 𝜔 𝑣 (𝑧_{𝑡}, 𝑡 | 𝑐) + (1 - 𝜔) 𝑣 (𝑧_{𝑡}, 𝑡)

The corresponding average velocity $𝑢^{cfg}$ satisfies its own MeanFlow Identity:

𝑢^{cfg} (𝑧_{𝑡}, 𝑟, 𝑡 | 𝑐) = 𝑣^{cfg} (𝑧_{𝑡}, 𝑡 | 𝑐) - (𝑡 - 𝑟) \frac{𝑑 𝑢^{cfg}}{𝑑 𝑡}

1.4.3. Training with CFG

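During training, the sample velocity $𝑣_{𝑡}$ in the target is replaced by a guided velocity. The class-unconditional prediction $𝑢_{𝜃}^{cfg} (𝑧_{𝑡}, 𝑡, 𝑡)$ stands in for the unconditional velocity, since the average velocity reduces to the instantaneous velocity when $𝑟 = 𝑡$ :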

{\tilde{𝑣}}_{𝑡} = 𝜔 𝑣_{𝑡} + (1 - 𝜔) 𝑢_{𝜃}^{cfg} (𝑧_{𝑡}, 𝑡, 𝑡)

This allows the network to directly model the CFG-enhanced field while maintaining 1-NFE sampling.
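
The arithmetic of the guided target can be sketched as follows. This is a minimal sketch under stated assumptions, not a full training step: the function names are hypothetical, the scalar inputs stand in for network outputs, and the partial derivatives are passed in as plain numbers where a real implementation would obtain them from a JVP. It assumes the guided velocity replaces $𝑣_{𝑡}$ both in the leading term and inside the total derivative, mirroring the unguided target.

```python
def guided_velocity(omega, v_t, u_uncond_tt):
    # v_tilde = omega * v_t + (1 - omega) * u_theta^cfg(z_t, t, t),
    # where u_uncond_tt is the class-unconditional prediction at r = t.
    return omega * v_t + (1.0 - omega) * u_uncond_tt

def guided_target(v_tilde, r, t, du_dz, du_dt_partial):
    # Same form as the unguided target, with v_t replaced by v_tilde.
    return v_tilde - (t - r) * (v_tilde * du_dz + du_dt_partial)

# Illustrative numbers only.
v_t, u_uncond_tt = -2.0, -1.0
v_tilde = guided_velocity(2.0, v_t, u_uncond_tt)
target = guided_target(v_tilde, r=0.2, t=0.8, du_dz=0.5, du_dt_partial=0.1)
print("guided velocity =", v_tilde, " guided target =", target)

# With omega = 1 the guidance term vanishes and v_tilde equals v_t.
no_guidance = guided_velocity(1.0, v_t, u_uncond_tt)
print("omega = 1 gives", no_guidance)
```

Since guidance is absorbed into the regression target, sampling still evaluates the trained network once.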

References

Mean Flows for One-step Generative Modeling
