Mean Flow
1. MeanFlow: One-step Generative Modeling via Average Velocity
MeanFlow introduces a principled framework for one-step generative modeling by reconsidering which velocity field to model. Instead of learning instantaneous velocities, as in standard Flow Matching, MeanFlow directly models average velocities: the displacement over a time interval divided by the length of that interval. This seemingly simple change leads to a well-defined mathematical identity that enables stable, curriculum-free training and achieves state-of-the-art one-step generation results.
1.1. The Core Insight: Average Velocity
1.1.1. Intuition
The key insight of MeanFlow is that instead of modeling the instantaneous velocity $v(z_t, t)$, which describes the tangent direction at each point of a trajectory, we can directly model the average velocity $u(z_t, r, t)$, which represents the total displacement from time $r$ to time $t$ divided by the time interval $t - r$.
1.1.2. Mathematical Definition
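The average velocity is defined as:
$$u(z_t, r, t) \triangleq \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau$$
This represents the displacement between times $r$ and $t$, divided by the time interval. Note that:
- $u$ is a functional of $v$: $u = \mathcal{F}[v]$
- As $r \to t$: $u(z_t, r, t) \to v(z_t, t)$
- The field $u$ depends on both time points $r$ and $t$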
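As a quick numerical check of this definition, the sketch below uses a hypothetical one-dimensional field $v(z, t) = z$, chosen only because its trajectories are known in closed form; it is not taken from the source. It approximates the integral with the midpoint rule, compares the result with displacement divided by interval, and probes the $r \to t$ limit. All function names are illustrative.

```python
import math

def v(z, t):
    # Toy instantaneous velocity (assumption for illustration): dz/dt = z.
    return z

def z_at(tau, z_t, t):
    # Closed-form trajectory of dz/dt = z that passes through z_t at time t.
    return z_t * math.exp(tau - t)

def average_velocity(z_t, r, t, n=10_000):
    # u(z_t, r, t) = 1/(t - r) * integral_r^t v(z_tau, tau) d tau (midpoint rule).
    h = (t - r) / n
    total = 0.0
    for i in range(n):
        tau = r + (i + 0.5) * h
        total += v(z_at(tau, z_t, t), tau) * h
    return total / (t - r)

z_t, r, t = 2.0, 0.3, 0.9
u_quad = average_velocity(z_t, r, t)          # integral form of the definition
u_disp = (z_t - z_at(r, z_t, t)) / (t - r)    # displacement divided by interval
u_near = average_velocity(z_t, t - 1e-6, t)   # r very close to t
```

Under these assumptions `u_quad` and `u_disp` should agree up to quadrature error, and `u_near` should be close to `v(z_t, t)`.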
1.2. The MeanFlow Identity
1.2.1. Derivation
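The fundamental relationship between average and instantaneous velocities emerges from differentiating the definition. Starting with:
$$(t - r)\, u(z_t, r, t) = \int_r^t v(z_\tau, \tau)\, d\tau$$
Differentiating both sides with respect to $t$ (treating $r$ as independent of $t$):
$$\frac{d}{dt}\left[(t - r)\, u(z_t, r, t)\right] = \frac{d}{dt} \int_r^t v(z_\tau, \tau)\, d\tau$$
Using the product rule on the left and the fundamental theorem of calculus on the right:
$$u(z_t, r, t) + (t - r)\, \frac{d}{dt} u(z_t, r, t) = v(z_t, t)$$
This gives us the MeanFlow Identity:
$$u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt} u(z_t, r, t)$$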
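For a concrete check, consider a hypothetical linear field $v(z, t) = z$ (an assumption made here for illustration, since everything is then available in closed form). Trajectories satisfy $z_\tau = z_t e^{\tau - t}$, so $dz_t/dt = z_t$. Writing $s = t - r$ and differentiating along the trajectory with $r$ held fixed, the identity can be verified by hand:

```latex
\begin{aligned}
u(z_t, r, t) &= \frac{z_t - z_r}{t - r} = z_t\,\frac{1 - e^{-s}}{s}, \\
\frac{d}{dt}\, u(z_t, r, t) &= z_t\,\frac{1 - e^{-s}}{s}
  + z_t\,\frac{s\,e^{-s} - (1 - e^{-s})}{s^2}, \\
v(z_t, t) - (t - r)\,\frac{d}{dt}\, u(z_t, r, t)
  &= z_t - z_t\left[(1 - e^{-s}) + e^{-s} - \frac{1 - e^{-s}}{s}\right]
   = z_t\,\frac{1 - e^{-s}}{s} = u(z_t, r, t).
\end{aligned}
```

The first term of the derivative comes from the motion of $z_t$, the second from the explicit dependence on $t$ through $s$.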
1.2.2. Computing the Total Derivative
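The total derivative expands as:
$$\frac{d}{dt} u(z_t, r, t) = \frac{dz_t}{dt}\, \partial_z u + \frac{dr}{dt}\, \partial_r u + \frac{dt}{dt}\, \partial_t u$$
Since $\frac{dz_t}{dt} = v(z_t, t)$, $\frac{dr}{dt} = 0$, and $\frac{dt}{dt} = 1$:
$$\frac{d}{dt} u(z_t, r, t) = v(z_t, t)\, \partial_z u + \partial_t u$$
This is the Jacobian of $u$ with respect to its inputs $(z, r, t)$ applied to the tangent vector $(v, 0, 1)$, so it can be computed efficiently as a Jacobian-Vector Product (JVP).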
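To make the JVP concrete without depending on a particular framework, the sketch below implements forward-mode differentiation with a hand-rolled dual-number class and applies it to the closed-form average velocity of a hypothetical toy field $v(z, t) = z$. The `Dual` class, `dexp`, and the toy `u` are assumptions for illustration; in practice one would typically call a framework routine such as `jax.jvp(fun, primals, tangents)`, whose calling convention the `jvp` helper here mirrors.

```python
import math

class Dual:
    """Minimal forward-mode number: a value plus a tangent component."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.tan + o.tan)
    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __neg__(self):
        return Dual(-self.val, -self.tan)

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.tan * o.val + self.val * o.tan)
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual.lift(o)
        return Dual(self.val / o.val,
                    (self.tan * o.val - self.val * o.tan) / (o.val ** 2))

def dexp(x):
    # Exponential that propagates tangents.
    x = Dual.lift(x)
    e = math.exp(x.val)
    return Dual(e, e * x.tan)

def jvp(f, primals, tangents):
    # Evaluate f and its directional derivative along `tangents` in one pass.
    out = f(*[Dual(p, t) for p, t in zip(primals, tangents)])
    return out.val, out.tan

def v(z, t):
    # Toy instantaneous velocity (assumption): dz/dt = z.
    return z

def u(z, r, t):
    # Closed-form average velocity for the toy field above.
    s = t - r
    return z * (1.0 - dexp(-s)) / s

z_t, r, t = 2.0, 0.3, 0.9
# Tangent (v, 0, 1): z moves with velocity v, r is held fixed, t advances at unit rate.
u_val, du_dt = jvp(u, (z_t, r, t), (v(z_t, t), 0.0, 1.0))
rhs = v(z_t, t) - (t - r) * du_dt  # right-hand side of the MeanFlow Identity
```

For this toy field `rhs` should match `u_val` to floating-point precision, which is the MeanFlow Identity evaluated at a single point.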
1.3. Training Procedure
1.3.1. Loss Function
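We parameterize the average velocity with a neural network $u_\theta$ and encourage it to satisfy the MeanFlow Identity:
$$\mathcal{L}(\theta) = \mathbb{E}\, \left\| u_\theta(z_t, r, t) - \text{sg}(u_{\text{tgt}}) \right\|_2^2$$
where the target is:
$$u_{\text{tgt}} = v_t - (t - r)\left( v_t\, \partial_z u_\theta + \partial_t u_\theta \right)$$
Here:
- $v_t = \epsilon - x$ is the conditional velocity from Flow Matching for the path $z_t = (1 - t)\, x + t\, \epsilon$, used in place of the marginal velocity $v(z_t, t)$
- $\text{sg}(\cdot)$ denotes stop-gradient, which treats the target as a constant and so avoids higher-order derivatives
- The JVP is computed in a single forward-mode pass, and because the target sits under the stop-gradient, no backpropagation through the JVP is needed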
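The following sketch evaluates the regression target and the squared error for one training sample, assuming the linear path $z_t = (1 - t)\,x + t\,\epsilon$ with conditional velocity $v_t = \epsilon - x$. The scalar model `u_theta` is a hypothetical stand-in for a network, and a central finite difference along the tangent $(v_t, 0, 1)$ replaces a true JVP so that the example runs with the standard library only. Plain Python has no autodiff, so the stop-gradient is only indicated by a comment.

```python
def u_theta(theta, z, r, t):
    # Hypothetical tiny model (assumption): linear in z with an interval-dependent scale.
    a, b, c = theta
    return (a + b * (t - r)) * z + c

def directional_derivative(f, z, r, t, v, h=1e-5):
    # Central finite difference along the tangent (v, 0, 1); stands in for a JVP.
    return (f(z + h * v, r, t + h) - f(z - h * v, r, t - h)) / (2 * h)

def meanflow_loss(theta, x, eps, r, t):
    z_t = (1 - t) * x + t * eps      # point on the linear interpolation path
    v_t = eps - x                    # conditional velocity
    f = lambda z, rr, tt: u_theta(theta, z, rr, tt)
    du_dt = directional_derivative(f, z_t, r, t, v_t)
    u_tgt = v_t - (t - r) * du_dt    # would be wrapped in stop-gradient in a real framework
    loss = (u_theta(theta, z_t, r, t) - u_tgt) ** 2
    return loss, u_tgt

theta = (0.5, 0.2, 0.1)
loss, u_tgt = meanflow_loss(theta, x=1.0, eps=-0.5, r=0.2, t=0.8)

# With r == t the interval vanishes and the target reduces to the Flow Matching target v_t.
_, u_tgt_fm = meanflow_loss(theta, x=1.0, eps=-0.5, r=0.8, t=0.8)
```

The second call illustrates that setting $r = t$ recovers ordinary Flow Matching, since the $(t - r)$ factor removes the derivative term.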
1.4. Classifier-Free Guidance Extension
1.4.1. Motivation
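Standard CFG requires two function evaluations per sampling step, one conditional and one unconditional: $\omega\, v_\theta(z_t, t \mid c) + (1 - \omega)\, v_\theta(z_t, t)$, where $\omega$ is the guidance scale and $c$ the condition. For one-step generation, this doubles the Number of Function Evaluations (NFE), defeating the purpose.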
1.4.2. Solution: CFG in the Ground Truth
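MeanFlow naturally incorporates CFG into the ground-truth field itself:
$$v^{\text{cfg}}(z_t, t \mid c) \triangleq \omega\, v(z_t, t \mid c) + (1 - \omega)\, v(z_t, t)$$
The corresponding average velocity $u^{\text{cfg}}$ satisfies its own MeanFlow Identity:
$$u^{\text{cfg}}(z_t, r, t \mid c) = v^{\text{cfg}}(z_t, t \mid c) - (t - r)\, \frac{d}{dt} u^{\text{cfg}}(z_t, r, t \mid c)$$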
1.4.3. Training with CFG
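The modified target becomes:
$$u_{\text{tgt}} = \tilde{v}_t - (t - r)\left( \tilde{v}_t\, \partial_z u^{\text{cfg}}_\theta + \partial_t u^{\text{cfg}}_\theta \right), \qquad \tilde{v}_t \triangleq \omega\, v_t + (1 - \omega)\, u^{\text{cfg}}_\theta(z_t, t, t)$$
Here $u^{\text{cfg}}_\theta(z_t, t, t)$ is the network's own unconditional prediction at $r = t$, which stands in for the unconditional velocity $v(z_t, t)$. This allows the network to directly model the CFG-enhanced field while maintaining 1-NFE sampling.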
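A minimal sketch of this guided target for a single sample is given below, under the same assumptions as a plain MeanFlow target: the linear path $z_t = (1 - t)\,x + t\,\epsilon$, conditional velocity $v_t = \epsilon - x$, and a finite difference standing in for the JVP. The class-conditional toy model `u_theta`, its parameter layout, and the use of `c=None` for the unconditional branch are hypothetical choices made for illustration.

```python
def u_theta(theta, z, r, t, c=None):
    # Hypothetical class-conditional toy model (assumption); c=None means unconditional.
    a, b, bias = theta
    return (a + b * (t - r)) * z + bias.get(c, 0.0)

def directional_derivative(f, z, r, t, v, h=1e-5):
    # Central finite difference along the tangent (v, 0, 1); stands in for a JVP.
    return (f(z + h * v, r, t + h) - f(z - h * v, r, t - h)) / (2 * h)

def cfg_target(theta, x, eps, r, t, c, omega):
    z_t = (1 - t) * x + t * eps
    v_t = eps - x
    # Guided velocity: mix the sample velocity with the model's own
    # unconditional prediction evaluated at r = t.
    v_tilde = omega * v_t + (1 - omega) * u_theta(theta, z_t, t, t, None)
    f = lambda z, rr, tt: u_theta(theta, z, rr, tt, c)
    du_dt = directional_derivative(f, z_t, r, t, v_tilde)
    # In a real framework the whole target would sit under stop-gradient.
    return v_tilde - (t - r) * du_dt

theta = (0.5, 0.2, {0: 0.1})
u_tgt_guided = cfg_target(theta, x=1.0, eps=-0.5, r=0.2, t=0.8, c=0, omega=2.0)
u_tgt_plain = cfg_target(theta, x=1.0, eps=-0.5, r=0.2, t=0.8, c=0, omega=1.0)
```

With `omega=1.0` the guided velocity equals $v_t$ and the target coincides with the unguided one, which is a convenient sanity check when wiring guidance into a training loop.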