Mean Flow

1. MeanFlow: One-step Generative Modeling via Average Velocity

MeanFlow introduces a principled framework for one-step generative modeling by reconsidering which velocity field to model. Instead of learning instantaneous velocities as in traditional Flow Matching, MeanFlow directly models average velocities: the displacement over a time interval divided by the length of that interval. This seemingly simple change leads to a well-defined mathematical identity that enables stable, curriculum-free training and achieves state-of-the-art one-step generation results.

1.1. The Core Insight: Average Velocity

1.1.1. Intuition

The key insight of MeanFlow is that instead of modeling the instantaneous velocity $v(z_t, t)$, which describes the tangent direction at each point, we can directly model the average velocity $u(z_t, r, t)$, which represents the total displacement from time $r$ to time $t$ divided by the time interval.

1.1.2. Mathematical Definition

The average velocity is defined as:

$$u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau$$

This represents the displacement between times 𝑟 and 𝑡, divided by the time interval. Note that:

  • $u$ is a functional of $v$: $u = \mathcal{F}[v]$
  • As $r \to t$: $\lim_{r \to t} u(z_t, r, t) = v(z_t, t)$
  • The field $u(z_t, r, t)$ depends on both time points $r$ and $t$
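
As a numerical sanity check of this definition, the sketch below uses a toy field chosen only for illustration, v(z, t) = z, whose trajectories are z_τ = z_t · exp(τ − t). The field, the helper names, and the midpoint-rule integrator are assumptions made for this example and are not part of MeanFlow itself.

```python
import math

def v(z, t):
    # Toy instantaneous velocity (assumption for illustration): dz/dt = z.
    return z

def trajectory(z_t, t, tau):
    # Closed-form solution of dz/dt = z that passes through z_t at time t.
    return z_t * math.exp(tau - t)

def average_velocity(z_t, r, t, n=10_000):
    # Midpoint-rule estimate of (1 / (t - r)) * integral_r^t v(z_tau, tau) d tau.
    h = (t - r) / n
    total = 0.0
    for i in range(n):
        tau = r + (i + 0.5) * h
        total += v(trajectory(z_t, t, tau), tau)
    return total * h / (t - r)

z_t, r, t = 2.0, 0.3, 0.9
u = average_velocity(z_t, r, t)

# Displacement between times r and t along the same trajectory.
displacement = z_t - trajectory(z_t, t, r)

# u should match displacement / (t - r), and approach v(z_t, t) as r -> t.
u_near_t = average_velocity(z_t, t - 1e-6, t)
```

Multiplying `u` by `(t - r)` recovers the displacement z_t − z_r, which is why a single evaluation of the average velocity is enough to move from time t to time r.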

1.2. The MeanFlow Identity

1.2.1. Derivation

The fundamental relationship between average and instantaneous velocities emerges from differentiating the definition. Starting with:

$$(t - r)\, u(z_t, r, t) = \int_r^t v(z_\tau, \tau)\, d\tau$$

Differentiating both sides with respect to 𝑡 (treating 𝑟 as independent):

$$\frac{d}{dt}\left[(t - r)\, u(z_t, r, t)\right] = \frac{d}{dt} \int_r^t v(z_\tau, \tau)\, d\tau$$

Using the product rule on the left and fundamental theorem of calculus on the right:

$$u(z_t, r, t) + (t - r)\, \frac{d}{dt} u(z_t, r, t) = v(z_t, t)$$

Rearranging gives the MeanFlow Identity:

$$u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt} u(z_t, r, t)$$

1.2.2. Computing the Total Derivative

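The total derivative $\frac{d}{dt} u$ expands as:

$$\frac{d}{dt} u(z_t, r, t) = \frac{dz_t}{dt}\, \partial_z u + \frac{dr}{dt}\, \partial_r u + \frac{dt}{dt}\, \partial_t u$$

Since $\frac{dz_t}{dt} = v(z_t, t)$, $\frac{dr}{dt} = 0$, and $\frac{dt}{dt} = 1$:

$$\frac{d}{dt} u(z_t, r, t) = v(z_t, t)\, \partial_z u + \partial_t u$$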

This can be computed efficiently with a single Jacobian-vector product (JVP) of $u$ with respect to its inputs $(z, r, t)$, using the tangent vector $(v, 0, 1)$.
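
To make the JVP concrete, here is a minimal sketch that implements forward-mode differentiation with dual numbers in plain Python and applies the tangent (v, 0, 1). It assumes a toy field v(z, t) = z, for which the average velocity has the closed form u(z, r, t) = z (1 − exp(r − t)) / (t − r). The `Dual` class, the `jvp` helper, and the toy field are illustrative assumptions; a real implementation would call the JVP routine of an autodiff library on the network instead.

```python
import math

class Dual:
    """A value paired with a tangent; arithmetic propagates the JVP."""
    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.tan + o.tan)
    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.tan * o.val + self.val * o.tan)
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual.lift(o)
        return Dual(self.val / o.val,
                    (self.tan * o.val - self.val * o.tan) / (o.val ** 2))

def exp(x):
    # exp for dual numbers: d/dx exp(x) = exp(x).
    return Dual(math.exp(x.val), math.exp(x.val) * x.tan)

def u(z, r, t):
    # Closed-form average velocity for the toy field dz/dt = z (assumption).
    return z * (1.0 - exp(r - t)) / (t - r)

def jvp(f, primals, tangents):
    # Evaluate f and its directional derivative along `tangents` in one pass.
    out = f(*[Dual(p, d) for p, d in zip(primals, tangents)])
    return out.val, out.tan

z_t, r, t = 2.0, 0.3, 0.9
v_t = z_t  # instantaneous velocity of the toy field at (z_t, t)

# Tangent (v, 0, 1) yields the total derivative v * du/dz + du/dt.
u_val, du_dt = jvp(u, (z_t, r, t), (v_t, 0.0, 1.0))

# The identity u + (t - r) * du/dt = v should hold up to rounding error.
residual = u_val + (t - r) * du_dt - v_t
```

The single call to `jvp` returns both the value of u and its total derivative, so the identity can be checked without ever forming the partial derivatives separately.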

1.3. Training Procedure

1.3.1. Loss Function

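We parameterize the average velocity with a neural network $u_\theta(z_t, r, t)$ and encourage it to satisfy the MeanFlow Identity:

$$\mathcal{L}(\theta) = \mathbb{E}\left[\, \left\| u_\theta(z_t, r, t) - \mathrm{sg}(u_{\mathrm{tgt}}) \right\|_2^2 \,\right]$$

where the target is:

$$u_{\mathrm{tgt}} = v_t - (t - r)\left( v_t\, \partial_z u_\theta + \partial_t u_\theta \right)$$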

Here:

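  • $v_t = \epsilon - x$ is the conditional velocity from Flow Matching for the linear path $z_t = (1 - t)\, x + t\, \epsilon$
  • $\mathrm{sg}(\cdot)$ denotes stop-gradient: the target is treated as a constant during backpropagation, which avoids differentiating through the JVP and therefore avoids higher-order derivatives
  • The term $(v_t\, \partial_z u_\theta + \partial_t u_\theta)$ is obtained with a single JVP using the tangent $(v_t, 0, 1)$, without forming the full Jacobian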
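
The loss and the role of stop-gradient can be sketched for a single scalar sample as follows. The linear model `u_theta`, its hand-written derivatives (standing in for a JVP call), and all numeric values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def u_theta(theta, z, r, t):
    # Hypothetical linear stand-in for the network: u = a*z + b*r + c*t + d.
    a, b, c, d = theta
    return a * z + b * r + c * t + d

def total_derivative(theta, v):
    # For the linear model, du/dz = a and du/dt = c, so the JVP with
    # tangent (v, 0, 1) equals v * a + c. A real network would use a JVP call.
    a, _, c, _ = theta
    return v * a + c

def meanflow_loss_and_grad(theta, x, eps, r, t):
    z_t = (1.0 - t) * x + t * eps   # linear interpolation path
    v_t = eps - x                   # conditional velocity
    u_pred = u_theta(theta, z_t, r, t)
    u_tgt = v_t - (t - r) * total_derivative(theta, v_t)
    err = u_pred - u_tgt
    # Stop-gradient: only u_pred is differentiated with respect to theta;
    # u_tgt is held constant even though it also depends on theta.
    features = (z_t, r, t, 1.0)     # d u_pred / d(a, b, c, d)
    grad = tuple(2.0 * err * f for f in features)
    return err ** 2, grad, u_tgt

theta = (0.5, 0.0, 0.1, 0.0)
loss, grad, u_tgt = meanflow_loss_and_grad(theta, x=1.0, eps=-0.5, r=0.2, t=0.8)
```

Without the stop-gradient, the gradient would also include the derivative of `u_tgt` with respect to `theta`, which for a real network means differentiating through the JVP.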

1.4. Classifier-Free Guidance Extension

1.4.1. Motivation

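Standard CFG requires two function evaluations per sampling step, one conditional and one unconditional: $v^{\mathrm{cfg}} = \omega\, v(\cdot \mid c) + (1 - \omega)\, v(\cdot)$. For one-step generation, this doubles the Number of Function Evaluations (NFE) from 1 to 2, defeating the purpose.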

1.4.2. Solution: CFG in the Ground Truth

MeanFlow naturally incorporates CFG into the ground-truth field itself:

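$$v^{\mathrm{cfg}}(z_t, t \mid c) = \omega\, v(z_t, t \mid c) + (1 - \omega)\, v(z_t, t)$$

The corresponding average velocity $u^{\mathrm{cfg}}$ satisfies its own MeanFlow Identity:

$$u^{\mathrm{cfg}}(z_t, r, t \mid c) = v^{\mathrm{cfg}}(z_t, t \mid c) - (t - r)\, \frac{d}{dt} u^{\mathrm{cfg}}(z_t, r, t \mid c)$$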

1.4.3. Training with CFG

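In the training target, the sample velocity $v_t$ is replaced by a guided velocity:

$$\tilde{v}_t = \omega\, v_t + (1 - \omega)\, u^{\mathrm{cfg}}_\theta(z_t, t, t)$$

Here $u^{\mathrm{cfg}}_\theta(z_t, t, t)$ is the network's own unconditional prediction at $r = t$, which stands in for the unconditional velocity because the average velocity reduces to the instantaneous velocity as $r \to t$. The modified target becomes:

$$u_{\mathrm{tgt}} = \tilde{v}_t - (t - r)\left( \tilde{v}_t\, \partial_z u^{\mathrm{cfg}}_\theta + \partial_t u^{\mathrm{cfg}}_\theta \right)$$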

This allows the network to directly model the CFG-enhanced field while maintaining 1-NFE sampling.
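
A small scalar sketch of how the guided velocity feeds into the target is given below. The function names and all numbers are hypothetical; in practice the unconditional prediction and the derivative terms would come from the network and a JVP call rather than being passed in by hand.

```python
def cfg_velocity(v_t, u_uncond_at_tt, omega):
    # Guided velocity: omega * v_t + (1 - omega) * u_theta^cfg(z_t, t, t),
    # where the second term is the network's unconditional output at r = t.
    return omega * v_t + (1.0 - omega) * u_uncond_at_tt

def meanflow_target(v, r, t, du_dz, du_dt):
    # Scalar MeanFlow target: v - (t - r) * (v * du/dz + du/dt).
    return v - (t - r) * (v * du_dz + du_dt)

# Hypothetical values for one training sample.
v_t = -1.5        # conditional velocity eps - x
u_uncond = -0.9   # unconditional network output at (z_t, t, t)
omega = 2.0       # guidance scale

v_tilde = cfg_velocity(v_t, u_uncond, omega)
u_tgt = meanflow_target(v_tilde, r=0.2, t=0.8, du_dz=0.5, du_dt=0.1)

# With omega = 1 the guided velocity reduces to the plain velocity v_t.
v_no_guidance = cfg_velocity(v_t, u_uncond, 1.0)
```

Because guidance is folded into the regression target during training, sampling still needs only one conditional network evaluation.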

References

  1. Mean Flows for One-step Generative Modeling