Flow Matching

1. Flow Matching (FM) and Conditional Flow Matching (CFM)

| Symbol | Description | Type/Dimension |
|---|---|---|
| $p_0(x)$ | Base distribution (typically simple, e.g., Gaussian) | Probability density |
| $p_1(x)$ | Target data distribution | Probability density |
| $p_t(x)$ | Probability path at time $t$, connecting $p_0$ and $p_1$ | Probability density |
| $p_t(x \mid z)$ | Conditional probability path given conditioning variable $z$ | Conditional density |
| $z$ | Conditioning variable for constructing conditional flows | Random variable |
| $u_t(x)$ | Velocity field at time $t$ (marginal) | $\mathbb{R}^d$ |
| $u_t(x \mid z)$ | Conditional velocity field given condition $z$ | $\mathbb{R}^d$ |
| $u_t^\theta(x)$ | Neural network velocity field with parameters $\theta$ | $\mathbb{R}^d$ |
| $\psi_t(x_0 \mid z)$ | Conditional flow map from initial point $x_0$ to time $t$ | $\mathbb{R}^d$ |
| $t$ | Time variable, typically $t \in [0, 1]$ | $\mathbb{R}$ |
| $x$ | Spatial position variable | $\mathbb{R}^d$ |
| $x_0$ | Position at initial time ($t = 0$) | $\mathbb{R}^d$ |
| $x_1$ | Position at final time ($t = 1$) | $\mathbb{R}^d$ |
| $\theta$ | Neural network parameters | Parameter space |
| $\mathcal{L}_{\mathrm{CFM}}(\theta)$ | Conditional Flow Matching loss function | $\mathbb{R}_+$ |
| $\mathcal{L}_{\mathrm{FM}}(\theta)$ | Flow Matching loss function (intractable) | $\mathbb{R}_+$ |

1.1. Flow Matching Overview

Flow Matching is a framework for training continuous normalizing flows by learning velocity fields that transform a simple base distribution into a target data distribution. The key insight is to parameterize the transformation through a time-dependent velocity field $u_\theta(x, t)$ (also written $u_t^\theta(x)$) that defines an ordinary differential equation $\frac{d x_t}{dt} = u_\theta(x_t, t)$.

Given a probability path $p_t(x)$ that interpolates between $p_0$ (base distribution) and $p_1$ (data distribution), the velocity field must satisfy the continuity equation:

$$\frac{\partial p_t(x)}{\partial t} + \nabla \cdot \big(p_t(x)\, u_\theta(x, t)\big) = 0$$

However, directly learning $u_\theta(x, t)$ from this equation is challenging because:

  1. We don't know the true $p_t(x)$ for intermediate times
  2. The continuity equation only constrains the divergence of $p_t u_\theta$, so it gives no pointwise regression target
  3. There are infinitely many velocity fields satisfying the boundary conditions

CFM solves these issues by introducing conditional probability paths that enable tractable training.

1.2. Conditional Flow Matching (CFM)

The goal of CFM is to find a velocity field $u_\theta(x, t)$. However, infinitely many velocity fields satisfy the continuity equation for a given probability path. To get supervision for all $t$, one must fully specify a probability path and its corresponding velocity field.

1.2.1. How to fully specify a probability path $p_t$ and velocity field $u_t$?

The key challenge is that solving the continuity equation $\partial_t p_t + \nabla \cdot (u_t p_t) = 0$ for $u_t$ given $p_t$ has infinitely many solutions. CFM's core idea is to avoid this difficulty by constructing both the probability path and the velocity field explicitly:

  1. Choose a conditioning variable $z$
  2. Design conditional probability paths $p_t(x \mid z)$ with known flow maps $\psi_t(x_0 \mid z)$
  3. Obtain the velocity field analytically via $u_t(x \mid z) = \partial_t \psi_t(x_0 \mid z)$, evaluated at $x = \psi_t(x_0 \mid z)$

We want to ensure two conditions are met:

  1. The induced global probability path $p_t(x) = \mathbb{E}_z[p_t(x \mid z)]$ transforms $p_0$ into $p_1$.
  2. The associated velocity field $u_t(x \mid z)$ has an analytic form obtained from the flow construction.

1.2.2. Linear Interpolation

1.2.2.1. Conditioning on Base and Target Points

The conditioning variable $z$ is defined as

$$z \overset{\text{choice}}{=} (x_0, x_1) \sim p_0 \times p_1$$

1.2.2.2. Flow Construction and Velocity Field

We construct a deterministic linear flow between $x_0$ and $x_1$:

$$\psi_t(x_0 \mid z = (x_0, x_1)) \overset{\text{def}}{=} (1 - t) \cdot x_0 + t \cdot x_1$$

This induces the probability path:

$$p_t(x \mid z = (x_0, x_1)) \overset{\text{def}}{=} \delta_{(1 - t) \cdot x_0 + t \cdot x_1}(x)$$

The velocity field is obtained by differentiating the flow map:

$$u_t(x \mid z = (x_0, x_1)) = \partial_t \psi_t(x_0 \mid z) = x_1 - x_0$$

We can verify that this velocity field satisfies the continuity equation.
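For a point mass, satisfying the continuity equation (in the weak sense) amounts to the point moving with the stated velocity. The sketch below spot-checks this numerically by comparing $x_1 - x_0$ with a central finite difference of the flow map in time. The function names and the sample pair are illustrative assumptions, not part of the original construction.

```python
def linear_flow(x0, x1, t):
    """Flow map psi_t(x0 | z=(x0, x1)) = (1 - t) * x0 + t * x1, per coordinate."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def linear_velocity(x0, x1):
    """Conditional velocity u_t(x | z=(x0, x1)) = x1 - x0 (constant in t)."""
    return [b - a for a, b in zip(x0, x1)]


def finite_difference_velocity(x0, x1, t, eps=1e-6):
    """Central finite difference of the flow map with respect to t."""
    ahead = linear_flow(x0, x1, t + eps)
    behind = linear_flow(x0, x1, t - eps)
    return [(p - m) / (2.0 * eps) for p, m in zip(ahead, behind)]


# Hypothetical 2-D sample pair, chosen only for illustration.
x0 = [0.5, -1.0]
x1 = [2.0, 3.0]

analytic = linear_velocity(x0, x1)
numeric = finite_difference_velocity(x0, x1, t=0.3)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, max_gap)
```

The gap should be at the level of floating-point rounding, since the flow map is exactly linear in $t$.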

1.2.3. Conical Gaussian Paths

1.2.3.1. Alternative Conditioning Choice

We can make other choices for the conditioning variable:

$$z \overset{\text{choice}}{=} x_1 \sim p_1$$

1.2.3.2. Flow Construction

We construct a flow that starts from a standard Gaussian and converges to the target point:

$$\psi_t(x_0 \mid z = x_1) \overset{\text{def}}{=} t\, x_1 + (1 - t)\, x_0, \qquad x_0 \sim \mathcal{N}(0, I)$$

This induces the conditional probability path:

$$p_t(x \mid z = x_1) \overset{\text{def}}{=} \mathcal{N}\big(t\, x_1, (1 - t)^2 I_d\big)$$

The corresponding velocity field is:

$$u_t(x \mid z = x_1) = \partial_t \psi_t(x_0 \mid z) = x_1 - x_0 = \frac{x_1 - x}{1 - t}$$

where the last equality uses $x_0 = \frac{x - t\, x_1}{1 - t}$, obtained by inverting the flow map (valid for $t < 1$).
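The two formulas above can be checked on a 1-D toy case. The following sketch is an illustration under assumptions (scalar data, a hypothetical target point `x1 = 2.0`, and hypothetical function names): it confirms that the velocity written in terms of $x$ agrees with $x_1 - x_0$ along a path, and that pushing Gaussian samples through the flow gives mean near $t\, x_1$ and standard deviation near $1 - t$.

```python
import random
import statistics


def conical_flow(x0, x1, t):
    """psi_t(x0 | z=x1) = t * x1 + (1 - t) * x0 (1-D for simplicity)."""
    return t * x1 + (1.0 - t) * x0


def conical_velocity(x, x1, t):
    """u_t(x | z=x1) = (x1 - x) / (1 - t), defined for t < 1."""
    return (x1 - x) / (1.0 - t)


rng = random.Random(0)  # fixed seed so the run is repeatable
x1, t = 2.0, 0.75       # hypothetical target point and time

# Along any single path, the velocity written in terms of x equals x1 - x0.
x0 = rng.gauss(0.0, 1.0)
x = conical_flow(x0, x1, t)
gap = abs(conical_velocity(x, x1, t) - (x1 - x0))

# Pushing many x0 ~ N(0, 1) through the flow should give samples with
# mean close to t * x1 and standard deviation close to 1 - t.
samples = [conical_flow(rng.gauss(0.0, 1.0), x1, t) for _ in range(20000)]
mean, std = statistics.fmean(samples), statistics.pstdev(samples)
print(gap, mean, std)
```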

1.2.4. General construction of conditional probability paths

The general CFM construction follows these steps:

  1. Choose a conditioning variable $z$ (independent of $t$)
  2. Design flow maps $\psi_t(x_0 \mid z)$ that connect the source and target distributions
  3. Obtain conditional probability paths $p_t(x \mid z)$ as the push-forward of the source under the flow
  4. Derive velocity fields $u_t(x \mid z) = \partial_t \psi_t(x_0 \mid z)$ analytically

The conditional probability paths must satisfy the boundary conditions:

$$\forall x, \quad \mathbb{E}_z[p_0(x \mid z)] = p_0(x), \qquad \forall x, \quad \mathbb{E}_z[p_1(x \mid z)] = p_1(x)$$

This construction ensures that:

  1. We avoid solving the ill-posed continuity equation
  2. The velocity field has an analytical, tractable form
  3. The global probability path correctly interpolates between $p_0$ and $p_1$
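As a worked check of the boundary conditions, consider the linear interpolation of Section 1.2.2 with $z = (x_0, x_1) \sim p_0 \times p_1$, where $p_0(x \mid z) = \delta_{x_0}(x)$ and $p_1(x \mid z) = \delta_{x_1}(x)$. The derivation below is a sketch that treats the Dirac deltas formally:

```latex
\begin{aligned}
\mathbb{E}_z[p_0(x \mid z)]
  &= \iint \delta_{x_0}(x)\, p_0(x_0)\, p_1(x_1)\, dx_0\, dx_1 = p_0(x), \\
\mathbb{E}_z[p_1(x \mid z)]
  &= \iint \delta_{x_1}(x)\, p_0(x_0)\, p_1(x_1)\, dx_0\, dx_1 = p_1(x).
\end{aligned}
```

For the conical Gaussian path of Section 1.2.3, the same check gives $p_0(x \mid z) = \mathcal{N}(0, I_d)$ for every $z$, so the first condition holds when $p_0$ is the standard Gaussian, and $p_1(x \mid z) = \delta_{x_1}(x)$, which averages to $p_1(x)$ as above.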

1.2.5. From Conditional to Unconditional Velocity

Figure 1: A flow is represented by a velocity field that defines a random process generating a probability path. The main idea of Flow Matching is to break the construction of a complex flow satisfying the desired boundary conditions down into conditional flows that satisfy simpler boundary conditions and are consequently easier to solve. Arrows indicate dependencies between objects; blue arrows mark the relationships used by the Flow Matching framework.

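Theorem: Let $z$ be any random variable independent of $t$. Choose conditional probability paths $p_t(x \mid z)$, and let $u_t(x \mid z)$ be the velocity field associated with these paths. Then the marginal velocity field $u_t(x)$ associated with the probability path $p_t(x) = \mathbb{E}_z[p_t(x \mid z)]$ has a closed-form formula:

$$\forall t, x, \quad u_t(x) = \mathbb{E}_{z \mid x}[u_t(x \mid z)]$$

Here the expectation is taken over the posterior of $z$ given that the path is at $x$ at time $t$, which by Bayes' rule is proportional to $p_t(x \mid z)\, q(z)$, with $q$ the distribution of $z$.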

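This expectation is intractable in general, so we use a neural network $u_t^\theta(x)$ to estimate it. The training objective is to minimize the tractable Conditional Flow Matching (CFM) loss:

$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t, z, x} \left\| u_t^\theta(x) - u_t(x \mid z) \right\|^2$$

where $t$ is drawn from a distribution on $[0, 1]$ (typically uniform), $z \sim q$, and $x \sim p_t(\cdot \mid z)$.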

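We can use the above loss because it is equivalent, up to a constant $C$ that does not depend on $\theta$, to directly regressing against the intractable marginal vector field $u_t(x)$ (see footnote 1):

$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{x, t} \left\| u_t^\theta(x) - u_t(x) \right\|^2 + C = \mathcal{L}_{\mathrm{FM}}(\theta) + C$$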

2. Rectified Flow

Rectified Flow is a specific instantiation of the Flow Matching framework. It simplifies the construction of the probability path and velocity field by aiming for trajectories between source and target samples that are as straight as possible. This gives a simple training objective, and because straight trajectories can be integrated accurately with few ODE solver steps, it also makes sampling cheap.

The core idea is to "rectify" the coupling between the base distribution $p_0(x)$ and the target distribution $p_1(x)$. Instead of arbitrary or complex conditional paths, Rectified Flow learns an ordinary differential equation (ODE) that transports mass along straight lines.

2.1. 1-Rectified Flow: The Direct Path

The initial Rectified Flow, often called the 1-rectified flow, is constructed in a manner very similar to the linear interpolation method in CFM.

2.1.1. Construction

We start by creating the simplest possible coupling between the base and target distributions: an independent coupling. We draw a pair of samples, $x_0 \sim p_0(x)$ and $x_1 \sim p_1(x)$, and define a straight-line path between them.

The flow map is a direct linear interpolation:

$$\psi_t(x_0, x_1) = (1 - t)\, x_0 + t\, x_1$$

This is the same as the "Linear Interpolation" above. The key difference in Rectified Flow is the focus on this specific construction and its iterative refinement.

The velocity field for this path is constant with respect to time for a given pair $(x_0, x_1)$:

$$u_t(x \mid z = (x_0, x_1)) = \partial_t \psi_t(x_0, x_1) = x_1 - x_0$$

2.1.2. The "Rectified" Velocity Field

While individual paths are straight, the marginal velocity field $u_t(x)$ at a point $x$ is the average velocity of all straight-line paths that pass through $x$ at time $t$. This averaging process is what "rectifies" the flow. The resulting marginal velocity field is generally non-linear and complex, and it defines a deterministic flow that transforms $p_0$ to $p_1$.
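The averaging can be made concrete on a toy problem where it has a closed form. The sketch below assumes a 1-D setting with $p_0 = \mathcal{N}(0, 1)$ and a hypothetical two-point target $p_1$ that is uniform on $\{-1, +1\}$; neither choice comes from the original text. A line ending at $x_1$ passes through $x$ at time $t$ only if it started at $x_0 = (x - t\, x_1)/(1 - t)$, so each candidate line is weighted by the Gaussian density at that starting point.

```python
import math

TARGETS = (-1.0, 1.0)  # hypothetical two-point target distribution p_1


def marginal_velocity(x, t):
    """Average velocity of the straight lines through x at time t (t < 1).

    Assumes p_0 = N(0, 1) and p_1 uniform on TARGETS. The posterior weight
    of each endpoint x1 is proportional to the Gaussian density at the
    starting point x0 that the line through (x, t) would need.
    """
    log_weights, velocities = [], []
    for x1 in TARGETS:
        x0 = (x - t * x1) / (1.0 - t)
        log_weights.append(-0.5 * x0 * x0)
        velocities.append(x1 - x0)
    shift = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, velocities)) / total


# The two lines through (x=0.5, t=0.5) have velocities +1 and -3;
# the marginal velocity is their posterior-weighted average.
print(marginal_velocity(0.5, 0.5))
# At x=0 the two lines are equally likely and their velocities cancel.
print(marginal_velocity(0.0, 0.5))
```

In this toy case no single straight line has the averaged velocity, which illustrates why the marginal flow bends even though every conditional path is straight.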

The training objective for a neural network $u_t^\theta(x)$ is a straightforward regression problem, identical to the CFM loss but with this specific choice of conditional velocity:

$$\mathcal{L}_{\mathrm{RF1}}(\theta) = \mathbb{E}_{t, x_0, x_1} \left\| u_t^\theta\big((1 - t)\, x_0 + t\, x_1\big) - (x_1 - x_0) \right\|^2$$

This loss aims to learn the expected velocity $\mathbb{E}[x_1 - x_0 \mid x_t = x]$, where $x_t = (1 - t)\, x_0 + t\, x_1$.
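A Monte Carlo estimate of this loss takes only a few lines. The sketch below is a minimal illustration in plain Python for scalar data; the function name `rf_loss`, the `model(x, t)` call signature, and the toy batch are assumptions made for the example, and a real implementation would use a neural network and an automatic-differentiation library.

```python
def rf_loss(model, pairs, times):
    """Monte Carlo estimate of the 1-rectified-flow loss (1-D for simplicity).

    model(x, t) -> predicted velocity; pairs is a list of (x0, x1);
    times is a list of t values in [0, 1], one per pair.
    """
    total = 0.0
    for (x0, x1), t in zip(pairs, times):
        x_t = (1.0 - t) * x0 + t * x1      # point on the straight line
        target = x1 - x0                   # conditional velocity
        total += (model(x_t, t) - target) ** 2
    return total / len(pairs)


# Hypothetical toy batch in which every pair is shifted by +2.
pairs = [(-1.0, 1.0), (0.0, 2.0), (0.5, 2.5)]
times = [0.1, 0.5, 0.9]

perfect = rf_loss(lambda x, t: 2.0, pairs, times)   # predicts the true shift
biased = rf_loss(lambda x, t: 0.0, pairs, times)    # always predicts zero
print(perfect, biased)
```

In this toy batch all lines share the same velocity, so a perfect model reaches zero loss; with crossing lines the minimum is the positive constant $C$ discussed in Section 1.2.5.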

The resulting trained model $u_t^\theta(x)$ can then be used to generate samples by solving the ODE $\frac{d x_t}{dt} = u_t^\theta(x_t)$ from $t = 0$ to $t = 1$, starting with a sample $x_0 \sim p_0(x)$.
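Sampling can be sketched with the simplest ODE solver, explicit Euler. The code below is an illustration under assumptions: `toy_velocity` is a stand-in for a trained network (a constant field that shifts every point by +3), the data are scalar, and the step count is arbitrary. Higher-order solvers are a common alternative.

```python
import random


def euler_sample(velocity, x0, n_steps=100):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with explicit Euler."""
    x, h = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        x = x + h * velocity(x, t)
    return x


def toy_velocity(x, t):
    """Stand-in for a trained model (assumption): constant shift of +3."""
    return 3.0


rng = random.Random(0)
x0 = rng.gauss(0.0, 1.0)          # sample from the base distribution N(0, 1)
x1 = euler_sample(toy_velocity, x0)
print(x0, x1)
```

Because the toy field is constant, the trajectory is a straight line and Euler integration is exact up to rounding regardless of the number of steps, which is the property that rectified flows aim for.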

2.2. 2-Rectified Flow (and beyond): The "Reflow" Procedure

A key innovation of Rectified Flow is the reflow procedure. The conditional paths used to train the 1-rectified flow are straight, but the trajectories of the resulting ODE generally are not: wherever interpolation lines from different pairs $(x_0, x_1)$ cross, the learned velocity is an average over them, so the generated paths bend. Imperfect training adds further curvature in practice.

The reflow procedure aims to iteratively straighten these paths.

2.2.1. The Reflow Algorithm

  1. Train the 1-Rectified Flow: First, train a velocity field $u_t^1(x)$ using the direct straight-line paths between $p_0$ and $p_1$ as described above.

  2. Generate a New Paired Dataset: Use the trained model $u_t^1(x)$ to generate a new set of paired samples. (Here $z_0$, $z_1$, and $z_t$ denote points along the flow, not the conditioning variable $z$ of Section 1.)

    • Sample $z_0 \sim p_0(x)$.
    • Solve the ODE $\frac{d z_t}{dt} = u_t^1(z_t)$ from $t = 0$ to $t = 1$ to obtain the corresponding endpoint $z_1$.
    • This creates a new dataset of pairs $(z_0, z_1)$ that represent a deterministic coupling induced by the 1-rectified flow.

  3. Train the 2-Rectified Flow: Train a new velocity field $u_t^2(x)$ using the same loss function, but on the new data pairs $(z_0, z_1)$.

$$\mathcal{L}_{\mathrm{RF2}}(\theta) = \mathbb{E}_{t, z_0, z_1} \left\| u_t^\theta\big((1 - t)\, z_0 + t\, z_1\big) - (z_1 - z_0) \right\|^2$$

This process can be repeated to create 3-rectified flows and so on, with each iteration producing increasingly straight trajectories.
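Step 2 of the procedure can be sketched as follows. To keep the example self-contained, a closed-form field stands in for the trained 1-rectified flow: `toy_velocity` is the exact marginal velocity for the assumed toy problem $p_0 = \mathcal{N}(0, 1)$ and $p_1$ uniform on $\{-1, +1\}$ with independent coupling. All names, the toy problem, and the Euler solver with 100 steps are illustrative assumptions.

```python
import math
import random


def toy_velocity(x, t):
    """Stand-in for a trained 1-rectified flow (assumption, see lead-in):
    exact marginal velocity for p_0 = N(0, 1), p_1 uniform on {-1, +1}."""
    return (math.tanh(t * x / (1.0 - t) ** 2) - x) / (1.0 - t)


def euler_endpoint(velocity, z0, n_steps=100):
    """Solve dz/dt = velocity(z, t) on [0, 1] with explicit Euler steps."""
    z, h = z0, 1.0 / n_steps
    for k in range(n_steps):
        z = z + h * velocity(z, k * h)
    return z


def make_reflow_pairs(velocity, n_pairs, seed=0):
    """Step 2 of reflow: pair each z0 ~ p_0 with its ODE endpoint z1."""
    rng = random.Random(seed)
    starts = [rng.gauss(0.0, 1.0) for _ in range(n_pairs)]
    return [(z0, euler_endpoint(velocity, z0)) for z0 in starts]


pairs = make_reflow_pairs(toy_velocity, n_pairs=5)
for z0, z1 in pairs:
    print(f"z0={z0:+.3f} -> z1={z1:+.3f}")
```

In this toy problem each $z_0$ is sent to the target point with the same sign, so the new coupling is deterministic and its interpolation lines no longer cross, unlike the independent pairs used for the first training round. The pairs would then be fed to the same regression loss to train the next model.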

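Footnotes

    1. Proof that $\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{x, t} \left\| u_t^\theta(x) - u_t(x) \right\|^2 + C$:

      We need to show that minimizing the CFM loss is equivalent to minimizing the intractable loss up to a constant. We prove this by showing the gradients are equal.

      Let $D(a, b) = \|a - b\|^2$ be the squared L2 distance, which is symmetric in its arguments. Then:

      $$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t, z, x}\, D\big(u_t^\theta(x), u_t(x \mid z)\big), \qquad \mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{x, t}\, D\big(u_t^\theta(x), u_t(x)\big)$$

      Taking gradients:

      $$
      \begin{aligned}
      \nabla_\theta \mathcal{L}_{\mathrm{FM}}(\theta)
      &= \nabla_\theta\, \mathbb{E}_{t, X_t \sim p_t}\, D\big(u_t(X_t), u_t^\theta(X_t)\big) \\
      &= \mathbb{E}_{t, X_t \sim p_t}\, \nabla_\theta D\big(u_t(X_t), u_t^\theta(X_t)\big) \\
      &= \mathbb{E}_{t, X_t \sim p_t}\, \nabla_v D\big(u_t(X_t), v\big)\Big|_{v = u_t^\theta(X_t)} \nabla_\theta u_t^\theta(X_t) \\
      &= \mathbb{E}_{t, X_t \sim p_t}\, \nabla_v D\Big(\mathbb{E}_{Z \sim p_{Z \mid t}(\cdot \mid X_t)}\big[u_t(X_t \mid Z)\big], v\Big)\Big|_{v = u_t^\theta(X_t)} \nabla_\theta u_t^\theta(X_t) \\
      &= \mathbb{E}_{t, X_t \sim p_t}\, \mathbb{E}_{Z \sim p_{Z \mid t}(\cdot \mid X_t)}\Big[\nabla_v D\big(u_t(X_t \mid Z), v\big)\Big|_{v = u_t^\theta(X_t)}\Big] \nabla_\theta u_t^\theta(X_t) \\
      &= \mathbb{E}_{t, X_t \sim p_t}\, \mathbb{E}_{Z \sim p_{Z \mid t}(\cdot \mid X_t)}\Big[\nabla_\theta D\big(u_t(X_t \mid Z), u_t^\theta(X_t)\big)\Big] \\
      &= \nabla_\theta\, \mathbb{E}_{t, Z \sim q, X_t \sim p_{t \mid Z}(\cdot \mid Z)}\Big[D\big(u_t(X_t \mid Z), u_t^\theta(X_t)\big)\Big] \\
      &= \nabla_\theta \mathcal{L}_{\mathrm{CFM}}(\theta)
      \end{aligned}
      $$

      The step that replaces $u_t(X_t)$ by a conditional expectation uses the theorem of Section 1.2.5. Moving that expectation outside $\nabla_v D$ is valid because $\nabla_v D(a, v) = 2(v - a)$ is affine in $a$. Since the two losses have the same gradient for every $\theta$, they differ by a constant $C$ that does not depend on $\theta$.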
