Research

June 19, 2025

by Leonardo

1. Research

Aim: Accelerate the inference of flow matching.
Intuition:
- The "Easy-Hard" Nature of the Generative Path: The process of transforming noise into a structured image is not uniformly difficult. Some parts of the path are "easy" and smooth (e.g., forming a blurry outline), where a simple model can predict the trajectory. Other parts are "hard" and require high precision (e.g., forming intricate details like eyes). The goal is to use a fast, draft model for the easy parts and the powerful, main model only when necessary.
- Changing the Main Model's Job from "Worker" to "Supervisor": Instead of having the large, slow model compute every single small step of the path (acting as a "worker"), we change its role. It now acts as an efficient "supervisor." It lets the fast, draft model make a long-range proposal, and then the main model only needs to perform a few cheap calculations to verify and correct this proposal at a high level.
- A "Trust, but Verify" Framework with a Safety Net: The method doesn't naively assume the small model is always accurate. The core principle is "trust, but verify." It trusts the draft model to suggest a leap forward, but it verifies this leap with the main model. Crucially, if the verification fails, the system has a safety net: it rejects the leap and defaults to taking one small, guaranteed-to-be-correct step with the main model. This ensures that we get speed when the approximation is good, without sacrificing quality when it's not.
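
The three points above can be sketched as a sampling loop. This is only a minimal illustration under stated assumptions, not the actual method: the names `speculative_flow_sample`, `euler_steps`, `main_velocity`, `draft_velocity`, `leap`, and `tol` are hypothetical, and the verification rule (comparing the two models' velocities at the proposed endpoint) is an assumed stand-in for whatever check the real method uses. The "correct" part of "verify and correct" is left out.

```python
import numpy as np

def euler_steps(x, velocity, t_start, dt, k):
    """Take k small Euler steps of dx/dt = velocity(x, t)."""
    for j in range(k):
        x = x + dt * velocity(x, t_start + j * dt)
    return x

def speculative_flow_sample(x0, main_velocity, draft_velocity,
                            n_steps=32, leap=4, tol=0.05):
    """Integrate from t=0 to t=1, letting the draft model propose leaps.

    Returns the final sample and the number of main-model calls.
    """
    dt = 1.0 / n_steps
    x = np.asarray(x0, dtype=float)
    i, main_calls = 0, 0
    while i < n_steps:
        k = min(leap, n_steps - i)
        t = i * dt
        # Trust: the cheap draft model proposes a leap of k small steps.
        x_prop = euler_steps(x, draft_velocity, t, dt, k)
        # Verify (assumed rule): one main-model call at the proposed endpoint,
        # compared against the draft model's velocity at the same point.
        t_end = t + k * dt
        v_main = main_velocity(x_prop, t_end)
        main_calls += 1
        v_draft = draft_velocity(x_prop, t_end)
        err = np.linalg.norm(v_main - v_draft) / (np.linalg.norm(v_main) + 1e-8)
        if err <= tol:
            # Accept the whole leap.
            x, i = x_prop, i + k
        else:
            # Safety net: reject the leap, take one small main-model step.
            x = x + dt * main_velocity(x, t)
            main_calls += 1
            i += 1
    return x, main_calls

# Toy usage: both "models" are simple callables, not real networks.
main = lambda x, t: -x
x_final, n_calls = speculative_flow_sample(np.ones(3), main, main)
```

When the draft agrees with the main model, every leap is accepted and the main model is called once per leap instead of once per step. When the draft is poor, every leap is rejected and the loop degrades to plain main-model Euler steps plus the wasted verification calls, which is the price of the safety net.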

In LLMs, speculative decoding is a popular technique for accelerating inference.

A similar approach, Residual Correction and Acceptance (RCA), is fast but lossy.

They point out that in diffusion models, directly applying the adjusted rejection sampling used in discrete spaces is very difficult to implement and inefficient.

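Their core innovation is the use of reflection maximal coupling, a mathematical tool designed specifically for coupling two Gaussian distributions that have the same covariance but different means.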

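This fits the sampling step of standard diffusion models well: each step moves from one Gaussian distribution to another, and the noise variance (covariance) is usually fixed. When a "draft" point is rejected, the method does not resample from a complicated distribution. Instead, it applies a deterministic reflection transform to directly compute a new point that follows the target distribution. The procedure is lossless.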
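
A single accept-or-reflect step can be sketched as follows. This is a minimal illustration of the textbook reflection maximal coupling for two isotropic Gaussians with a shared standard deviation, not the authors' implementation; the function name `reflection_coupling_step` and its parameters are hypothetical, and the draft point is assumed to have been drawn from N(mean_draft, sigma^2 I).

```python
import numpy as np

def reflection_coupling_step(x_draft, mean_draft, mean_target, sigma, rng):
    """Accept the draft point or reflect it so the output follows the target.

    Assumes x_draft ~ N(mean_draft, sigma^2 I). The returned point is then
    distributed as N(mean_target, sigma^2 I). Returns (point, accepted).
    """
    d_draft = x_draft - mean_draft      # noise actually used by the draft
    d_target = x_draft - mean_target
    # log of target_density(x) / draft_density(x) for equal covariances.
    log_ratio = (d_draft @ d_draft - d_target @ d_target) / (2.0 * sigma**2)
    # Accept with probability min(1, target_density / draft_density).
    if rng.random() <= np.exp(min(0.0, log_ratio)):
        return x_draft, True
    # Rejected: reflect the draft noise across the hyperplane orthogonal to
    # the mean difference. No fresh sample is drawn from a residual law.
    delta = mean_draft - mean_target    # nonzero whenever a rejection occurs
    e = delta / np.linalg.norm(delta)
    reflected = d_draft - 2.0 * (e @ d_draft) * e
    return mean_target + reflected, False

# Toy usage with made-up means; in a sampler these would come from the
# draft and main models' predictions for the same step.
rng = np.random.default_rng(0)
m_draft, m_target, sigma = np.array([0.0, 0.0]), np.array([1.0, 0.0]), 1.0
x = m_draft + sigma * rng.standard_normal(2)
y, accepted = reflection_coupling_step(x, m_draft, m_target, sigma, rng)
```

The acceptance test is the usual density-ratio check. The rejection branch is deterministic given the draft noise, which is why a rejection costs only a few vector operations.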

Issues: