Generative Model

June 10, 2025

by Leonardo

We want to sample from $p(x)$, but how? There are usually four approaches:

| Paradigm | Idea |
|---|---|
| Directly model $p(x)$ | Explicit density: $p(x) = \frac{q(x)}{Z},\ Z = \int q(x)\,dx$; or tractable forms: $p(x) = \prod_{i} p(x_i \mid x_{<i})$ |
| Latent variable $p(x \mid z)$ | Marginalization: $p(x) = \int p(x \mid z)\,p(z)\,dz$; variational bound: $\log p(x) \geq \mathrm{ELBO}$ |
| Implicit generation $G_\theta(z) \to x$ | Pushforward measure: $p_g(x) = (G_\theta)_\# p(z)$; no explicit density |
| Score-based $\nabla_x \log p(x)$ | Score matching: learn $s_\theta(x) \approx \nabla_x \log p(x)$; reverse diffusion process |
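To make the score-based row concrete, here is a minimal sketch of unadjusted Langevin dynamics, one common way to turn a score into samples (it is not necessarily the reverse diffusion process the table refers to). It assumes the score is known exactly: for a 1D Gaussian $\mathcal{N}(\mu, \sigma^2)$ the score is $-(x-\mu)/\sigma^2$, which stands in for a learned $s_\theta(x)$. The function names, target parameters, and step size are illustrative choices.

```python
import math
import random


def gaussian_score(x, mu=2.0, sigma=0.5):
    """Exact score of N(mu, sigma^2): d/dx log p(x) = -(x - mu) / sigma^2.

    Hypothetical toy target standing in for a learned s_theta(x).
    """
    return -(x - mu) / sigma**2


def langevin_sample(score, n_samples=1000, n_steps=1000, step=0.005, seed=0):
    """Unadjusted Langevin: x <- x + (step / 2) * score(x) + sqrt(step) * noise."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = rng.uniform(-3.0, 3.0)  # arbitrary start, far from the target mode
        for _ in range(n_steps):
            x = x + 0.5 * step * score(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


samples = langevin_sample(gaussian_score)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.2f}, std={std:.2f}")  # close to mu=2.0 and sigma=0.5
```

Only the score is used; the normalizer $Z$ never appears, which is what makes this paradigm attractive when $p(x)$ itself is intractable.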

1. Continuous Generative Models

Discrete Generative Models
- Pros:
  - Efficient Inference: They can be very fast, often needing only one transformer pass per generation step rather than a long chain of iterative refinements.
- Cons:
  - Quality Issues: Discrete tokens suffer from a "quality issue" caused by heavy data compression, which loses fine details. Reconstructed images can differ noticeably from the original when inspected up close.
  - Fundamental Compression Flaw: To keep the token sequence manageable, discrete tokens must compress information far more aggressively than a continuous representation does, which is a fundamental limitation.
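The compression argument can be made concrete with vector quantization, assumed here as the way continuous encoder features become discrete tokens: each feature vector is replaced by the index of its nearest codebook entry, so whatever lies between codewords is lost. The image and tokenizer sizes (a 256×256 RGB image, a 16×16 token grid, a 1024-entry codebook) and all function names are hypothetical, chosen only to illustrate the bit-count arithmetic.

```python
import math


def quantize(vectors, codebook):
    """Replace each vector by the index of its nearest codeword (squared L2)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=lambda k: dists[k]))
    return indices


def dequantize(indices, codebook):
    """Reconstruct by looking each index back up in the codebook."""
    return [codebook[i] for i in indices]


def mean_squared_error(xs, ys):
    """Average squared error per coordinate between two lists of vectors."""
    total = sum((a - b) ** 2 for x, y in zip(xs, ys) for a, b in zip(x, y))
    return total / sum(len(x) for x in xs)


# Toy example: two continuous features, a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.2], [0.9, 0.8]]
tokens = quantize(features, codebook)
recon = dequantize(tokens, codebook)
print(tokens, mean_squared_error(features, recon))  # nonzero error: detail is lost

# Hypothetical bit budget: raw 8-bit RGB pixels vs. a grid of discrete tokens.
raw_bits = 256 * 256 * 3 * 8            # 1,572,864 bits
token_bits = 16 * 16 * math.log2(1024)  # 256 tokens x 10 bits = 2,560 bits
print(f"compression ratio: {raw_bits / token_bits:.1f}x")  # 614.4x
```

Under these assumed sizes each token carries only 10 bits, so the tokenizer must discard most of the pixel-level information.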
Continuous Generative Models
- Pros:
  - High-Quality Samples: They generally achieve much better reconstruction quality than discrete models.
- Cons:
  - Speed Issues: Continuous models, particularly diffusion models, have a "speed issue" because they require many iterative steps to generate a sample. This multi-pass process makes inference slow and computationally demanding.
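A toy case shows why the iterative steps cannot simply be dropped. Assume, for illustration only (the document does not name this setup), a flow-matching-style path $x_t = (1-t)\,x_0 + t\,x_1$ with independent $x_0 \sim \mathcal{N}(0,1)$ and $x_1 \sim \mathcal{N}(m, s^2)$. In 1D the exact velocity field is $v(t, x) = m + \frac{t s^2 - (1-t)}{(1-t)^2 + t^2 s^2}\,(x - t m)$, and the exact ODE endpoint is $x_1 = m + s\,x_0$, so an Euler sampler can be checked at different step counts with no network. The values $m = 3$, $s = 2$ and the function names are hypothetical.

```python
def velocity(t, x, m=3.0, s=2.0):
    """Exact marginal velocity E[x1 - x0 | x_t = x] for the toy Gaussian path."""
    var_t = (1 - t) ** 2 + (t * s) ** 2
    return m + (t * s**2 - (1 - t)) / var_t * (x - t * m)


def euler_sample(x0, n_steps):
    """Integrate dx/dt = velocity(t, x) from t=0 to t=1, one velocity call per step."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(i * dt, x)
    return x


for n_steps in (1, 4, 100, 1000):
    outs = [euler_sample(z, n_steps) for z in (-1.0, 0.0, 1.0)]
    print(n_steps, [round(o, 3) for o in outs])
# Exact endpoints are m + s*z = [1.0, 3.0, 5.0]; with n_steps=1 every input collapses to m = 3.0.
```

With a single step every sample is mapped to the mean, a blurry average; only with many steps do the outputs approach the true spread. In a real model each step is one network forward pass, which is where the multi-pass cost comes from.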

Figure 1: The trilemma of continuous generative models

Does inference-time scaling benefit generative pre-training algorithms? Maybe.