Pre-training and Post-training of image generation models

1. The core problem: avoiding the “AI look”

Many modern image models over-optimize for capability metrics (text rendering, attribute binding, counting, etc.) and drift into the “AI look”: overly smooth skin, shallow depth of field, bland compositions, and oversaturated or uniformly bright images.

Aesthetics scorers widely used for training or evaluation (e.g., LAION-Aesthetics, PickScore, ImageReward, HPSv2) are often built on low-resolution CLIP fine-tunes. They struggle to rank images from today's stronger models, and they encode biased preferences, such as favoring certain subjects (for example, female portraits), soft focus, and high brightness.

Human aesthetics are subjective and multi-dimensional, so a single score is inadequate. The fix is better data curation and output alignment to maintain tasteful defaults without collapsing diversity.

2. The art of mode collapse

Figure 1: Pre-training is all about mode coverage, post-training is all about mode collapsing.

Pretraining for mode coverage: maximize diversity in styles, subjects, and scenes, and deliberately keep “bad samples” so negative prompts actually work. This stage builds world knowledge and versatility.

Post-training for mode collapse: steer the distribution toward a target aesthetic. While pretraining sets the upper bound for diversity and structure, the perceived final quality is largely determined by post-training.
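
As a toy illustration of the two stages (not the method used in the referenced work), the sketch below treats a model as a categorical distribution over a handful of hypothetical style modes. Pretraining is idealized as uniform coverage, and post-training is idealized as exponentially tilting that distribution toward a made-up per-mode preference score. The mode names, the scores, and the `beta` strength are all assumptions chosen for illustration.

```python
import math

# Hypothetical style modes and made-up preference scores (illustrative assumptions).
MODES = ["film photo", "flat illustration", "3D render", "oil painting", "blurry snapshot"]
PREFERENCE = [2.0, 0.5, 0.0, 1.0, -2.0]


def entropy(probs):
    """Shannon entropy in nats; higher means broader mode coverage."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tilt(probs, scores, beta):
    """Reweight probs by exp(beta * score) and renormalize.

    beta = 0 leaves the distribution unchanged; larger beta concentrates
    mass on the highest-scoring modes (a stand-in for post-training).
    """
    weights = [p * math.exp(beta * s) for p, s in zip(probs, scores)]
    total = sum(weights)
    return [w / total for w in weights]


# Idealized pretraining: every mode is covered, including the "bad" one.
pretrained = [1.0 / len(MODES)] * len(MODES)
# Idealized post-training: collapse toward the preferred aesthetic.
post_trained = tilt(pretrained, PREFERENCE, beta=2.0)

if __name__ == "__main__":
    print(f"pretrained entropy:   {entropy(pretrained):.3f}")
    print(f"post-trained entropy: {entropy(post_trained):.3f}")
    for mode, p in zip(MODES, post_trained):
        print(f"{mode:>18}: {p:.3f}")
```

In this toy model, `tilt` can only redistribute probability mass that already exists: a mode with zero pretrained probability stays at zero no matter how high its score. That mirrors the claim that pretraining sets the upper bound while post-training decides where the mass ends up.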

3. Some findings

  1. Quality over quantity: fewer than 1 million top-tier samples were sufficient for strong post-training gains. More data helps stability and debiasing, but the decisive factor is curation quality.
  2. Take an opinionated approach: mixing heterogeneous aesthetic preferences leads to bland compromises (simple symmetric compositions, softened textures, collapsed color grading), that is, back to the “AI look.” Aesthetics are subjective, so alignment should intentionally “bias” toward a clear artistic direction to deliver better default outputs.

Figure 4: Global preference will make both parties unsatisfied.
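
The point behind Figure 4 can be sketched with a toy calculation. Assume two hypothetical annotator groups who each love one distinctive style, dislike the other group's favorite, and find a neutral middle option acceptable. The ratings below are invented to make the effect visible. Real preference data is noisier, and this is not the procedure used in the referenced work.

```python
# Made-up ratings in [0, 1] from two hypothetical annotator groups.
# Style names and numbers are illustrative assumptions, not measured data.
RATINGS = {
    "group_a": {"high-contrast film": 0.9, "soft pastel": 0.1, "neutral compromise": 0.6},
    "group_b": {"high-contrast film": 0.1, "soft pastel": 0.9, "neutral compromise": 0.6},
}


def favorite(scores):
    """Return the style with the highest score."""
    return max(scores, key=scores.get)


def pooled_scores(ratings):
    """Average each style's rating across all groups (a 'global preference')."""
    styles = next(iter(ratings.values()))
    return {
        style: sum(group[style] for group in ratings.values()) / len(ratings)
        for style in styles
    }


pooled = pooled_scores(RATINGS)
group_favorites = {name: favorite(scores) for name, scores in RATINGS.items()}
global_favorite = favorite(pooled)

if __name__ == "__main__":
    print("per-group favorites:", group_favorites)
    print("pooled favorite:    ", global_favorite)
```

With these numbers, the pooled winner is the neutral option, which is neither group's first choice. Aligning to a single group's ratings instead would give that group its favorite, at the cost of deliberately not serving the other; that is the trade-off an opinionated approach accepts.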

References

  1. Releasing Open Weights for FLUX.1 Krea