Bagel

1. Scalable Generative Cognitive Model (BAGEL)

BAGEL adopts a Mixture-of-Transformer-Experts (MoT) architecture comprising two transformer experts—one dedicated to multimodal understanding and the other to multimodal generation.
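
The two-expert split described above can be illustrated with a toy routing step. This is a minimal sketch under stated assumptions, not BAGEL's implementation: it assumes each token carries a tag naming the expert that should process it, and it replaces each expert's transformer block with a simple elementwise affine map. The names `Token`, `make_expert`, `mot_layer`, and the tags "und" and "gen" are hypothetical. Attention across the sequence is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class Token:
    kind: str       # hypothetical tag: "und" (understanding) or "gen" (generation)
    hidden: Vector  # the token's hidden state


def make_expert(scale: float, shift: float) -> Callable[[Vector], Vector]:
    """Stand-in for one expert's parameters: an elementwise affine map."""
    def expert(h: Vector) -> Vector:
        return [scale * x + shift for x in h]
    return expert


def mot_layer(tokens: List[Token],
              experts: Dict[str, Callable[[Vector], Vector]]) -> List[Token]:
    """Send each token through the expert matching its tag, preserving order."""
    return [Token(t.kind, experts[t.kind](t.hidden)) for t in tokens]


# Two experts with separate (toy) parameters. This is an assumption for illustration.
experts = {
    "und": make_expert(2.0, 0.0),  # understanding expert
    "gen": make_expert(1.0, 1.0),  # generation expert
}

sequence = [
    Token("und", [1.0, 2.0]),
    Token("gen", [1.0, 2.0]),
    Token("und", [0.5, -1.0]),
]
routed = mot_layer(sequence, experts)

for tok in routed:
    print(tok.kind, tok.hidden)
```

The first two tokens start from identical hidden states but leave the layer with different values, because each is processed by a different expert's parameters. The sequence order is unchanged, so both kinds of token remain interleaved in a single stream.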

Figure 2: Causal mask in BAGEL during training.

2. LightBagel

Figure 3: Illustration of Shallow Fusion and Deep Fusion.

References

  1. Emerging Properties in Unified Multimodal Pretraining
  2. LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation