Leonardo's Blog
Reflections and Notes on AI, research, and life.
- A unified explanation of the four most-confused acronyms in deep learning compute: what they mean, how to count them, and how they show up in scaling laws, model cards, and hardware spec sheets.
- Notes on how to use Apptainer (Singularity) for containerization on HPC clusters
- Notes on .gitignore, pre-commit, and GitHub Actions checks
- Rclone
- Spatial Reasoning in VLMs
- DiT, MMDiT, DiT-Air, UViT and PRX
- On-Policy Distillation, ULD and GOLD
- Random Thoughts
- Linear RNNs and Attention
- Bagel and LightBagel
- Kaplan et al. and Chinchilla et al. Scaling Laws
- Survey
- Visual Geometry Grounded Transformer
- The Intrinsic Dimension of Images and Its Impact on Learning
- CUDA, Triton and flash attention
- State of AI Report 2025
- TOEFL
- Spatial Reasoning
- Representation Learning for Generation (Illustration)
- Tokenizer training and inference code
- Fat-tree
- ColBERT and FILIP
- Code practice for deep learning
- A Sober Look at the Robustness of CLIPs to Spurious Features
- Vision encoders should be image size agnostic and task driven
- Unified Vision-Language Models
- Huggingface Trainer
- Pre-training is all about mode coverage, post-training is all about mode collapsing
- Summary of Xiangyu Zhang's talk
- GRPO
- Principled & Automated Interpretability in Deep Learning
- Robustness of VLM
- Xet
- Distribution Shifts
- AliTok
- WebDataset
- MGVQ
- KARL
- Image tokenization
- Accelerate
- FlowTok
- VA-VAE
- CFM
- Download ImageNet
- Downsampling Regularization
- FMM
- Single-Step Generation via Self-Consistency
- EDM
- DiT and SiT
- LDM
- REPA
- Stochastic Interpolants
- Language Diffusion Model
- Use t-SNE and UMAP for Visual Analytics properly
- Approximating Language Model Training Data from Weights
- Mean Flow Model
- Research
- FM
- Inference-time Scaling
- Thoughts
- CNF
- ODE and SDE
- CoT
- Generative Model for Vision
- Diffusion Models
- VAE
- GAN
- Flow-based Models, Energy-based Models
- Self-Supervised Learning
- KL, JSD, Wasserstein Distance, Fisher Divergence