Leonardo's Blog

Reflections and Notes on AI, research, and life.

FLOP, FLOPs, FLOPS, IsoFLOP: A practical guide to compute accounting
May 13, 2026

A unified explanation of the four most-confused acronyms in deep learning compute: what they mean, how to count them, and how they show up in scaling laws, model cards, and hardware spec sheets.
Apptainer for HPC
Jan 16, 2026

Notes on how to use Apptainer (Singularity) for containerization on HPC clusters
Useful GitHub Tricks
Jan 16, 2026

Notes on .gitignore, pre-commit, and GitHub Actions checks
Rclone
Dec 25, 2025

Rclone
Spatial Reasoning
Nov 28, 2025

Spatial Reasoning in VLMs
Text-to-image Architecture
Nov 18, 2025

DiT, MMDiT, DiT-Air, UViT and PRX
On-Policy Distillation
Nov 11, 2025

On-Policy Distillation, ULD and GOLD
Random Thoughts 🔒
Nov 4, 2025

Random Thoughts
Linear RNNs and Attention
Oct 31, 2025

Linear RNNs and Attention
Bagel
Oct 29, 2025

Bagel and LightBagel
Scaling Laws
Oct 29, 2025

Kaplan et al. and Chinchilla (Hoffmann et al.) Scaling Laws
Reinforcement Learning for Large Reasoning Models
Oct 28, 2025

Survey
VGGT
Oct 24, 2025

Visual Geometry Grounded Transformer
The Intrinsic Dimension of Images and Its Impact on Learning
Oct 18, 2025

The Intrinsic Dimension of Images and Its Impact on Learning
A primer on GPUs
Oct 12, 2025

CUDA, Triton and flash attention
State of AI Report 2025
Oct 12, 2025

State of AI Report 2025
TOEFL 🔒
Sep 29, 2025

TOEFL
Spatial Reasoning
Sep 28, 2025

Spatial Reasoning
Representation Learning for Generation (Illustration)
Sep 24, 2025

Representation Learning for Generation (Illustration)
Tokenizer training and inference code
Sep 24, 2025

Tokenizer training and inference code
Fat-tree
Sep 8, 2025

Fat-tree
ColBERT and FILIP
Sep 2, 2025

ColBERT and FILIP
Code practice for deep learning
Sep 1, 2025

Code practice for deep learning
Spurious Features Robustness
Aug 30, 2025

A Sober Look at the Robustness of CLIPs to Spurious Features
Position Paper
Aug 25, 2025

Vision encoders should be image size agnostic and task driven
Unified Vision-Language Models
Aug 25, 2025

Unified Vision-Language Models
Huggingface Trainer
Aug 12, 2025

Huggingface Trainer
Pre-training and Post-training of image generation models
Aug 10, 2025

Pre-training is all about mode coverage, post-training is all about mode collapsing
Some Thoughts on Multimodal LLMs
Aug 8, 2025

Summary of Xiangyu Zhang's talk
GRPO
Aug 2, 2025

GRPO
Towards Trustworthy AI
Jul 30, 2025

Principled & Automated Interpretability in Deep Learning
VLM Robustness 🔒
Jul 30, 2025

Robustness of VLM
Parquet Content-Defined Chunking
Jul 25, 2025

Xet
Understanding Multimodal LLMs Under Distribution Shifts
Jul 21, 2025

Distribution Shifts
Sequence Modeling Alignment between Tokenizer and Autoregressive Model
Jul 18, 2025

AliTok
WebDataset
Jul 18, 2025

WebDataset
Could VQ-VAE Beat VAE?
Jul 17, 2025

MGVQ
Single-pass Adaptive Image Tokenization for Minimum Program Search
Jul 15, 2025

KARL
Reconstruction vs. Generation
Jul 13, 2025

Image tokenization
Accelerate
Jul 12, 2025

Accelerate
Flowing Seamlessly Across Text and Image Tokens
Jul 4, 2025

FlowTok
Vision foundation model Aligned Variational AutoEncoder
Jul 4, 2025

VA-VAE
Contrastive Flow Matching
Jun 30, 2025

CFM
Download ImageNet
Jun 30, 2025

Download ImageNet
Scale Equivariance of Diffusion Models
Jun 30, 2025

Downsampling Regularization
Flow map matching
Jun 28, 2025

FMM
Consistency Models
Jun 27, 2025

Single-Step Generation via Self-Consistency
Elucidating the Design Space of Diffusion-Based Generative Models
Jun 27, 2025

EDM
Diffusion Transformers (DiTs) and Scalable Interpolant Transformers (SiT)
Jun 25, 2025

DiT and SiT
Latent Diffusion Models
Jun 25, 2025

LDM
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Jun 25, 2025

REPA
Stochastic Interpolants: A Unifying Framework for Flows and Diffusions
Jun 25, 2025

Stochastic Interpolants
Language Diffusion Model
Jun 23, 2025

Language Diffusion Model
Stop Misusing t-SNE and UMAP for Visual Analytics
Jun 22, 2025

Use t-SNE and UMAP for Visual Analytics properly
Approximating Language Model Training Data from Weights
Jun 21, 2025

Approximating Language Model Training Data from Weights
Mean Flow
Jun 19, 2025

Mean Flow Model
Research 🔒
Jun 19, 2025

Research
Flow Matching
Jun 18, 2025

FM
Inference-time Scaling
Jun 18, 2025

Inference-time Scaling
Thoughts 🔒
Jun 18, 2025

Thoughts
Continuous Normalizing Flows
Jun 17, 2025

CNF
ODE and SDE
Jun 17, 2025

ODE and SDE
Chain of Thought
Jun 14, 2025

CoT
Generative Model
Jun 10, 2025

Generative Model for Vision
Score-Based Models
Jun 9, 2025

Diffusion Models
Latent Variable Models
Jun 8, 2025

VAE
Implicit Generative Models
Jun 7, 2025

GAN
Direct Modeling
Jun 6, 2025

Flow-based Models, Energy-based Models
Self-Supervised Learning
Jun 6, 2025

Self-Supervised Learning
KL Divergence and its variants
Jun 5, 2025

KL, JSD, Wasserstein Distance, Fisher Divergence
