The Intrinsic Dimension of Images and Its Impact on Learning

October 18, 2025

by Leonardo

1. The Intrinsic Dimension of Images and Its Impact on Learning

Intrinsic dimension (ID) means the effective number of degrees of freedom needed to describe the variability of the data.

Formally: if your dataset lives in a high-dimensional ambient space $\mathbb{R}^{D}$ (e.g., a $224 \times 224$ RGB image has $D = 150{,}528$ pixel values), but all the points lie close to a low-dimensional manifold $M$ of dimension $d \ll D$, then $d$ is the intrinsic dimension.

So the goal is to estimate $d$ directly from samples.
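To make the definition concrete, here is a small generator for data whose intrinsic dimension is known by construction, which is also handy for sanity-checking any ID estimator. It is a hypothetical illustration: the function name and the tanh-of-a-random-linear-map construction are assumptions, not taken from the paper. Latent coordinates in $[0,1]^d$ are pushed through a smooth, almost surely injective map into $\mathbb{R}^D$, so for $D \ge d$ the points lie on a curved $d$-dimensional manifold.

```python
import math
import random

def sample_curved_manifold(n, d, D, seed=0):
    """Sample n points on a curved d-dimensional manifold embedded in R^D.

    Hypothetical construction: latent z ~ Uniform[0, 1]^d is mapped to R^D by a
    fixed random Gaussian matrix A (D x d) followed by an elementwise tanh.
    For D >= d this map is smooth and almost surely injective, so the ambient
    dimension is D while the intrinsic dimension is d by construction.
    """
    rng = random.Random(seed)
    # Fixed random embedding matrix, one row per ambient coordinate.
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]
    points = []
    for _ in range(n):
        z = [rng.random() for _ in range(d)]  # latent d-dimensional coordinates
        points.append([math.tanh(sum(a * zc for a, zc in zip(row, z))) for row in A])
    return points

X = sample_curved_manifold(n=1000, d=3, D=100)
print(len(X), len(X[0]))  # prints: 1000 100
```

Every sample has 100 coordinates, yet only 3 numbers (the latent $z$) are needed to describe it, which is exactly the gap between ambient and intrinsic dimension.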

1.1. Estimating Intrinsic Dimension

The paper adopts a maximum likelihood estimator (MLE) of local intrinsic dimension (the Levina–Bickel estimator), which is computed from nearest-neighbor distances.

1.1.1. Step 1: Local density assumption

They assume that around any point $x_i$, the data are locally uniformly distributed in a small ball of radius $r$ on a $d$-dimensional manifold. Under this assumption, the distances to the nearest neighbors follow a known statistical distribution whose shape is governed by $d$ alone: the unknown local density only sets the overall scale of the distances, and it cancels out of the distance ratios used below.
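A sketch of the standard argument behind this assumption may help. Write $f(x_i)$ for the local density, $N$ for the number of samples and $V_d$ for the volume of the unit ball in $\mathbb{R}^d$ (notation introduced here for illustration). For a small radius $t$, the expected number of samples within distance $t$ of $x_i$ grows like $t^d$:

$$
\mathbb{E}\big[\#\{\text{samples within distance } t \text{ of } x_i\}\big] \approx N\, f(x_i)\, V_d\, t^{d},
$$

so doubling the radius multiplies the expected count by $2^d$. Modeling these neighbors as a Poisson process and conditioning on $T_k(x_i)$, the normalized volumes $\big(T_j(x_i)/T_k(x_i)\big)^d$, $j = 1, \dots, k-1$, behave like $k-1$ sorted independent $\mathrm{Uniform}(0,1)$ draws. Taking $-\log$ of each gives

$$
d \log \frac{T_k(x_i)}{T_j(x_i)} \;\sim\; \mathrm{Exp}(1), \qquad j = 1, \dots, k-1 \quad \text{(jointly i.i.d., up to ordering)}.
$$

The factor $N f(x_i) V_d$ cancels in every ratio, so the unknown density drops out and only $d$ is left. Since the maximum-likelihood estimate of an exponential rate is the inverse of the sample mean, this yields the Step 3 formula.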

1.1.2. Step 2: Compute nearest-neighbor distances

For each data point $x_i$:

- Compute its $k$ nearest neighbors in the dataset, excluding $x_i$ itself.
- Denote by $T_j(x_i)$ the distance from $x_i$ to its $j$-th nearest neighbor, sorted in ascending order, so that $T_1(x_i) \leq T_2(x_i) \leq \dots \leq T_k(x_i)$.
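A brute-force sketch of this step in plain Python. The helper name `knn_distances` is hypothetical, and `math.dist` requires Python 3.8+. The point itself is excluded, so $T_1(x_i)$ is the distance to the closest *other* sample.

```python
import math

def knn_distances(points, k):
    """Return, for each point, its k nearest-neighbor distances sorted ascending.

    Brute-force O(N^2 * D) sketch: every pair of points is compared, and the
    point itself (distance 0) is skipped.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])  # T_1(x_i) <= T_2(x_i) <= ... <= T_k(x_i)
    return result

print(knn_distances([[0.0], [1.0], [3.0], [6.0]], k=2))
# → [[1.0, 3.0], [1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
```

Exact duplicate samples produce $T_j(x_i) = 0$, which makes $\log(T_k/T_j)$ undefined, so duplicates should be removed before estimating. The quadratic cost is fine for toy data; for real image datasets a vectorized or approximate nearest-neighbor search would be needed.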

They typically choose $k$ between 5 and 20 for stability.

1.1.3. Step 3: Derive the local dimension formula

Under the local uniformity assumption, the MLE for the local intrinsic dimension at point $x_i$ is:

$$
\hat{d}(x_i) = \left[ \frac{1}{k-1} \sum_{j=1}^{k-1} \log \frac{T_k(x_i)}{T_j(x_i)} \right]^{-1}
$$

Let's break that down:

- The term $\log(T_k/T_j)$ measures how fast the neighbor distances grow as you include more neighbors.
- If the data lie on a low-dimensional manifold, neighbor distances grow faster with $j$; on a higher-dimensional one, they grow more slowly.
- The average of these log-ratios encodes the "expansion rate" of the local neighborhood, which is tied directly to the manifold's dimension: the bracketed average estimates $1/d$, so its inverse estimates $d$.
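The formula translates directly into a few lines of Python. This is a minimal sketch; `local_mle` is a hypothetical helper name, and it assumes the distances are already sorted, strictly positive (duplicates removed) and exclude the point itself.

```python
import math

def local_mle(sorted_dists):
    """Local intrinsic-dimension MLE at one point.

    sorted_dists holds T_1(x_i) <= ... <= T_k(x_i), all strictly positive.
    Returns [ (1/(k-1)) * sum_{j=1}^{k-1} log(T_k / T_j) ]^(-1).
    """
    k = len(sorted_dists)
    if k < 2:
        raise ValueError("need at least two neighbor distances")
    t_k = sorted_dists[-1]
    # Mean log-ratio: how fast distances grow out to the k-th neighbor.
    mean_log_ratio = sum(math.log(t_k / t_j) for t_j in sorted_dists[:-1]) / (k - 1)
    if mean_log_ratio == 0.0:
        return math.inf  # all k distances equal: the formula diverges
    return 1.0 / mean_log_ratio

dists = [0.5, 0.7, 0.9, 1.0]
print(local_mle(dists))
print(local_mle([10.0 * t for t in dists]))  # same value up to rounding
```

Rescaling all distances by a constant leaves the estimate unchanged, because only the ratios $T_k/T_j$ enter the formula; this is why the local density does not need to be known.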

1.1.4. Step 4: Global intrinsic dimension

They then average the local estimates across many (or all) points:

$$
\hat{d}_{\text{global}} = \frac{1}{N} \sum_{i=1}^{N} \hat{d}(x_i)
$$

This gives a single number representing the dataset’s overall intrinsic dimension.
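Steps 2 to 4 can be combined into one brute-force sketch. The function name `estimate_global_id`, the `pooling` parameter and the zero-padded test data are illustrative assumptions. The default `"mean"` option is the plain average described above; the `"inverse"` option is an alternative pooling rule (average the inverses of the local estimates, then invert) that MacKay and Ghahramani suggested in a comment on the original estimator, and is not necessarily what the paper uses.

```python
import math
import random

def estimate_global_id(points, k=10, pooling="mean"):
    """Global intrinsic-dimension estimate built from local MLEs (brute-force sketch).

    pooling="mean":    average the local estimates d_hat(x_i), as in Step 4.
    pooling="inverse": average 1/d_hat(x_i), then invert (alternative rule).
    Assumes all points are distinct.
    """
    if k < 2 or len(points) <= k:
        raise ValueError("need 2 <= k < number of points")
    local = []
    for i, p in enumerate(points):
        # Step 2: sorted distances T_1 <= ... <= T_k, excluding the point itself.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        # Step 3: local MLE = inverse of the mean log-ratio log(T_k / T_j).
        mean_log_ratio = sum(math.log(dists[-1] / t) for t in dists[:-1]) / (k - 1)
        local.append(1.0 / mean_log_ratio)
    # Step 4: pool the local estimates into one number.
    if pooling == "mean":
        return sum(local) / len(local)
    if pooling == "inverse":
        return len(local) / sum(1.0 / d_i for d_i in local)
    raise ValueError(f"unknown pooling: {pooling!r}")

rng = random.Random(0)
# Test data (assumption): 500 points uniform in the unit square, padded with
# 8 constant coordinates, so the ambient dimension is 10 but the ID is 2.
data = [[rng.random(), rng.random()] + [0.0] * 8 for _ in range(500)]
print("mean pooling:   ", round(estimate_global_id(data, k=10), 2))
print("inverse pooling:", round(estimate_global_id(data, k=10, pooling="inverse"), 2))
```

Under the local-uniformity model the bracketed mean log-ratio has expectation exactly $1/d$, so averaging it before inverting avoids the upward bias that comes from inverting each noisy local estimate first. By the arithmetic-harmonic mean inequality, the `"inverse"` result is never larger than the `"mean"` result.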

1.1.5. Step 5: Choice of $k$

The estimate is sensitive to the neighborhood size $k$:
- If $k$ is too small $\to$ noisy (high-variance) MLE.
- If $k$ is too large $\to$ the neighborhood is no longer local, and the uniformity assumption breaks (high bias).
In practice, they use $k \in \{5, 10, 15, 20\}$ and report the resulting range.
For example, for ImageNet the estimated ID is $\approx 26$–$43$, depending on $k$.
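The trade-off can be checked empirically with a sketch like the following, which recomputes the global estimate (mean of local MLEs) for each $k$ in the set above and reports the range. The helper name `id_over_k` and the cube test data are assumptions for illustration. Under the idealized local model, the local estimate with the $k-1$ normalization has expectation $\frac{k-1}{k-2}\,d$, so a small $k$ inflates the estimate as well as adding noise, while a large $k$ lets boundary and curvature effects pull it away from $d$ (typically downward for a bounded region like the cube used here).

```python
import math
import random

def id_over_k(points, ks=(5, 10, 15, 20)):
    """Global ID estimate (mean of local MLEs) for each neighborhood size in ks.

    Sorted neighbor distances are computed once, up to max(ks), and then sliced,
    so the whole sweep costs about one brute-force nearest-neighbor pass.
    """
    k_max = max(ks)
    if min(ks) < 2 or len(points) <= k_max:
        raise ValueError("need 2 <= k < number of points for every k")
    neighbor_dists = [
        sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k_max]
        for i, p in enumerate(points)
    ]
    estimates = {}
    for k in ks:
        local = []
        for dists in neighbor_dists:
            t = dists[:k]  # T_1(x_i) <= ... <= T_k(x_i) for this k
            mean_log_ratio = sum(math.log(t[-1] / tj) for tj in t[:-1]) / (k - 1)
            local.append(1.0 / mean_log_ratio)
        estimates[k] = sum(local) / len(local)
    return estimates

rng = random.Random(1)
# Test data (assumption): 400 points uniform in the unit cube, so the ambient
# and intrinsic dimensions are both 3 and any drift across k is estimator bias.
cube = [[rng.random() for _ in range(3)] for _ in range(400)]
est = id_over_k(cube)
for k, value in est.items():
    print(f"k={k:2d}: ID ~ {value:.2f}")
print(f"reported range: {min(est.values()):.2f} to {max(est.values()):.2f}")
```

Reporting the whole range rather than a single $k$ makes the sensitivity visible instead of hiding it in one arbitrary choice.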

1.2. Findings

- Real-world image datasets lie on surprisingly low-dimensional manifolds: the intrinsic dimension is tiny compared to the pixel count (for ImageNet, an estimated ID of roughly 26 to 43 versus $D = 150{,}528$ pixel values at $224 \times 224$ resolution).
- Intrinsic dimension strongly correlates with how hard a task is to learn: higher ID $\to$ harder learning, more training samples required, and weaker generalization.
- The intrinsic dimension, not the raw pixel (ambient) dimension, determines learning complexity.

References