The Intrinsic Dimension of Images and Its Impact on Learning
1. The Intrinsic Dimension of Images and Its Impact on Learning
The intrinsic dimension (ID) of a dataset is the effective number of degrees of freedom needed to describe its variability.
Formally: if a dataset lives in a high-dimensional ambient space $\mathbb{R}^D$ (e.g., images with $D$ pixel values), but all the points lie close to a low-dimensional manifold of dimension $d \ll D$, then $d$ is the intrinsic dimension.
So the goal is to estimate $d$ directly from samples.
1.1. Estimating Intrinsic Dimension
The paper adopts a Maximum Likelihood Estimator (MLE) for local intrinsic dimension. This method uses nearest-neighbor distances.
1.1.1. Step 1: Local density assumption
They assume that around any point $x$, data points are locally uniformly distributed in a small ball of radius $R$ on a $d$-dimensional manifold. Under this assumption, the distances from $x$ to its $k$ nearest neighbors follow a known statistical distribution that depends only on $d$.
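As a sketch of what "known statistical distribution" means here: the standard derivation of this nearest-neighbor MLE (due to Levina and Bickel) treats the number of sample points within distance $t$ of $x$, written $N(t, x)$, as an inhomogeneous Poisson process. The symbols $f(x)$ (local sampling density), $V(d)$ (volume of the unit ball in $\mathbb{R}^d$), and $\lambda(t)$ (the process rate) are notation introduced for this illustration and do not appear in the summary above.
$$\lambda(t) = f(x)\, V(d)\, d\, t^{d-1}, \qquad \mathbb{E}\left[N(t, x)\right] = f(x)\, V(d)\, t^{d}$$
Because the expected neighbor count grows like $t^{d}$, ratios of neighbor distances carry information about $d$, and the resulting estimator does not involve the unknown density $f(x)$.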
1.1.2. Step 2: Compute nearest-neighbor distances
For each data point $x$:
- Compute its $k$ nearest neighbors in the dataset.
- Denote by $T_j(x)$ the distance from $x$ to its $j$-th nearest neighbor (sorted in ascending order), so that $T_1(x) \le T_2(x) \le \dots \le T_k(x)$.
They typically choose $k$ between 5 and 20 for stability.
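The neighbor-distance step can be sketched with a brute-force NumPy search. This is a minimal illustration, not the paper's implementation: the function name `knn_distances` is hypothetical, Euclidean distance is an assumption, and the dense pairwise matrix only scales to a few thousand points (a tree or approximate index would be needed for real image datasets).

```python
import numpy as np

def knn_distances(X, k):
    """Return an (n, k) array whose row i holds the ascending distances
    from point i to its k nearest neighbors (the point itself excluded)."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor
    # Select the k smallest entries per row, then sort just those k
    nearest = np.partition(d2, k - 1, axis=1)[:, :k]
    nearest.sort(axis=1)
    return np.sqrt(nearest)

# Toy usage: four points in the plane, two neighbors each
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
T = knn_distances(X, k=2)
print(T)
```

Column `j - 1` of the result plays the role of the $j$-th neighbor distance, so each row is non-decreasing by construction.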
1.1.3. Step 3: Derive the local dimension formula
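Under the local uniformity assumption, the MLE for the local intrinsic dimension at point $x$ is:
$$\hat{d}_k(x) = \left[ \frac{1}{k-1} \sum_{j=1}^{k-1} \log \frac{T_k(x)}{T_j(x)} \right]^{-1}$$
where $T_j(x)$ is the distance from $x$ to its $j$-th nearest neighbor.
Let's break that down:
- The term $\log \frac{T_k(x)}{T_j(x)}$ measures how fast the neighbor distances grow as you include more neighbors.
- If the data lie on a low-dimensional manifold, neighbor distances grow quickly with $j$ (roughly $T_j \propto j^{1/d}$ under local uniformity); on a higher-dimensional one, they grow more slowly.
- The average of those log-ratios encodes the "expansion rate" of the local neighborhood, and its inverse is the estimate of the manifold's dimensionality.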
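To make the local formula concrete, here is a small sketch that turns a table of sorted neighbor distances into per-point estimates. The function name `local_id_mle` is hypothetical, and the code assumes the nearest-neighbor MLE with a $1/(k-1)$ normalization and no duplicate points (a zero distance would make the log-ratio infinite).

```python
import numpy as np

def local_id_mle(T):
    """T: (n, k) array of ascending neighbor distances, one row per point.
    Returns an (n,) array of local intrinsic-dimension estimates."""
    T = np.asarray(T, dtype=float)
    k = T.shape[1]
    # log(T_k / T_j) for j = 1 .. k-1 (the last column is T_k)
    log_ratios = np.log(T[:, -1:] / T[:, :-1])
    # Inverse of the mean log-ratio
    return (k - 1) / log_ratios.sum(axis=1)

# Toy usage: one point whose neighbors sit at distances 1, 2 and 4
T = np.array([[1.0, 2.0, 4.0]])
print(local_id_mle(T))
```

Only ratios of distances enter the formula, so rescaling the data (for example, changing the pixel value range) leaves the estimate unchanged.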
1.1.4. Step 4: Global intrinsic dimension
They then average the local estimates across many (or all) of the $n$ points:
$$\hat{d}_k = \frac{1}{n} \sum_{i=1}^{n} \hat{d}_k(x_i)$$
This gives a single number representing the dataset's overall intrinsic dimension.
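An end-to-end sketch of the global estimate on synthetic data, where the true dimension is known, may help. Everything here is an assumption made for illustration: the function name `global_id_mle`, the brute-force Euclidean neighbor search, and the test data (a 2-dimensional uniform patch linearly embedded in 50 ambient dimensions). It uses the plain average of local estimates described above.

```python
import numpy as np

def global_id_mle(X, k=10):
    """Average the local nearest-neighbor MLE over all rows of X.
    Returns (global_estimate, local_estimates)."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)          # clip round-off negatives
    np.fill_diagonal(d2, np.inf)         # exclude each point itself
    nearest = np.sort(np.partition(d2, k - 1, axis=1)[:, :k], axis=1)
    T = np.sqrt(nearest)                 # ascending neighbor distances
    local = (k - 1) / np.log(T[:, -1:] / T[:, :-1]).sum(axis=1)
    return float(local.mean()), local

# Synthetic check: 2 latent degrees of freedom, 50 ambient dimensions
rng = np.random.default_rng(0)
latent = rng.uniform(size=(1000, 2))
embedding = rng.normal(size=(2, 50))
X = latent @ embedding

estimate, local = global_id_mle(X, k=10)
print(f"ambient dimension: {X.shape[1]}, estimated ID: {estimate:.2f}")
```

The estimate should land near the latent dimension rather than the ambient one, though finite samples and the edges of the patch shift it somewhat.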
1.1.5. Step 5: Choice of $k$
- $k$ is a sensitivity parameter.
- If $k$ is too small, the MLE is noisy (high variance).
- If $k$ is too large, the local uniformity assumption breaks down (high bias).
- In practice, they use values such as $k \in \{5, 10, 15, 20\}$ and report a range.
- For example, for ImageNet, ID $\approx$ 26 to 43 depending on $k$.
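Reporting a range over several neighborhood sizes can be sketched as follows. The neighbor distances are computed once for the largest size and sliced for the smaller ones. The function names `sorted_knn_distances` and `id_for_each_k` are hypothetical, and the synthetic data (3 latent dimensions embedded in 30) stands in for a real image dataset.

```python
import numpy as np

def sorted_knn_distances(X, k_max):
    """Ascending distances to the k_max nearest neighbors of every point."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)
    np.fill_diagonal(d2, np.inf)
    nearest = np.sort(np.partition(d2, k_max - 1, axis=1)[:, :k_max], axis=1)
    return np.sqrt(nearest)

def id_for_each_k(X, ks=(5, 10, 15, 20)):
    """Global ID estimate for each neighborhood size in ks."""
    T = sorted_knn_distances(X, max(ks))
    results = {}
    for k in ks:
        Tk = T[:, :k]                    # first k neighbor distances
        local = (k - 1) / np.log(Tk[:, -1:] / Tk[:, :-1]).sum(axis=1)
        results[k] = float(local.mean())
    return results

rng = np.random.default_rng(0)
X = rng.uniform(size=(1500, 3)) @ rng.normal(size=(3, 30))
results = id_for_each_k(X)
for k, est in results.items():
    print(f"k={k:2d}  estimated ID = {est:.2f}")
print(f"range: {min(results.values()):.2f} to {max(results.values()):.2f}")
```

Slicing a single sorted distance table keeps the sweep cheap, since the expensive neighbor search runs only once.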
1.2. Findings
- Real-world image datasets lie on surprisingly low-dimensional manifolds: the intrinsic dimension (ID) is far smaller than the pixel count.
- Intrinsic dimension strongly correlates with how hard a task is to learn: higher ID means harder learning, with more samples required and weaker generalization.
- The intrinsic dimension, not the raw pixel (ambient) dimension, determines learning complexity.