Continuous Normalizing Flows
1. Continuous Normalizing Flows (CNF)
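CNFs are a particular case of Neural ODE networks, with additional tricks to compute the likelihood in order to train them. Given a data point $x$, we want to know its likelihood $p(x)$.
Directly computing $p(x)$ is intractable, so we use an approach similar to the change of variables formula. We start from the transport equation for a density $p(z, t)$ transported by a vector field $f(z, t)$:
$$\frac{\partial p(z, t)}{\partial t} + \nabla \cdot \big(p(z, t)\, f(z, t)\big) = 0$$
By following the Lagrangian perspective (tracking individual particles)[1], we have:
$$\frac{d \log p(z(t), t)}{dt} = -\nabla \cdot f(z(t), t)$$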
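Instantaneous Change of Variables: Let $z(t)$ be a finite continuous random variable with probability $p(z(t), t)$ dependent on time. Let $\frac{dz}{dt} = f(z(t), t)$ be a differential equation describing a continuous-in-time transformation of $z(t)$. Assuming that $f$ is uniformly Lipschitz continuous in $z$ and continuous in $t$, then the change in log probability also follows a differential equation,
$$\frac{d \log p(z(t), t)}{dt} = -\operatorname{tr}\left(\frac{\partial f}{\partial z}(z(t), t)\right)$$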
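As a quick sanity check of this statement, here is a worked one-dimensional example. The linear field $f(z, t) = a z$, the constant $a$, and the standard normal density at $t = 0$ are assumptions chosen for illustration, not part of the original text.

$$\frac{dz}{dt} = a\, z(t), \quad z(0) \sim \mathcal{N}(0, 1) \quad\Longrightarrow\quad z(t) = z(0)\, e^{a t}, \quad z(t) \sim \mathcal{N}\big(0, e^{2 a t}\big)$$

$$\log p(z(t), t) = -\tfrac{1}{2}\log(2\pi) - a t - \frac{z(t)^2}{2\, e^{2 a t}} = -\tfrac{1}{2}\log(2\pi) - a t - \frac{z(0)^2}{2}$$

$$\frac{d \log p(z(t), t)}{dt} = -a = -\operatorname{tr}\left(\frac{\partial f}{\partial z}\right)$$

Along a trajectory the quadratic term stays constant, so the log density changes only through the $-a t$ term, which is exactly minus the trace of the (here scalar) Jacobian.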
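Thus for a given data point $x$ at time $t_1$, we can compute its log-likelihood by solving the following system of ODEs backwards in time:
$$\frac{d}{dt}\begin{bmatrix} z(t) \\ \ell(t) \end{bmatrix} = \begin{bmatrix} f(z(t), t) \\ -\operatorname{tr}\left(\frac{\partial f}{\partial z}(z(t), t)\right) \end{bmatrix}$$
Starting from $(z(t_1), \ell(t_1)) = (x, 0)$ and integrating from $t_1$ to $t_0$, we obtain $z(t_0)$ and $\ell(t_0) = \log p(z(t_0), t_0) - \log p(x, t_1)$, which is equivalent to $\log p(x, t_1) = \log p(z(t_0), t_0) - \ell(t_0)$, where $p(\cdot, t_0)$ is the known base density.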
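The backward integration described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the diagonal linear field `RATES`, the standard normal base density at $t_0 = 0$, and the fixed-step RK4 solver are all hypothetical choices made so that the result can be compared with a closed form.

```python
import math

def rk4_step(deriv, state, t, dt):
    """One classical Runge-Kutta step for a state stored as a list of floats."""
    def shifted(k, scale):
        return [s + scale * ki for s, ki in zip(state, k)]
    k1 = deriv(state, t)
    k2 = deriv(shifted(k1, dt / 2), t + dt / 2)
    k3 = deriv(shifted(k2, dt / 2), t + dt / 2)
    k4 = deriv(shifted(k3, dt), t + dt)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

# Hypothetical toy vector field f(z, t) = A z with A = diag(RATES).
RATES = [0.7, -0.3]

def augmented_dynamics(state, t):
    """Time derivative of [z_1, z_2, ell]: z follows f, ell follows -tr(df/dz)."""
    z = state[:-1]
    dz = [a * zi for a, zi in zip(RATES, z)]
    trace = sum(RATES)  # the Jacobian of f is diag(RATES)
    return dz + [-trace]

def standard_normal_logpdf(z):
    """Log density of the assumed base distribution N(0, I)."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * zi * zi for zi in z)

def cnf_log_likelihood(x, t0=0.0, t1=1.0, n_steps=200):
    """Integrate the augmented ODE from t1 back to t0, starting at (x, 0)."""
    state = list(x) + [0.0]
    dt = (t0 - t1) / n_steps  # negative step: backwards in time
    t = t1
    for _ in range(n_steps):
        state = rk4_step(augmented_dynamics, state, t, dt)
        t += dt
    z0, ell0 = state[:-1], state[-1]
    return standard_normal_logpdf(z0) - ell0

def exact_log_likelihood(x, t0=0.0, t1=1.0):
    """Closed form for this toy field: coordinate i is N(0, exp(2 a_i (t1 - t0)))."""
    horizon = t1 - t0
    return sum(-0.5 * math.log(2 * math.pi) - a * horizon
               - 0.5 * xi * xi * math.exp(-2 * a * horizon)
               for a, xi in zip(RATES, x))

x = [0.5, -1.2]
numeric = cnf_log_likelihood(x)
exact = exact_log_likelihood(x)
print(numeric, exact)
```

In a real CNF the vector field is a neural network, the trace is not available in closed form, and an adaptive solver would typically replace the fixed-step loop.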
The main benefits of continuous NF are:
- The constraints one needs to impose on $f$ are much less stringent than in the discrete case[2]
- Inverting the flow can be achieved by simply solving the ODE in reverse
- Computing the likelihood requires neither inverting the flow nor computing a log-determinant; only the trace of the Jacobian is needed, which can be approximated using the Hutchinson trick.[3]
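The second point in the list, inverting the flow by solving the ODE in reverse, can be illustrated with a short sketch. The scalar field `vector_field` and the fixed-step RK4 integrator are hypothetical choices for illustration; the only property used is that the same dynamics are integrated with the time limits swapped.

```python
import math

def vector_field(z, t):
    """Hypothetical smooth scalar field; it is Lipschitz in z, as the theorem assumes."""
    return math.sin(z) + t

def integrate(z, t_start, t_end, n_steps=400):
    """Fixed-step RK4 from t_start to t_end (t_end < t_start runs in reverse)."""
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        k1 = vector_field(z, t)
        k2 = vector_field(z + dt / 2 * k1, t + dt / 2)
        k3 = vector_field(z + dt / 2 * k2, t + dt / 2)
        k4 = vector_field(z + dt * k3, t + dt)
        z += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return z

z0 = 0.3
z1 = integrate(z0, 0.0, 1.0)            # forward flow
z0_recovered = integrate(z1, 1.0, 0.0)  # inverse flow: same ODE, reversed time
print(z0, z1, z0_recovered)
```

No explicit inverse of the map is ever written down; the recovered point agrees with the starting point up to the solver's discretization error.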
However, training a neural ODE with log-likelihood does not scale well to high-dimensional spaces, and the process tends to be unstable, likely due to numerical approximations and to the (infinite) number of possible probability paths.
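[1] Starting from the transport equation in Eulerian perspective, with $p = p(z, t)$ and $f = f(z, t)$:
$$\frac{\partial p}{\partial t} + \nabla \cdot (p f) = 0$$
Expanding the divergence:
$$\frac{\partial p}{\partial t} + p\, \nabla \cdot f + \nabla p \cdot f = 0$$
Dividing by $p$:
$$\frac{\partial \log p}{\partial t} = -\nabla \cdot f - \nabla \log p \cdot f$$
For the Lagrangian perspective, we consider the total derivative along a particle trajectory $z(t)$ satisfying $\frac{dz}{dt} = f(z(t), t)$:
$$\frac{d \log p(z(t), t)}{dt} = \frac{\partial \log p}{\partial t} + \nabla \log p \cdot \frac{dz}{dt}$$
Substituting $\frac{dz}{dt} = f$:
$$\frac{d \log p(z(t), t)}{dt} = \frac{\partial \log p}{\partial t} + \nabla \log p \cdot f$$
Using the Eulerian result above:
$$\frac{d \log p(z(t), t)}{dt} = -\nabla \cdot f - \nabla \log p \cdot f + \nabla \log p \cdot f$$
The last two terms cancel out, yielding the instantaneous change of variables formula:
$$\frac{d \log p(z(t), t)}{dt} = -\nabla \cdot f = -\operatorname{tr}\left(\frac{\partial f}{\partial z}\right)$$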
[2] Note that the function in the discrete case needs to be invertible, which is a strong constraint.
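[3] The Hutchinson trick estimates the trace of a matrix $A$ through the identity $\operatorname{tr}(A) = \mathbb{E}\big[\epsilon^\top A\, \epsilon\big]$, averaging over random vectors $\epsilon$ with zero mean and identity covariance (for example Gaussian or Rademacher). Each sample needs only a product of the Jacobian with a vector, avoiding explicit computation of the full Jacobian.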
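The estimator can be sketched as follows. The two-dimensional field `vector_field`, the Rademacher probes, and the central finite difference standing in for a Jacobian-vector product are all assumptions made for this illustration; in practice an automatic differentiation library would supply the product.

```python
import random

def vector_field(z):
    """Hypothetical 2-D field with Jacobian [[z1, z0], [1, 2*z1]]."""
    return [z[0] * z[1], z[0] + z[1] ** 2]

def directional_derivative(f, z, eps, h=1e-4):
    """Central finite-difference approximation of the product J(z) @ eps."""
    plus = f([zi + h * ei for zi, ei in zip(z, eps)])
    minus = f([zi - h * ei for zi, ei in zip(z, eps)])
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

def hutchinson_trace(f, z, n_samples=4000, seed=0):
    """Average eps^T J eps over Rademacher probes (zero mean, unit variance)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = [rng.choice((-1.0, 1.0)) for _ in z]
        j_eps = directional_derivative(f, z, eps)
        total += sum(e * je for e, je in zip(eps, j_eps))
    return total / n_samples

z = [0.5, 2.0]
estimate = hutchinson_trace(vector_field, z)
exact = 3 * z[1]  # tr J = z1 + 2*z1 for this field
print(estimate, exact)
```

The full Jacobian is never formed: each probe costs two evaluations of the field here, independent of the dimension. The estimate is unbiased but noisy, and its variance comes from the off-diagonal Jacobian entries.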