ODE and SDE

1. Ordinary Differential Equation (ODE)

An Ordinary Differential Equation (ODE) is an equation that relates a function of a single variable to its derivatives. In its simplest (first-order) form, an ODE can be written as:

$$\frac{df(t)}{dt} = u(f(t), t)$$

where $f(t)$ is the unknown function we want to find, and $u$ is a known function that describes how $f$ changes with respect to time $t$. The solution to an ODE is a function that satisfies this equation for all values of $t$ in some interval. If we also know the initial state $f(0)$, the problem becomes an initial value problem (IVP).
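To make the IVP concrete, here is a minimal sketch that integrates $df/dt = u(f, t)$ with the forward Euler method. The helper name `euler_solve` and the test problem $u(f, t) = -f$ with $f(0) = 1$ (whose exact solution is $e^{-t}$) are assumptions chosen for illustration, not something taken from the text above.

```python
import math

def euler_solve(u, f0, t0, t1, n_steps):
    """Integrate df/dt = u(f, t) from t0 to t1 with forward Euler steps."""
    dt = (t1 - t0) / n_steps
    f, t = f0, t0
    for _ in range(n_steps):
        f = f + dt * u(f, t)  # one Euler step: follow the slope u for a time dt
        t = t + dt
    return f

def decay(f, t):
    # Illustrative right-hand side: df/dt = -f, so f(t) = f(0) * exp(-t).
    return -f

approx = euler_solve(decay, f0=1.0, t0=0.0, t1=1.0, n_steps=1000)
exact = math.exp(-1.0)
print(f"Euler: {approx:.5f}  exact: {exact:.5f}")
```

With more steps the Euler estimate moves closer to the exact value; in practice an adaptive solver would usually be preferred over fixed-step Euler.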

1.1. Neural ODE networks

Let's start with a brief introduction to Neural ODE networks. These networks model continuous-time dynamics with ordinary differential equations. The key insight is that many deep learning architectures can be viewed as discretized versions of continuous transformations.

Traditional models such as residual networks, recurrent neural network decoders, and normalizing flows build complex transformations by composing a sequence of simple transformations applied to a hidden state:

$$h_{t+1} = h_t + f(h_t, \theta_t)$$

These iterative updates can be seen as an Euler discretization of a continuous transformation.

What happens as we add more layers and take smaller steps? In the limit, we parameterize the continuous dynamics of hidden units using an ODE specified by a neural network:

π‘‘β„Ž(𝑑)𝑑𝑑=𝑒(β„Ž(𝑑),𝑑,πœƒ)

The neural network $u$ acts as a vector field that determines how the hidden state evolves over time.
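The limit described above can be checked numerically. The sketch below stacks `n_layers` residual updates with step size `1 / n_layers` and shows that the output settles as depth grows. The one-unit "network" `u`, its fixed weights `W` and `B`, and the helper `residual_stack` are all hypothetical, and the weights are shared across layers, which is a simplifying assumption (a real residual network has separate parameters per layer).

```python
import math

W, B = -1.5, 0.5  # hypothetical fixed parameters theta of a one-unit network

def u(h, t, theta=(W, B)):
    """Toy vector field u(h, t, theta); this one happens to ignore t."""
    w, b = theta
    return math.tanh(w * h + b)

def residual_stack(h0, n_layers, total_time=1.0):
    """Apply n_layers residual updates h <- h + dt * u(h, t, theta)."""
    dt = total_time / n_layers  # more layers means smaller steps
    h = h0
    for k in range(n_layers):
        h = h + dt * u(h, k * dt)
    return h

# Same input, increasing depth: the outputs approach the ODE solution h(1).
outputs = {n: residual_stack(1.0, n) for n in (1, 10, 100, 1000, 10000)}
for n, h in outputs.items():
    print(f"{n:>6} layers -> h = {h:.5f}")
```

With `n_layers = 1` and `total_time = 1.0` the loop reduces to a single residual block with unit step, which is the discrete update written earlier.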

Figure 1: Left: A residual network defines a discrete sequence of finite transformations. Right: An ODE network defines a vector field, which continuously transforms the state. Both: Circles represent evaluation locations.

1.2. Velocity Field, Flow and Probability Path

Let's understand these three key concepts in the context of continuous transformations:

  1. Velocity Field $u(x, t)$: This is a vector field that specifies the direction and speed of movement at each point in space and time. It tells us how points are moving through space. In the context of neural networks, this is often parameterized by a neural network that takes the current state and time as input.

  2. Flow $f(x, t)$: The flow represents the actual trajectory or path that points follow under the influence of the velocity field. It's the solution to the ODE defined by the velocity field. Given an initial point $x_0$, the flow tells us where that point will be at any time $t$.

  3. Probability Path $p_t$: This represents how the probability distribution evolves over time. As points move according to the flow, the probability mass gets transported along with them. The probability path shows how the initial distribution transforms into the target distribution.

These three concepts are deeply interconnected:

  • The velocity field determines the flow through the ODE
  • The flow transforms points, which in turn determines how the probability distribution evolves
  • The continuity equation ensures that probability mass is conserved during this transformation (see footnote 1). Under technical conditions and up to divergence-free vector fields, for a given $p_t$ (resp. $u$) there exists a $u$ (resp. $p_t$) such that the pair solves the continuity equation.
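The three objects can be seen side by side in a small one-dimensional sketch. The velocity field $u(x, t) = -x$, the standard normal initial distribution, and the helper names `velocity`, `flow`, and `std` are assumptions chosen for illustration. For this field the exact flow is $x_0 e^{-t}$, so a standard normal $p_0$ should be transported to a Gaussian $p_t$ with standard deviation $e^{-t}$.

```python
import math
import random

def velocity(x, t):
    """Illustrative velocity field u(x, t) = -x: every point drifts toward 0."""
    return -x

def flow(x0, t_end, n_steps=200):
    """Approximate the flow: follow the velocity field from x0 with Euler steps."""
    dt = t_end / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        x += dt * velocity(x, t)
        t += dt
    return x

def std(xs):
    """Population standard deviation of a list of floats."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

random.seed(0)
t_end = 1.0
samples_0 = [random.gauss(0.0, 1.0) for _ in range(2000)]  # draws from p_0
samples_t = [flow(x0, t_end) for x0 in samples_0]           # draws from p_t

std_0, std_t = std(samples_0), std(samples_t)
print(f"std shrink factor: {std_t / std_0:.4f}  expected: {math.exp(-t_end):.4f}")
```

Pushing samples through the flow is how the probability path is realized in practice: the velocity field is never applied to the density directly, only to the points that carry the probability mass.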

Figure 2: Link between the probability path, the velocity field and the flow.

2. Stochastic Differential Equation (SDE)

See here.

    1. The continuity equation is a fundamental equation in physics that describes how a quantity (like probability mass) is conserved in a system. In the context of probability flows, it ensures that the total probability mass remains constant as the distribution evolves over time. Mathematically, it can be written as $\partial_t p_t + \nabla \cdot (p_t u) = 0$, where $p_t$ is the probability density and $u$ is the velocity field. This equation guarantees that no probability mass is created or destroyed during the transformation.
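As a worked check of the continuity equation, consider a one-dimensional example that is assumed here purely for illustration: the velocity field $u(x, t) = -x$ with a Gaussian path $p_t = \mathcal{N}(0, \sigma_t^2)$, $\sigma_t = e^{-t}$, so that $\dot{\sigma}_t = -\sigma_t$. Both terms can be computed by hand:

$$
\begin{aligned}
\log p_t(x) &= -\log \sigma_t - \frac{x^2}{2\sigma_t^2} - \tfrac{1}{2}\log(2\pi), \\
\partial_t p_t &= p_t \, \partial_t \log p_t = p_t \left( -\frac{\dot{\sigma}_t}{\sigma_t} + \frac{x^2 \dot{\sigma}_t}{\sigma_t^3} \right) = p_t \left( 1 - \frac{x^2}{\sigma_t^2} \right), \\
\nabla \cdot (p_t u) &= \partial_x (-x \, p_t) = -p_t - x \, \partial_x p_t = -p_t + \frac{x^2}{\sigma_t^2} \, p_t = -p_t \left( 1 - \frac{x^2}{\sigma_t^2} \right), \\
\partial_t p_t + \nabla \cdot (p_t u) &= 0 .
\end{aligned}
$$

The two terms cancel exactly: the density at each point rises or falls only because mass flows in or out along the velocity field.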

References

  1. Neural Ordinary Differential Equations
  2. A Visual Dive into Conditional Flow Matching