Squared Euclidean Distance #
KL divergence for Gaussian distributions #
The KL divergence between two Gaussians with the same variance has a particularly simple closed form. For 1D Gaussians N(θ₁, v) and N(θ₂, v):
KL(N(θ₁, v) ‖ N(θ₂, v)) = (θ₁ - θ₂)² / (2v)
For d-dimensional isotropic Gaussians N_d(θ₁, vI) and N_d(θ₂, vI):
KL(N_d(θ₁, vI) ‖ N_d(θ₂, vI)) = ‖θ₁ - θ₂‖₂² / (2v)
These are direct computations using the Gaussian density formula. The d-dimensional case also follows from the 1D case via the tensorization property of KL divergence (Proposition 5.6(4)).
KL divergence between two 1D Gaussians with the same variance v:
KL(N(θ₁, v) ‖ N(θ₂, v)) = (θ₁ - θ₂)² / (2v).
This is the standard closed-form formula obtained by direct computation of the KL integral for Gaussian densities with equal variance.
Instances For
KL divergence between two d-dimensional Gaussians with the same
isotropic covariance v · I_d:
KL(N_d(θ₁, v·I) ‖ N_d(θ₂, v·I)) = ‖θ₁ - θ₂‖₂² / (2v).
This follows from the tensorization property of KL divergence (Proposition 5.6(4)): the d-dimensional Gaussian with isotropic covariance is a product of d independent 1D Gaussians, so the KL divergence decomposes as a sum.
Instances For
Example 5.7: KL divergence between Gaussians with variance σ²/n #
Example 5.7 (1D case). KL divergence between two 1D Gaussians with same variance σ²/n:
KL(N(θ₁, σ²/n) ‖ N(θ₂, σ²/n)) = n(θ₁ − θ₂)² / (2σ²).
This is obtained by substituting v = σ²/n into the general formula
(θ₁ - θ₂)² / (2v) and simplifying:
(θ₁ - θ₂)² / (2 · σ²/n) = n · (θ₁ - θ₂)² / (2σ²).
Example 5.7 (d-dimensional case). KL divergence between two d-dimensional Gaussians with same isotropic covariance (σ²/n)I:
KL(N_d(θ₁, (σ²/n)I) ‖ N_d(θ₂, (σ²/n)I)) = n‖θ₁ − θ₂‖₂² / (2σ²).
This follows from the 1D case via the tensorization property: the
d-dimensional KL divergence equals the sum of coordinate-wise 1D KL
divergences, which telescopes to n · Σᵢ(θ₁ᵢ - θ₂ᵢ)² / (2σ²).
Tensorization: d-dimensional KL as sum of 1D KL divergences #
The d-dimensional Gaussian KL divergence is the sum of coordinate-wise 1D Gaussian KL divergences, reflecting the product structure of the Gaussian with isotropic covariance:
KL(N_d(θ₁, vI) ‖ N_d(θ₂, vI)) = Σᵢ KL(N(θ₁ᵢ, v) ‖ N(θ₂ᵢ, v)).
This is a consequence of the tensorization (additivity) of KL divergence for product measures (Proposition 5.6(4)).