Atlas.HighDimensionalStatistics.code.Chapter5.Ex_5

Squared Euclidean Distance #

def GaussianKL.sqDist {d : ℕ} (θ₁ θ₂ : Fin d → ℝ) :

Squared ℓ₂ distance between two vectors in ℝ^d: ‖θ₁ - θ₂‖₂² = Σᵢ (θ₁ i - θ₂ i)².

Instances For

KL divergence for Gaussian distributions #

The KL divergence between two Gaussians with the same variance has a particularly simple closed form. For 1D Gaussians N(θ₁, v) and N(θ₂, v):

KL(N(θ₁, v) ‖ N(θ₂, v)) = (θ₁ - θ₂)² / (2v)

For d-dimensional isotropic Gaussians N_d(θ₁, vI) and N_d(θ₂, vI):

KL(N_d(θ₁, vI) ‖ N_d(θ₂, vI)) = ‖θ₁ - θ₂‖₂² / (2v)

These are direct computations using the Gaussian density formula. The d-dimensional case also follows from the 1D case via the tensorization property of KL divergence (Proposition 5.6(4)).

source

noncomputable def GaussianKL.klDiv_gaussian (θ₁ θ₂ v : ℝ) :

ℝ

KL divergence between two 1D Gaussians with the same variance v: KL(N(θ₁, v) ‖ N(θ₂, v)) = (θ₁ - θ₂)² / (2v).

This is the standard closed-form formula obtained by direct computation of the KL integral for Gaussian densities with equal variance.

Instances For

source

noncomputable def GaussianKL.klDiv_gaussian_d {d : ℕ} (θ₁ θ₂ : Fin d → ℝ) (v : ℝ) :

ℝ

KL divergence between two d-dimensional Gaussians with the same isotropic covariance v · I_d: KL(N_d(θ₁, v·I) ‖ N_d(θ₂, v·I)) = ‖θ₁ - θ₂‖₂² / (2v).

This follows from the tensorization property of KL divergence (Proposition 5.6(4)): the d-dimensional Gaussian with isotropic covariance is a product of d independent 1D Gaussians, so the KL divergence decomposes as a sum.

Instances For

Example 5.7: KL divergence between Gaussians with variance σ²/n #

source

theorem GaussianKL.gaussian_kl_1d (θ₁ θ₂ σ : ℝ) (n : ℕ) (hσ : 0 < σ) (hn : 0 < n) :

klDiv_gaussian θ₁ θ₂ (σ ^ 2 / ↑n) = ↑n * (θ₁ - θ₂) ^ 2 / (2 * σ ^ 2)

Example 5.7 (1D case). KL divergence between two 1D Gaussians with same variance σ²/n:

KL(N(θ₁, σ²/n) ‖ N(θ₂, σ²/n)) = n(θ₁ − θ₂)² / (2σ²).

This is obtained by substituting v = σ²/n into the general formula (θ₁ - θ₂)² / (2v) and simplifying: (θ₁ - θ₂)² / (2 · σ²/n) = n · (θ₁ - θ₂)² / (2σ²).

source

theorem GaussianKL.gaussian_kl_d (d : ℕ) (θ₁ θ₂ : Fin d → ℝ) (σ : ℝ) (n : ℕ) (hσ : 0 < σ) (hn : 0 < n) :

klDiv_gaussian_d θ₁ θ₂ (σ ^ 2 / ↑n) = ↑n * sqDist θ₁ θ₂ / (2 * σ ^ 2)

Example 5.7 (d-dimensional case). KL divergence between two d-dimensional Gaussians with same isotropic covariance (σ²/n)I:

KL(N_d(θ₁, (σ²/n)I) ‖ N_d(θ₂, (σ²/n)I)) = n‖θ₁ − θ₂‖₂² / (2σ²).

This follows from the 1D case via the tensorization property: the d-dimensional KL divergence equals the sum of coordinate-wise 1D KL divergences, which telescopes to n · Σᵢ(θ₁ᵢ - θ₂ᵢ)² / (2σ²).

Tensorization: d-dimensional KL as sum of 1D KL divergences #

source

theorem GaussianKL.klDiv_gaussian_d_eq_sum {d : ℕ} (θ₁ θ₂ : Fin d → ℝ) (v : ℝ) :

klDiv_gaussian_d θ₁ θ₂ v = ∑ i : Fin d, klDiv_gaussian (θ₁ i) (θ₂ i) v

The d-dimensional Gaussian KL divergence is the sum of coordinate-wise 1D Gaussian KL divergences, reflecting the product structure of the Gaussian with isotropic covariance:

KL(N_d(θ₁, vI) ‖ N_d(θ₂, vI)) = Σᵢ KL(N(θ₁ᵢ, v) ‖ N(θ₂ᵢ, v)).

This is a consequence of the tensorization (additivity) of KL divergence for product measures (Proposition 5.6(4)).

Documentation