Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Ex_5_7

Squared Euclidean Distance #

def GaussianKL.sqDist {d : } (θ₁ θ₂ : Fin d) :

Squared ℓ₂ distance between two vectors in ℝ^d: ‖θ₁ - θ₂‖₂² = Σᵢ (θ₁ i - θ₂ i)².

Instances For

    KL divergence for Gaussian distributions #

    The KL divergence between two Gaussians with the same variance has a particularly simple closed form. For 1D Gaussians N(θ₁, v) and N(θ₂, v):

    KL(N(θ₁, v) ‖ N(θ₂, v)) = (θ₁ - θ₂)² / (2v)

    For d-dimensional isotropic Gaussians N_d(θ₁, vI) and N_d(θ₂, vI):

    KL(N_d(θ₁, vI) ‖ N_d(θ₂, vI)) = ‖θ₁ - θ₂‖₂² / (2v)

    These are direct computations using the Gaussian density formula. The d-dimensional case also follows from the 1D case via the tensorization property of KL divergence (Proposition 5.6(4)).

    noncomputable def GaussianKL.klDiv_gaussian (θ₁ θ₂ v : ) :

    KL divergence between two 1D Gaussians with the same variance v: KL(N(θ₁, v) ‖ N(θ₂, v)) = (θ₁ - θ₂)² / (2v).

    This is the standard closed-form formula obtained by direct computation of the KL integral for Gaussian densities with equal variance.

    Instances For
      noncomputable def GaussianKL.klDiv_gaussian_d {d : } (θ₁ θ₂ : Fin d) (v : ) :

      KL divergence between two d-dimensional Gaussians with the same isotropic covariance v · I_d: KL(N_d(θ₁, v·I) ‖ N_d(θ₂, v·I)) = ‖θ₁ - θ₂‖₂² / (2v).

      This follows from the tensorization property of KL divergence (Proposition 5.6(4)): the d-dimensional Gaussian with isotropic covariance is a product of d independent 1D Gaussians, so the KL divergence decomposes as a sum.

      Instances For

        Example 5.7: KL divergence between Gaussians with variance σ²/n #

        theorem GaussianKL.gaussian_kl_1d (θ₁ θ₂ σ : ) (n : ) ( : 0 < σ) (hn : 0 < n) :
        klDiv_gaussian θ₁ θ₂ (σ ^ 2 / n) = n * (θ₁ - θ₂) ^ 2 / (2 * σ ^ 2)

        Example 5.7 (1D case). KL divergence between two 1D Gaussians with same variance σ²/n:

        KL(N(θ₁, σ²/n) ‖ N(θ₂, σ²/n)) = n(θ₁ − θ₂)² / (2σ²).

        This is obtained by substituting v = σ²/n into the general formula (θ₁ - θ₂)² / (2v) and simplifying: (θ₁ - θ₂)² / (2 · σ²/n) = n · (θ₁ - θ₂)² / (2σ²).

        theorem GaussianKL.gaussian_kl_d (d : ) (θ₁ θ₂ : Fin d) (σ : ) (n : ) ( : 0 < σ) (hn : 0 < n) :
        klDiv_gaussian_d θ₁ θ₂ (σ ^ 2 / n) = n * sqDist θ₁ θ₂ / (2 * σ ^ 2)

        Example 5.7 (d-dimensional case). KL divergence between two d-dimensional Gaussians with same isotropic covariance (σ²/n)I:

        KL(N_d(θ₁, (σ²/n)I) ‖ N_d(θ₂, (σ²/n)I)) = n‖θ₁ − θ₂‖₂² / (2σ²).

        This follows from the 1D case via the tensorization property: the d-dimensional KL divergence equals the sum of coordinate-wise 1D KL divergences, which telescopes to n · Σᵢ(θ₁ᵢ - θ₂ᵢ)² / (2σ²).

        Tensorization: d-dimensional KL as sum of 1D KL divergences #

        theorem GaussianKL.klDiv_gaussian_d_eq_sum {d : } (θ₁ θ₂ : Fin d) (v : ) :
        klDiv_gaussian_d θ₁ θ₂ v = i : Fin d, klDiv_gaussian (θ₁ i) (θ₂ i) v

        The d-dimensional Gaussian KL divergence is the sum of coordinate-wise 1D Gaussian KL divergences, reflecting the product structure of the Gaussian with isotropic covariance:

        KL(N_d(θ₁, vI) ‖ N_d(θ₂, vI)) = Σᵢ KL(N(θ₁ᵢ, v) ‖ N(θ₂ᵢ, v)).

        This is a consequence of the tensorization (additivity) of KL divergence for product measures (Proposition 5.6(4)).