Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Def_5_5

Definition 5.5 (KL Divergence) and Proposition 5.6 (Basic Properties) #

From: Rigollet, "High Dimensional Statistics", Chapter 5

Definition 5.5. The KL divergence between P and Q (both absolutely continuous w.r.t. some dominating measure μ) is: KL(P‖Q) = ∫ log(dP/dQ) dP = ∫ (dP/dμ) log(dP/dμ / dQ/dμ) dμ

Proposition 5.6. Properties: 1. KL(P‖Q) ≥ 0 (Gibbs' inequality) 2. KL(P‖Q) = 0 iff P = Q 3. KL is NOT symmetric 4. KL(P^⊗n‖Q^⊗n) = n · KL(P‖Q) (tensorization)

Definition 5.5: KL Divergence #

The KL divergence is formalized in Mathlib as InformationTheory.klDiv, which maps two measures to ENNReal. For finite measures P, Q with P absolutely continuous w.r.t. Q and integrable log-likelihood ratio:

klDiv P Q = ENNReal.ofReal (integral x, llr P Q x partial P + Q.real univ - P.real univ)

where llr P Q x = log ((P.rnDeriv Q x).toReal) is the log-likelihood ratio (i.e., log(dP/dQ)).

For probability measures P, Q (where P.real univ = Q.real univ = 1), this simplifies to:

klDiv P Q = ENNReal.ofReal (integral x, log(dP/dQ(x)) dP)

which is exactly Definition 5.5 from the text.

noncomputable def Rigollet.Chapter5.klDivergence {Omega : Type u_1} [MeasurableSpace Omega] (P Q : MeasureTheory.Measure Omega) :

Definition 5.5 (KL Divergence). The Kullback-Leibler divergence between measures P and Q. This is an alias for Mathlib's InformationTheory.klDiv, which computes KL(P, Q) as the integral of the log-likelihood ratio for probability measures P absolutely continuous w.r.t. Q, and returns infinity when P is not absolutely continuous w.r.t. Q.

Instances For

    For probability measures, the toReal of KL divergence equals the integral of the log-likelihood ratio, which is the formula in Definition 5.5.

    Proposition 5.6: Basic Properties of KL Divergence #

    Proposition 5.6(1): Gibbs' inequality (KL(P,Q) >= 0) #

    Since Mathlib's klDiv takes values in ENNReal, it is nonneg by type. The substantive content is that the real-valued integrand is itself nonneg for finite measures with P absolutely continuous w.r.t. Q.

    Proposition 5.6(1) (Gibbs' inequality, extended nonneg real form): KL(P, Q) >= 0. Since klDiv takes values in ENNReal, this is immediate.

    Proposition 5.6(1') (Gibbs' inequality, real-valued form): For finite measures P absolutely continuous w.r.t. Q with integrable log-likelihood ratio, the real-valued expression defining KL divergence is nonneg. This is the substantive mathematical content of Gibbs' inequality.

    Proposition 5.6(2): KL(P,Q) = 0 iff P = Q #

    Proposition 5.6(2): KL(P, Q) = 0 if and only if P = Q. The forward direction is sometimes called the "converse Gibbs' inequality".

    Proposition 5.6(2a): KL(P, P) = 0 for any sigma-finite measure P.

    Proposition 5.6(3): KL divergence is NOT symmetric #

    We exhibit a concrete counterexample: P = 0 (zero measure) and Q = Dirac at (). Then KL(P, Q) = Q(Omega) (finite) but KL(Q, P) = infinity (since Q is not absolutely continuous w.r.t. 0).

    Proposition 5.6(3): KL divergence is NOT symmetric. There exist measures P, Q such that KL(P, Q) is not equal to KL(Q, P).

    Proposition 5.6(4): Tensorization / Additivity for product measures #

    For independent product measures: KL(P1 x P2, Q1 x Q2) = KL(P1, Q1) + KL(P2, Q2)

    The n-fold tensorization KL(P^n, Q^n) = n * KL(P, Q) follows by induction.

    Book proof outline (Prop 5.6 part 2):

    1. Product Radon-Nikodym derivative: d(P₁⊗P₂)/d(Q₁⊗Q₂)(x₁,x₂) = (dP₁/dQ₁)(x₁) · (dP₂/dQ₂)(x₂).
    2. Taking logarithms: log(d(P₁⊗P₂)/d(Q₁⊗Q₂)) = log(dP₁/dQ₁) + log(dP₂/dQ₂).
    3. Integrating against P₁⊗P₂ and applying Fubini: KL(P₁⊗P₂ ‖ Q₁⊗Q₂) = ∫∫ [log(dP₁/dQ₁)(x₁) + log(dP₂/dQ₂)(x₂)] dP₁dP₂ = ∫ log(dP₁/dQ₁) dP₁ · (∫dP₂) + (∫dP₁) · ∫ log(dP₂/dQ₂) dP₂ = KL(P₁‖Q₁) · 1 + 1 · KL(P₂‖Q₂) [since Pᵢ are probability measures] = KL(P₁‖Q₁) + KL(P₂‖Q₂).

    Formalization blockers:

    theorem Rigollet.Chapter5.rnDeriv_prod_eq {alpha : Type u_1} {beta : Type u_2} [MeasurableSpace alpha] [MeasurableSpace beta] {P1 Q1 : MeasureTheory.Measure alpha} {P2 Q2 : MeasureTheory.Measure beta} [MeasureTheory.SigmaFinite P1] [MeasureTheory.SigmaFinite Q1] [MeasureTheory.SigmaFinite P2] [MeasureTheory.SigmaFinite Q2] (_hac1 : P1.AbsolutelyContinuous Q1) (_hac2 : P2.AbsolutelyContinuous Q2) :
    (P1.prod P2).rnDeriv (Q1.prod Q2) =ᵐ[Q1.prod Q2] fun (x : alpha × beta) => P1.rnDeriv Q1 x.1 * P2.rnDeriv Q2 x.2

    Auxiliary: The Radon-Nikodym derivative of a product measure factors as a product of Radon-Nikodym derivatives. This is the key missing Mathlib lemma. d(P₁.prod P₂)/d(Q₁.prod Q₂)(x₁, x₂) = (dP₁/dQ₁)(x₁) · (dP₂/dQ₂)(x₂) a.e. w.r.t. Q₁.prod Q₂.

    Proposition 5.6(4) (Tensorization / Additivity): For probability measures, KL(P1 × P2, Q1 × Q2) = KL(P1, Q1) + KL(P2, Q2).

    The proof proceeds in three steps following the book:

    1. Factor the product RN derivative (rnDeriv_prod_eq)
    2. Apply log to convert product → sum: log(f·g) = log(f) + log(g)
    3. Apply Fubini to separate the integral into two independent integrals, using ∫dPᵢ = 1 for probability measures.

    The sorry blocks correspond to:

    • rnDeriv_prod_eq: product RN derivative factorization (not in Mathlib)
    • Fubini decomposition + log-product identity in the KL integral

    Proposition 5.6 (Properties of KL divergence). (i) KL(P, Q) ≥ 0 (Gibbs' inequality). (ii) If P = P₁ ⊗ P₂ and Q = Q₁ ⊗ Q₂ are product probability measures, then KL(P,Q) = KL(P₁,Q₁) + KL(P₂,Q₂).