Definition 5.5 (KL Divergence) and Proposition 5.6 (Basic Properties) #

From: Rigollet, "High Dimensional Statistics", Chapter 5

Definition 5.5. The KL divergence between P and Q (both absolutely continuous w.r.t. some dominating measure μ) is: KL(P‖Q) = ∫ log(dP/dQ) dP = ∫ (dP/dμ) log(dP/dμ / dQ/dμ) dμ

Proposition 5.6. Properties: 1. KL(P‖Q) ≥ 0 (Gibbs' inequality) 2. KL(P‖Q) = 0 iff P = Q 3. KL is NOT symmetric 4. KL(P^⊗n‖Q^⊗n) = n · KL(P‖Q) (tensorization)

Definition 5.5: KL Divergence #

The KL divergence is formalized in Mathlib as InformationTheory.klDiv, which maps two measures to ENNReal. For finite measures P, Q with P absolutely continuous w.r.t. Q and integrable log-likelihood ratio:

klDiv P Q = ENNReal.ofReal (integral x, llr P Q x partial P + Q.real univ - P.real univ)

where llr P Q x = log ((P.rnDeriv Q x).toReal) is the log-likelihood ratio (i.e., log(dP/dQ)).

For probability measures P, Q (where P.real univ = Q.real univ = 1), this simplifies to:

klDiv P Q = ENNReal.ofReal (integral x, log(dP/dQ(x)) dP)

which is exactly Definition 5.5 from the text.

source

noncomputable def Rigollet.Chapter5.klDivergence {Omega : Type u_1} [MeasurableSpace Omega] (P Q : MeasureTheory.Measure Omega) :

ENNReal

Definition 5.5 (KL Divergence). The Kullback-Leibler divergence between measures P and Q. This is an alias for Mathlib's InformationTheory.klDiv, which computes KL(P, Q) as the integral of the log-likelihood ratio for probability measures P absolutely continuous w.r.t. Q, and returns infinity when P is not absolutely continuous w.r.t. Q.

Instances For

source

theorem Rigollet.Chapter5.klDivergence_eq_integral_llr {Omega : Type u_1} [MeasurableSpace Omega] {P Q : MeasureTheory.Measure Omega} [MeasureTheory.IsProbabilityMeasure P] [MeasureTheory.IsProbabilityMeasure Q] (hPQ : P.AbsolutelyContinuous Q) (h_eq : P Set.univ = Q Set.univ) :

(klDivergence P Q).toReal = ∫ (x : Omega), MeasureTheory.llr P Q x ∂P

For probability measures, the toReal of KL divergence equals the integral of the log-likelihood ratio, which is the formula in Definition 5.5.

Proposition 5.6: Basic Properties of KL Divergence #

Proposition 5.6(1): Gibbs' inequality (KL(P,Q) >= 0) #

Since Mathlib's klDiv takes values in ENNReal, it is nonneg by type. The substantive content is that the real-valued integrand is itself nonneg for finite measures with P absolutely continuous w.r.t. Q.

source

theorem Rigollet.Chapter5.prop_5_6_nonneg {Omega : Type u_1} [MeasurableSpace Omega] (P Q : MeasureTheory.Measure Omega) :

0 ≤ InformationTheory.klDiv P Q

Proposition 5.6(1) (Gibbs' inequality, extended nonneg real form): KL(P, Q) >= 0. Since klDiv takes values in ENNReal, this is immediate.

source

theorem Rigollet.Chapter5.prop_5_6_gibbs_real {Omega : Type u_1} [MeasurableSpace Omega] {P Q : MeasureTheory.Measure Omega} [MeasureTheory.IsFiniteMeasure P] [MeasureTheory.IsFiniteMeasure Q] (hPQ : P.AbsolutelyContinuous Q) (h_int : MeasureTheory.Integrable (MeasureTheory.llr P Q) P) :

0 ≤ ∫ (x : Omega), MeasureTheory.llr P Q x ∂P + Q.real Set.univ - P.real Set.univ

Proposition 5.6(1') (Gibbs' inequality, real-valued form): For finite measures P absolutely continuous w.r.t. Q with integrable log-likelihood ratio, the real-valued expression defining KL divergence is nonneg. This is the substantive mathematical content of Gibbs' inequality.

Proposition 5.6(2): KL(P,Q) = 0 iff P = Q #

source

theorem Rigollet.Chapter5.prop_5_6_eq_zero_iff {Omega : Type u_1} [MeasurableSpace Omega] {P Q : MeasureTheory.Measure Omega} [MeasureTheory.IsFiniteMeasure P] [MeasureTheory.IsFiniteMeasure Q] :

InformationTheory.klDiv P Q = 0 ↔ P = Q

Proposition 5.6(2): KL(P, Q) = 0 if and only if P = Q. The forward direction is sometimes called the "converse Gibbs' inequality".

source

theorem Rigollet.Chapter5.prop_5_6_self_eq_zero {Omega : Type u_1} [MeasurableSpace Omega] (P : MeasureTheory.Measure Omega) [MeasureTheory.SigmaFinite P] :

InformationTheory.klDiv P P = 0

Proposition 5.6(2a): KL(P, P) = 0 for any sigma-finite measure P.

Proposition 5.6(3): KL divergence is NOT symmetric #

We exhibit a concrete counterexample: P = 0 (zero measure) and Q = Dirac at (). Then KL(P, Q) = Q(Omega) (finite) but KL(Q, P) = infinity (since Q is not absolutely continuous w.r.t. 0).

source

theorem Rigollet.Chapter5.prop_5_6_not_symmetric :

∃ (Omega : Type) (x : MeasurableSpace Omega) (P : MeasureTheory.Measure Omega) (Q : MeasureTheory.Measure Omega), InformationTheory.klDiv P Q ≠ InformationTheory.klDiv Q P

Proposition 5.6(3): KL divergence is NOT symmetric. There exist measures P, Q such that KL(P, Q) is not equal to KL(Q, P).

Proposition 5.6(4): Tensorization / Additivity for product measures #

For independent product measures: KL(P1 x P2, Q1 x Q2) = KL(P1, Q1) + KL(P2, Q2)

The n-fold tensorization KL(P^n, Q^n) = n * KL(P, Q) follows by induction.

Book proof outline (Prop 5.6 part 2):

Product Radon-Nikodym derivative: d(P₁⊗P₂)/d(Q₁⊗Q₂)(x₁,x₂) = (dP₁/dQ₁)(x₁) · (dP₂/dQ₂)(x₂).
Taking logarithms: log(d(P₁⊗P₂)/d(Q₁⊗Q₂)) = log(dP₁/dQ₁) + log(dP₂/dQ₂).
Integrating against P₁⊗P₂ and applying Fubini: KL(P₁⊗P₂ ‖ Q₁⊗Q₂) = ∫∫ [log(dP₁/dQ₁)(x₁) + log(dP₂/dQ₂)(x₂)] dP₁dP₂ = ∫ log(dP₁/dQ₁) dP₁ · (∫dP₂) + (∫dP₁) · ∫ log(dP₂/dQ₂) dP₂ = KL(P₁‖Q₁) · 1 + 1 · KL(P₂‖Q₂) [since Pᵢ are probability measures] = KL(P₁‖Q₁) + KL(P₂‖Q₂).

Formalization blockers:

The factorization of Measure.rnDeriv for product measures (d(μ₁.prod μ₂)/d(ν₁.prod ν₂) = (dμ₁/dν₁) ⊗ (dμ₂/dν₂)) is not yet in Mathlib. Kernel-based versions exist in Probability.Kernel.Composition, but not for plain Measure.prod.
Without this, the integral decomposition via Fubini cannot be completed.

source

theorem Rigollet.Chapter5.rnDeriv_prod_eq {alpha : Type u_1} {beta : Type u_2} [MeasurableSpace alpha] [MeasurableSpace beta] {P1 Q1 : MeasureTheory.Measure alpha} {P2 Q2 : MeasureTheory.Measure beta} [MeasureTheory.SigmaFinite P1] [MeasureTheory.SigmaFinite Q1] [MeasureTheory.SigmaFinite P2] [MeasureTheory.SigmaFinite Q2] (_hac1 : P1.AbsolutelyContinuous Q1) (_hac2 : P2.AbsolutelyContinuous Q2) :

(P1.prod P2).rnDeriv (Q1.prod Q2) =ᵐ[Q1.prod Q2] fun (x : alpha × beta) => P1.rnDeriv Q1 x.1 * P2.rnDeriv Q2 x.2

Auxiliary: The Radon-Nikodym derivative of a product measure factors as a product of Radon-Nikodym derivatives. This is the key missing Mathlib lemma. d(P₁.prod P₂)/d(Q₁.prod Q₂)(x₁, x₂) = (dP₁/dQ₁)(x₁) · (dP₂/dQ₂)(x₂) a.e. w.r.t. Q₁.prod Q₂.

source

theorem Rigollet.Chapter5.prop_5_6_tensorization {alpha : Type u_1} {beta : Type u_2} {x✝ : MeasurableSpace alpha} {x✝¹ : MeasurableSpace beta} {P1 Q1 : MeasureTheory.Measure alpha} {P2 Q2 : MeasureTheory.Measure beta} (hP1 : MeasureTheory.IsProbabilityMeasure P1) (hQ1 : MeasureTheory.IsProbabilityMeasure Q1) (hP2 : MeasureTheory.IsProbabilityMeasure P2) (hQ2 : MeasureTheory.IsProbabilityMeasure Q2) (hPQ1 : P1.AbsolutelyContinuous Q1) (hPQ2 : P2.AbsolutelyContinuous Q2) (h_int1 : MeasureTheory.Integrable (MeasureTheory.llr P1 Q1) P1) (h_int2 : MeasureTheory.Integrable (MeasureTheory.llr P2 Q2) P2) :

InformationTheory.klDiv (P1.prod P2) (Q1.prod Q2) = InformationTheory.klDiv P1 Q1 + InformationTheory.klDiv P2 Q2

Proposition 5.6(4) (Tensorization / Additivity): For probability measures, KL(P1 × P2, Q1 × Q2) = KL(P1, Q1) + KL(P2, Q2).

The proof proceeds in three steps following the book:

Factor the product RN derivative (rnDeriv_prod_eq)
Apply log to convert product → sum: log(f·g) = log(f) + log(g)
Apply Fubini to separate the integral into two independent integrals, using ∫dPᵢ = 1 for probability measures.

The sorry blocks correspond to:

rnDeriv_prod_eq: product RN derivative factorization (not in Mathlib)
Fubini decomposition + log-product identity in the KL integral

source

theorem Rigollet.Chapter5.prop_5_6 :

(∀ {Ω : Type u_1} [inst : MeasurableSpace Ω] (P Q : MeasureTheory.Measure Ω), 0 ≤ InformationTheory.klDiv P Q) ∧ ∀ {α : Type u_2} {β : Type u_3} [inst : MeasurableSpace α] [inst_1 : MeasurableSpace β] {P1 Q1 : MeasureTheory.Measure α} {P2 Q2 : MeasureTheory.Measure β}, MeasureTheory.IsProbabilityMeasure P1 → MeasureTheory.IsProbabilityMeasure Q1 → MeasureTheory.IsProbabilityMeasure P2 → MeasureTheory.IsProbabilityMeasure Q2 → P1.AbsolutelyContinuous Q1 → P2.AbsolutelyContinuous Q2 → MeasureTheory.Integrable (MeasureTheory.llr P1 Q1) P1 → MeasureTheory.Integrable (MeasureTheory.llr P2 Q2) P2 → InformationTheory.klDiv (P1.prod P2) (Q1.prod Q2) = InformationTheory.klDiv P1 Q1 + InformationTheory.klDiv P2 Q2

Proposition 5.6 (Properties of KL divergence). (i) KL(P, Q) ≥ 0 (Gibbs' inequality). (ii) If P = P₁ ⊗ P₂ and Q = Q₁ ⊗ Q₂ are product probability measures, then KL(P,Q) = KL(P₁,Q₁) + KL(P₂,Q₂).

Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Def_5_5

Definition 5.5 (KL Divergence) and Proposition 5.6 (Basic Properties) #