Proposition 5.6: Properties of KL Divergence #
This file provides the bundled statement of Proposition 5.6:
- Nonnegativity (Gibbs' inequality): KL(P, Q) ≥ 0
- ENNReal form: trivial by type
- Real-valued form (substantive, via Jensen): 0 ≤ ∫ llr dP + Q(Ω) − P(Ω)
- Tensorization (additivity for products):
- Binary: KL(P₁⊗P₂, Q₁⊗Q₂) = KL(P₁,Q₁) + KL(P₂,Q₂)
- General n-fold: KL(⊗ᵢ Pᵢ, ⊗ᵢ Qᵢ) = Σᵢ KL(Pᵢ,Qᵢ)
The individual parts and their proofs are in Def_5_5.lean.
Helper lemmas for the n-fold tensorization #
theorem
Rigollet.Chapter5.klDiv_map_measurableEquiv
{α : Type u_1}
{β : Type u_2}
[MeasurableSpace α]
[MeasurableSpace β]
(e : α ≃ᵐ β)
(μ ν : MeasureTheory.Measure α)
[MeasureTheory.IsFiniteMeasure μ]
[MeasureTheory.IsFiniteMeasure ν]
:
InformationTheory.klDiv (MeasureTheory.Measure.map (⇑e) μ) (MeasureTheory.Measure.map (⇑e) ν) = InformationTheory.klDiv μ ν
klDiv is invariant under pushforward by a measurable equivalence. Uses the f-divergence formula: klDiv μ ν = ∫⁻ x, ofReal(klFun(rnDeriv μ ν x)) ∂ν
theorem
Rigollet.Chapter5.klDiv_prod_of_isProbabilityMeasure
{α : Type u_1}
{β : Type u_2}
[MeasurableSpace α]
[MeasurableSpace β]
(P₁ Q₁ : MeasureTheory.Measure α)
(P₂ Q₂ : MeasureTheory.Measure β)
[MeasureTheory.IsProbabilityMeasure P₁]
[MeasureTheory.IsProbabilityMeasure Q₁]
[MeasureTheory.IsProbabilityMeasure P₂]
[MeasureTheory.IsProbabilityMeasure Q₂]
:
InformationTheory.klDiv (P₁.prod P₂) (Q₁.prod Q₂) = InformationTheory.klDiv P₁ Q₁ + InformationTheory.klDiv P₂ Q₂
Binary tensorization of KL divergence for probability measures (unconditional). Uses the Mathlib chain rule for compProd.
theorem
Rigollet.Chapter5.prop_5_6_tensorization_nfold
{n : ℕ}
{α : Fin n → Type u_1}
[(i : Fin n) → MeasurableSpace (α i)]
(P Q : (i : Fin n) → MeasureTheory.Measure (α i))
[∀ (i : Fin n), MeasureTheory.IsProbabilityMeasure (P i)]
[∀ (i : Fin n), MeasureTheory.IsProbabilityMeasure (Q i)]
:
InformationTheory.klDiv (MeasureTheory.Measure.pi P) (MeasureTheory.Measure.pi Q) = ∑ i : Fin n, InformationTheory.klDiv (P i) (Q i)
theorem
Rigollet.Chapter5.prop_5_6_bundle :
(∀ {Ω : Type u_1} [inst : MeasurableSpace Ω] (P Q : MeasureTheory.Measure Ω), 0 ≤ InformationTheory.klDiv P Q) ∧ (∀ {Ω : Type u_2} [inst : MeasurableSpace Ω] {P Q : MeasureTheory.Measure Ω} [MeasureTheory.IsFiniteMeasure P]
[MeasureTheory.IsFiniteMeasure Q],
P.AbsolutelyContinuous Q →
MeasureTheory.Integrable (MeasureTheory.llr P Q) P →
0 ≤ ∫ (x : Ω), MeasureTheory.llr P Q x ∂P + Q.real Set.univ - P.real Set.univ) ∧ (∀ {α : Type u_3} {β : Type u_4} [inst : MeasurableSpace α] [inst_1 : MeasurableSpace β]
{P1 Q1 : MeasureTheory.Measure α} {P2 Q2 : MeasureTheory.Measure β},
MeasureTheory.IsProbabilityMeasure P1 →
MeasureTheory.IsProbabilityMeasure Q1 →
MeasureTheory.IsProbabilityMeasure P2 →
MeasureTheory.IsProbabilityMeasure Q2 →
P1.AbsolutelyContinuous Q1 →
P2.AbsolutelyContinuous Q2 →
MeasureTheory.Integrable (MeasureTheory.llr P1 Q1) P1 →
MeasureTheory.Integrable (MeasureTheory.llr P2 Q2) P2 →
InformationTheory.klDiv (P1.prod P2) (Q1.prod Q2) = InformationTheory.klDiv P1 Q1 + InformationTheory.klDiv P2 Q2) ∧ ∀ {n : ℕ} {α : Fin n → Type u_5} [inst : (i : Fin n) → MeasurableSpace (α i)]
(P Q : (i : Fin n) → MeasureTheory.Measure (α i)) [∀ (i : Fin n), MeasureTheory.IsProbabilityMeasure (P i)]
[∀ (i : Fin n), MeasureTheory.IsProbabilityMeasure (Q i)],
InformationTheory.klDiv (MeasureTheory.Measure.pi P) (MeasureTheory.Measure.pi Q) = ∑ i : Fin n, InformationTheory.klDiv (P i) (Q i)
Proposition 5.6 (Properties of KL divergence, bundled).