Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Thm_5_10

noncomputable def FanoInequality.mixtureMeasure {Ω : Type u_1} [MeasurableSpace Ω] {M : } (P : Fin MMeasureTheory.Measure Ω) :

The mixture (average) measure: P̄ = (1/M) · Σⱼ Pⱼ.

This is the marginal distribution of X when the hypothesis index Z is drawn uniformly from {1, ..., M} and X | Z=j ∼ Pⱼ.

Instances For

    Bridge lemma: the local mixtureMeasure equals InfoTheory.mixtureMeasure. The difference is purely syntactic: (M)⁻¹ vs 1 / M in ENNReal, which are definitionally equal via one_div.

    theorem FanoInequality.fano_lemma {Ω : Type u_1} [MeasurableSpace Ω] (M : ) (hM : 3 M) (P : Fin MMeasureTheory.Measure Ω) [∀ (j : Fin M), MeasureTheory.IsProbabilityMeasure (P j)] (hac : ∀ (j k : Fin M), (P j).AbsolutelyContinuous (P k)) (hfin : ∀ (j k : Fin M), InformationTheory.klDiv (P j) (P k) ) (ψ : ΩFin M) :
    Measurable ψ∃ (j : Fin M), ((P j) {ω : Ω | ψ ω j}).toReal 1 - (1 / M * j : Fin M, (InformationTheory.klDiv (P j) (mixtureMeasure P)).toReal + Real.log 2) / Real.log (M - 1)

    Fano's lemma (information-theoretic core).

    Derived from InfoTheory.fano_lemma. The local version adds the hac hypothesis (unused by the axiom but required by calling code) and universally quantifies over ψ in the conclusion.

    theorem FanoInequality.kl_mixture_le_avg_pairwise {Ω : Type u_1} [MeasurableSpace Ω] (M : ) (hM : 2 M) (P : Fin MMeasureTheory.Measure Ω) [∀ (j : Fin M), MeasureTheory.IsProbabilityMeasure (P j)] (hac : ∀ (j k : Fin M), (P j).AbsolutelyContinuous (P k)) (hfin : ∀ (j k : Fin M), InformationTheory.klDiv (P j) (P k) ) :
    1 / M * j : Fin M, (InformationTheory.klDiv (P j) (mixtureMeasure P)).toReal 1 / M ^ 2 * j : Fin M, k : Fin M, (InformationTheory.klDiv (P j) (P k)).toReal

    KL to mixture ≤ average pairwise KL (Jensen's inequality for KL).

    Derived from InfoTheory.kl_mixture_le_avg_pairwise.

    theorem FanoInequality.fano_avg_numerator_bound {Ω : Type u_1} [MeasurableSpace Ω] (M : ) (hM : 2 M) (P : Fin MMeasureTheory.Measure Ω) [∀ (j : Fin M), MeasureTheory.IsProbabilityMeasure (P j)] (hac : ∀ (j k : Fin M), (P j).AbsolutelyContinuous (P k)) (hfin : ∀ (j k : Fin M), InformationTheory.klDiv (P j) (P k) ) (ψ : ΩFin M) ( : Measurable ψ) :
    (1 / M * j : Fin M, ((P j) {ω : Ω | ψ ω j}).toReal) * Real.log (M - 1) + Real.log 2 Real.log M - 1 / M ^ 2 * j : Fin M, k : Fin M, (InformationTheory.klDiv (P j) (P k)).toReal

    Fano numerator bound (average error form): For M ≥ 2 and any test ψ,

    pe_avg · log(M-1) + log 2 ≥ log M − (1/M²) Σⱼ Σₖ KL(Pⱼ ∥ Pₖ)

    where pe_avg = (1/M) Σⱼ Pⱼ[ψ ≠ j] is the average testing error.

    This is the "numerator form" of Fano's inequality that avoids dividing by log(M-1), making it valid for all M ≥ 2 (including M = 2 where log(M-1) = log 1 = 0).

    Proof: Chain fano_entropy_core_bound (which gives pe_avg · log(M-1) + log 2 ≥ log M − (1/M) Σⱼ KL(Pⱼ ∥ P̄)) with kl_mixture_le_avg_pairwise (which gives (1/M) Σⱼ KL(Pⱼ ∥ P̄) ≤ (1/M²) Σⱼ Σₖ KL(Pⱼ ∥ Pₖ)).

    Reference: Rigollet, proof of Theorem 5.10.

    theorem FanoInequality.fano_inequality_general {Ω : Type u_1} [MeasurableSpace Ω] (M : ) (hM : 2 M) (P : Fin MMeasureTheory.Measure Ω) [∀ (j : Fin M), MeasureTheory.IsProbabilityMeasure (P j)] (hac : ∀ (j k : Fin M), (P j).AbsolutelyContinuous (P k)) (hfin : ∀ (j k : Fin M), InformationTheory.klDiv (P j) (P k) ) (ψ : ΩFin M) :
    Measurable ψ∃ (j : Fin M), ((P j) {ω : Ω | ψ ω j}).toReal * Real.log (M - 1) + Real.log 2 Real.log M - 1 / M ^ 2 * j : Fin M, k : Fin M, (InformationTheory.klDiv (P j) (P k)).toReal

    Fano's inequality, numerator form (M ≥ 2).

    Let P₁, ..., P_M, M ≥ 2 be probability distributions such that Pⱼ ≪ Pₖ. Then for any measurable test ψ : Ω → Fin M, there exists j such that:

    Pⱼ[ψ ≠ j] · log(M-1) + log 2 ≥ log M − (1/M²) Σⱼₖ KL(Pⱼ ∥ Pₖ)

    This is equivalent to the standard Fano bound Pⱼ[ψ ≠ j] ≥ 1 − (avg_KL + log 2) / log(M-1) when M ≥ 3 (so log(M-1) > 0), but it avoids division by log(M-1), making it well-defined and correct for M = 2 as well.

    The proof applies fano_avg_numerator_bound (for the average error) and then uses a pigeonhole argument: some component error ≥ the average, and since log(M-1) ≥ 0, the numerator form is preserved.

    Reference: Rigollet, Theorem 5.10 (restated in numerator form).

    theorem FanoInequality.fano_inequality {Ω : Type u_1} [MeasurableSpace Ω] (M : ) (hM : 3 M) (P : Fin MMeasureTheory.Measure Ω) [∀ (j : Fin M), MeasureTheory.IsProbabilityMeasure (P j)] (hac : ∀ (j k : Fin M), (P j).AbsolutelyContinuous (P k)) (hfin : ∀ (j k : Fin M), InformationTheory.klDiv (P j) (P k) ) (ψ : ΩFin M) :
    Measurable ψ∃ (j : Fin M), ((P j) {ω : Ω | ψ ω j}).toReal 1 - (1 / M ^ 2 * j : Fin M, k : Fin M, (InformationTheory.klDiv (P j) (P k)).toReal + Real.log 2) / Real.log (M - 1)

    Theorem 5.10 (Fano's inequality, denominator form).

    Let P₁, ..., P_M, M ≥ 2 be probability distributions on a measurable space (Ω, 𝓐) such that Pⱼ ≪ Pₖ for all j, k. Then for any M-ary test ψ : Ω → Fin M,

    max_{1≤j≤M} Pⱼ[ψ(X) ≠ j] ≥ 1 − ((1/M²) Σ_{j,k} KL(Pⱼ ∥ Pₖ) + log 2) / log(M − 1)

    The proof combines two steps:

    1. fano_lemma: bounds error using average KL to the mixture P̄
    2. kl_mixture_le_avg_pairwise: bounds KL to mixture by average pairwise KL

    The combination works by monotonicity: since A ≤ B (from step 2), we have 1 − (A + log 2)/log(M−1) ≥ 1 − (B + log 2)/log(M−1) when log(M−1) ≥ 0.

    Requires M ≥ 3 because dividing by log(M-1) = 0 when M = 2 produces a false bound in Lean's convention where x / 0 = 0. For M ≥ 2, use fano_inequality_general.