Atlas.HighDimensionalStatistics.code.Chapter5.Thm_5

Minimax testing risk over a parameter set Θ ⊆ ℝᵈ with respect to a measure family P: minimaxRisk P Θ ϕ = inf_{θ̂ measurable} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ ϕ).

The outer infimum is over all measurable estimators θ̂ : ℝᵈ → ℝᵈ and the inner supremum is over all parameters θ ∈ Θ. The value P_θ(|θ̂ − θ|₂² ≥ ϕ) is the probability (under P_θ) that the squared error exceeds the threshold ϕ.

Instances For

source

theorem MinimaxLowerBound.sqDist_eq_infoTheory {d : ℕ} (θ₁ θ₂ : Fin d → ℝ) :

sqDist θ₁ θ₂ = InfoTheory.sqDist θ₁ θ₂

The local sqDist equals InfoTheory.sqDist (definitionally equal).

source

theorem MinimaxLowerBound.fano_algebraic_bound {M : ℕ} (hM : 5 ≤ M) {α : ℝ} (hα_pos : 0 < α) :

(α * Real.log ↑M + Real.log 2) / Real.log (↑M - 1) ≤ 2 * α + 1 / 2

Key algebraic lemma: for M ≥ 5 and α > 0, (α · log M + log 2) / log (M - 1) ≤ 2α + 1/2.

This follows from:

log M ≤ 2 · log(M-1) for M ≥ 5 (since M ≤ (M-1)²)
log 2 ≤ (1/2) · log(M-1) for M ≥ 5 (since M-1 ≥ 4 = 2²)

source

theorem MinimaxLowerBound.theorem_5_11 {d M : ℕ} (hM : 5 ≤ M) {Θ : Set (Fin d → ℝ)} (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (θ : Fin M → Fin d → ℝ) (hθ_mem : ∀ (j : Fin M), θ j ∈ Θ) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (α : ℝ) (hα_pos : 0 < α) (_hα_lt : α < 1 / 4) (ϕ : ℝ) (hϕ : 0 < ϕ) (hsep : ∀ (j k : Fin M), j ≠ k → sqDist (θ j) (θ k) ≥ 4 * ϕ) (hkl : ∀ (j k : Fin M), j ≠ k → sqDist (θ j) (θ k) ≤ 2 * α * σ ^ 2 / ↑n * Real.log ↑M) [hprob : ∀ (j : Fin M), MeasureTheory.IsProbabilityMeasure (P (θ j))] (hP_prob : ∀ θ' ∈ Θ, MeasureTheory.IsProbabilityMeasure (P θ')) (hac : ∀ (j k : Fin M), (P (θ j)).AbsolutelyContinuous (P (θ k))) (hfin_kl : ∀ (j k : Fin M), InformationTheory.klDiv (P (θ j)) (P (θ k)) ≠ ⊤) (hGSM : ∀ (j k : Fin M), (InformationTheory.klDiv (P (θ j)) (P (θ k))).toReal = ↑n * InfoTheory.sqDist (θ j) (θ k) / (2 * σ ^ 2)) :

minimaxRisk P Θ ϕ ≥ 1 / 2 - 2 * α

Theorem 5.11 (Many-hypothesis minimax lower bound via Fano).

Assume that Θ contains M ≥ 5 hypotheses θ₁, …, θ_M such that for some constant 0 < α < 1/4: (i) sqDist(θⱼ, θₖ) ≥ 4ϕ for all j ≠ k (separation condition) (ii) sqDist(θⱼ, θₖ) ≤ (2ασ²/n)·log(M) for all j ≠ k (KL control)

Then: inf_{θ̂} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ ϕ) ≥ 1/2 − 2α

The proof applies Fano's inequality (Theorem 5.10) after bounding the average pairwise KL divergence using condition (ii) and the Gaussian KL formula KL(P_j ∥ P_k) = n·|θⱼ − θₖ|₂²/(2σ²).

Note: The textbook implicitly assumes the Gaussian sequence model (GSM), where each P_θ is a probability measure, measures are mutually absolutely continuous, and KL(P_θ₁ ∥ P_θ₂) = n|θ₁−θ₂|²/(2σ²). These standard statistical assumptions are made explicit as hypotheses hprob, hac, and hGSM (following the axiom structure in InfoTheory.lean).

Application: Sparse Estimation Lower Bound #

Using the Sparse Varshamov-Gilbert Lemma (Lemma 5.14) together with Theorem 5.11, we obtain the minimax lower bound for estimating k-sparse vectors in the Gaussian sequence model.

source

noncomputable def MinimaxLowerBound.l0norm {d : ℕ} (θ : Fin d → ℝ) :

ℕ

The ℓ₀ "norm" (number of nonzero entries) of a vector in ℝᵈ.

Instances For

source

def MinimaxLowerBound.sparseSet (d k : ℕ) :

Set (Fin d → ℝ)

The set of k-sparse vectors in ℝᵈ: B₀(k) = {θ ∈ ℝᵈ : |θ|₀ ≤ k}.

Instances For

source

theorem MinimaxLowerBound.hammingDist_le_two_weight {d k : ℕ} (ω₁ ω₂ : Fin d → Bool) (hw₁ : InfoTheory.l0norm_bool ω₁ = k) (hw₂ : InfoTheory.l0norm_bool ω₂ = k) :

InfoTheory.hammingDist ω₁ ω₂ ≤ 2 * k

source

theorem MinimaxLowerBound.sqDist_scaled_indicator {d M : ℕ} (ω : Fin M → Fin d → Bool) (scale : ℝ) (j k' : Fin M) :

(sqDist (fun (i : Fin d) => if ω j i = true then scale else 0) fun (i : Fin d) => if ω k' i = true then scale else 0) = scale ^ 2 * ↑(InfoTheory.hammingDist (ω j) (ω k'))

source

theorem MinimaxLowerBound.l0norm_scaled_indicator {d : ℕ} (ω : Fin d → Bool) (scale : ℝ) (hscale : scale ≠ 0) :

(l0norm fun (i : Fin d) => if ω i = true then scale else 0) = InfoTheory.l0norm_bool ω

source

theorem MinimaxLowerBound.sparse_estimation_lower_bound {d k : ℕ} (hk_pos : 1 ≤ k) (hk_le : 8 * k ≤ d) (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1 / 4) [hP_prob : ∀ (θ' : Fin d → ℝ), MeasureTheory.IsProbabilityMeasure (P θ')] (hac : ∀ (θ₁ θ₂ : Fin d → ℝ), (P θ₁).AbsolutelyContinuous (P θ₂)) (hGSM_kl : ∀ (θ₁ θ₂ : Fin d → ℝ), (InformationTheory.klDiv (P θ₁) (P θ₂)).toReal = ↑n * InfoTheory.sqDist θ₁ θ₂ / (2 * σ ^ 2)) (hfin_kl : ∀ (θ₁ θ₂ : Fin d → ℝ), InformationTheory.klDiv (P θ₁) (P θ₂) ≠ ⊤) :

∃ (C : ℝ), 0 < C ∧ minimaxRisk P (sparseSet d k) (C * σ ^ 2 * ↑k * Real.log (1 + ↑d / (2 * ↑k)) / ↑n) ≥ 1 / 2 - 2 * α

Sparse Estimation Lower Bound (Application of Theorem 5.11 + Sparse Varshamov-Gilbert).

For k-sparse vectors in ℝᵈ with 1 ≤ k ≤ d/8, there exist positive constants such that: inf_{θ̂} sup_{θ : |θ|₀ ≤ k} P_θ(|θ̂ − θ|₂² ≥ C·σ²k·log(1 + d/(2k))/n) ≥ 1/2 − 2α

Note: The textbook implicitly works in the GSM. The hypotheses hP_prob, hac, and hGSM_kl make these assumptions explicit, matching the approach in theorem_5_11.

Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Thm_5_11

Minimax risk: probability that squared error exceeds a threshold #

Application: Sparse Estimation Lower Bound #