Minimax risk: probability that squared error exceeds a threshold #
The minimax risk inf_{θ̂} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ ϕ) is defined
as an infimum over estimators and supremum over parameters.
An estimator: a measurable function from observations to parameter estimates.
Instances For
Minimax testing risk over a parameter set Θ ⊆ ℝᵈ with respect to a
measure family P:
minimaxRisk P Θ ϕ = inf_{θ̂ measurable} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ ϕ).
The outer infimum is over all measurable estimators θ̂ : ℝᵈ → ℝᵈ and the inner supremum is over all parameters θ ∈ Θ. The value P_θ(|θ̂ − θ|₂² ≥ ϕ) is the probability (under P_θ) that the squared error exceeds the threshold ϕ.
Instances For
The local sqDist equals InfoTheory.sqDist (definitionally equal).
Key algebraic lemma: for M ≥ 5 and α > 0, (α · log M + log 2) / log (M - 1) ≤ 2α + 1/2.
This follows from:
- log M ≤ 2 · log(M-1) for M ≥ 5 (since M ≤ (M-1)²)
- log 2 ≤ (1/2) · log(M-1) for M ≥ 5 (since M-1 ≥ 4 = 2²)
Theorem 5.11 (Many-hypothesis minimax lower bound via Fano).
Assume that Θ contains M ≥ 5 hypotheses θ₁, …, θ_M such that for some constant 0 < α < 1/4: (i) sqDist(θⱼ, θₖ) ≥ 4ϕ for all j ≠ k (separation condition) (ii) sqDist(θⱼ, θₖ) ≤ (2ασ²/n)·log(M) for all j ≠ k (KL control)
Then: inf_{θ̂} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ ϕ) ≥ 1/2 − 2α
The proof applies Fano's inequality (Theorem 5.10) after bounding the average pairwise KL divergence using condition (ii) and the Gaussian KL formula KL(P_j ∥ P_k) = n·|θⱼ − θₖ|₂²/(2σ²).
Note: The textbook implicitly assumes the Gaussian sequence model (GSM),
where each P_θ is a probability measure, measures are mutually absolutely
continuous, and KL(P_θ₁ ∥ P_θ₂) = n|θ₁−θ₂|²/(2σ²). These standard
statistical assumptions are made explicit as hypotheses hprob, hac,
and hGSM (following the axiom structure in InfoTheory.lean).
Application: Sparse Estimation Lower Bound #
Using the Sparse Varshamov-Gilbert Lemma (Lemma 5.14) together with Theorem 5.11, we obtain the minimax lower bound for estimating k-sparse vectors in the Gaussian sequence model.
Sparse Estimation Lower Bound (Application of Theorem 5.11 + Sparse Varshamov-Gilbert).
For k-sparse vectors in ℝᵈ with 1 ≤ k ≤ d/8, there exist positive constants such that: inf_{θ̂} sup_{θ : |θ|₀ ≤ k} P_θ(|θ̂ − θ|₂² ≥ C·σ²k·log(1 + d/(2k))/n) ≥ 1/2 − 2α
Note: The textbook implicitly works in the GSM. The hypotheses hP_prob, hac,
and hGSM_kl make these assumptions explicit, matching the approach in theorem_5_11.