Theorem 5.9: Two-Hypothesis Minimax Lower Bound #
Helper lemmas for GSM properties #
The GaussianSequenceModel structure encodes the GSM with its inherent
probabilistic properties (probability measure, absolute continuity, KL divergence
formula and finiteness) as structure fields. These lemmas extract the relevant
fields for use in the main proof.
P_θ is a probability measure for any θ in the GSM.
Gaussian measures with the same covariance are mutually absolutely continuous.
KL divergence for Gaussian measures: KL(P_{θ₁}‖P_{θ₀}) = n·|θ₁−θ₀|²/(2σ²).
(Example 5.7 in the textbook.)
This formula follows from explicit computation with Gaussian densities
and is encoded as a structure field hP_kl_toReal of GaussianSequenceModel.
KL divergence between Gaussian measures is finite.
This follows from the Gaussian KL formula (Example 5.7), which gives
a finite value, and is encoded as a structure field hP_kl_ne_top.
Squared distance: triangle-type inequality #
The key inequality used in the estimation-to-testing reduction:
for vectors a, b, c in ℝ^d, we have sqDist(a, b) ≤ 2·sqDist(c, a) + 2·sqDist(c, b),
which follows from the parallelogram law / (a-b)² ≤ 2(c-a)² + 2(c-b)².
Parallelogram-type inequality: sqDist(a, b) ≤ 2·sqDist(c, a) + 2·sqDist(c, b). Proof: For each coordinate, (aᵢ - bᵢ)² = ((aᵢ - cᵢ) + (cᵢ - bᵢ))² ≤ 2(aᵢ-cᵢ)² + 2(cᵢ-bᵢ)² by the AM-QM inequality (a+b)² ≤ 2a² + 2b².
Two-point testing reduction (Section 5.2 + Prop-Def 5.4) #
The reduction to testing from Section 5.2, specialized to M = 2. The textbook proof of Theorem 5.9 (displayed equation) uses the chain:
inf_θ̂ sup_θ P_θ(|θ̂−θ|² ≥ r) ≥ inf_ψ max_{j=0,1} P_j(ψ ≠ j) ≥ (1/2)(P₀(ψ=1) + P₁(ψ=0)) = (1/2)(1 − TV(P₀, P₁))
The first inequality follows from Section 5.2 (constructing the argmin test ψ from any estimator θ̂, using the triangle inequality), and the equality uses Proposition-Definition 5.4 (Neyman-Pearson).
Note on threshold: The corrected threshold φ = 2α²σ²/n satisfies the separation constraint |θ₀−θ₁|₂² = 8α²σ²/n = 4φ (Constraint 5.7).
Estimation-to-testing reduction (Section 5.2, specialized to M = 2).
For any estimator θ̂ and hypotheses θ₀, θ₁ with squared separation sqDist θ₀ θ₁ = 8α²σ²/n, the sum of error probabilities at threshold φ = 2α²σ²/n satisfies: P_{θ₀}(sqDist(θ̂, θ₀) ≥ φ) + P_{θ₁}(sqDist(θ̂, θ₁) ≥ φ) ≥ 1 - TV(P₀, P₁).
Proof: Define ψ(Y) = decide(sqDist(θ̂(Y), θ₀) ≥ φ). Then: {ψ = true} = {sqDist(θ̂, θ₀) ≥ φ} {ψ = false} = {sqDist(θ̂, θ₀) < φ}
By the triangle inequality, if sqDist(θ̂, θ₀) < φ, then sqDist(θ₀,θ₁) ≤ 2·sqDist(θ̂, θ₀) + 2·sqDist(θ̂, θ₁), giving sqDist(θ̂, θ₁) ≥ sqDist(θ₀,θ₁)/2 − sqDist(θ̂, θ₀) > 4α²σ²/n − φ = φ. Hence {ψ = false} ⊆ {sqDist(θ̂, θ₁) > φ} ⊆ {sqDist(θ̂, θ₁) ≥ φ}.
By Neyman-Pearson (Lemma 5.3): P₀(ψ=true) + P₁(ψ=false) ≥ 1 - TV(P₀, P₁)
Since P₀(ψ=true) = P₀({sqDist(θ̂,θ₀) ≥ φ}) (first term of the conclusion) and P₁(ψ=false) ≤ P₁({sqDist(θ̂,θ₁) ≥ φ}) (second term of the conclusion), we conclude.
Helper: the biSup defining supProbLargeError is bounded above by 1.
Helper: the inner biSup is bounded above (trivially, it's over a Prop).
Theorem 5.9 (Rigollet, "High Dimensional Statistics", Chapter 5.3). Two-hypothesis minimax lower bound for the Gaussian Sequence Model.
Book statement (verbatim): "Assume that Θ contains two hypotheses θ₀ and θ₁ such that |θ₀ − θ₁|₂² = 8α²σ²/n for some α ∈ (0, 1/2). Then inf_{θ̂} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ 2ασ²/n) ≥ 1/2 − α."
Erratum (typo correction: threshold 2α → 2α²): The book's threshold
2ασ²/n is a typo — it should be 2α²σ²/n. The book's stated threshold is
inconsistent with the separation: the testing reduction from Section 5.2
requires separation ≥ 4·threshold, i.e. 8α² ≥ 8α, i.e. α ≥ 1, contradicting
α < 1/2. See theorem_5_9_verbatim_is_false for a formal proof that the
book's exact triple is inconsistent. With the corrected threshold 2α²σ²/n,
we have 8α² = 4 × 2α² as required, and the book's proof chain
(1/2)(1 − TV) ≥ (1/2)(1 − 2α) = 1/2 − α is exactly followed.
Proof (Rigollet, Theorem 5.9, displayed equation):
Step 1 (Testing reduction + Neyman-Pearson, Section 5.2 + Prop-Def 5.4): minimaxProbLargeError gsm (2α²σ²/n) ≥ (1/2)(1 − TV(P₀, P₁))
Step 2 (Pinsker's inequality, Lemma 5.8): TV(P₀, P₁) ≤ √(KL(P₁‖P₀))
Step 3 (Gaussian KL, Example 5.7): KL(P₁‖P₀) = n·|θ₁−θ₀|²/(2σ²) = n·8α²σ²/(2σ²·n) = 4α²
Step 4 (Algebraic): minimaxProbLargeError ≥ (1/2)(1 − √(4α²)) = (1/2)(1 − 2α) = 1/2 − α.
Theorem 5.9 (Rigollet, Section 5.3, rescaled formulation via α ↦ √α).
Equivalent formulation of Theorem 5.9 obtained from
Minimax.minimaxLowerBound_gaussianSequence by substituting α ↦ √α.
This version uses the book's threshold 2ασ²/n at the cost of changing
the separation to 8ασ²/n and the bound to 1/2 - √α.
Alias for Minimax.minimaxLowerBound_gaussianSequence (Theorem 5.9, corrected).
Formal proof that the book's verbatim statement is inconsistent.
The book's Theorem 5.9 uses separation 8α²σ²/n and threshold 2ασ²/n. The testing reduction (Section 5.2) requires separation ≥ 4·threshold, i.e. 8α² ≥ 8α, i.e. α ≥ 1, which contradicts α ∈ (0, 1/2).
Equivalently, a midpoint estimator θ̂ = (θ₀ + θ₁)/2 achieves squared error 2α²σ²/n per hypothesis, which is strictly less than the threshold 2ασ²/n for α < 1. So the error event never occurs and the minimax probability is 0, not ≥ 1/2 − α > 0 as claimed.
This lemma proves the core algebraic facts: for α ∈ (0, 1/2), (1) the separation 8α² is strictly less than 4 times the threshold 2α (so the testing reduction's precondition fails), and (2) the midpoint error 2α² is strictly less than the threshold 2α (so a concrete estimator makes the error event empty).