Atlas.HighDimensionalStatistics.code.Chapter5.Thm_5

Theorem 5.9: Two-Hypothesis Minimax Lower Bound #

Helper lemmas for GSM properties #

The GaussianSequenceModel structure encodes the GSM with its inherent probabilistic properties (probability measure, absolute continuity, KL divergence formula and finiteness) as structure fields. These lemmas extract the relevant fields for use in the main proof.

source

theorem gsm_isProbabilityMeasure (gsm : Minimax.GaussianSequenceModel) (θ : Fin gsm.d → ℝ) :

MeasureTheory.IsProbabilityMeasure (gsm.P θ)

P_θ is a probability measure for any θ in the GSM.

source

theorem gsm_absolutelyContinuous (gsm : Minimax.GaussianSequenceModel) (θ₀ θ₁ : Fin gsm.d → ℝ) :

(gsm.P θ₀).AbsolutelyContinuous (gsm.P θ₁)

Gaussian measures with the same covariance are mutually absolutely continuous.

source

theorem gsm_klDiv_toReal (gsm : Minimax.GaussianSequenceModel) (θ₀ θ₁ : Fin gsm.d → ℝ) :

(InformationTheory.klDiv (gsm.P θ₁) (gsm.P θ₀)).toReal = ↑gsm.n * Minimax.sqDist θ₁ θ₀ / (2 * gsm.σ ^ 2)

KL divergence for Gaussian measures: KL(P_{θ₁}‖P_{θ₀}) = n·|θ₁−θ₀|²/(2σ²). (Example 5.7 in the textbook.) This formula follows from explicit computation with Gaussian densities and is encoded as a structure field hP_kl_toReal of GaussianSequenceModel.

source

theorem gsm_klDiv_ne_top (gsm : Minimax.GaussianSequenceModel) (θ₀ θ₁ : Fin gsm.d → ℝ) :

InformationTheory.klDiv (gsm.P θ₁) (gsm.P θ₀) ≠ ⊤

KL divergence between Gaussian measures is finite. This follows from the Gaussian KL formula (Example 5.7), which gives a finite value, and is encoded as a structure field hP_kl_ne_top.

Squared distance: triangle-type inequality #

The key inequality used in the estimation-to-testing reduction: for vectors a, b, c in ℝ^d, we have sqDist(a, b) ≤ 2·sqDist(c, a) + 2·sqDist(c, b), which follows from the parallelogram law / (a-b)² ≤ 2(c-a)² + 2(c-b)².

source

theorem sqDist_triangle_le {d : ℕ} (a b c : Fin d → ℝ) :

Minimax.sqDist a b ≤ 2 * Minimax.sqDist c a + 2 * Minimax.sqDist c b

Parallelogram-type inequality: sqDist(a, b) ≤ 2·sqDist(c, a) + 2·sqDist(c, b). Proof: For each coordinate, (aᵢ - bᵢ)² = ((aᵢ - cᵢ) + (cᵢ - bᵢ))² ≤ 2(aᵢ-cᵢ)² + 2(cᵢ-bᵢ)² by the AM-QM inequality (a+b)² ≤ 2a² + 2b².

Two-point testing reduction (Section 5.2 + Prop-Def 5.4) #

The reduction to testing from Section 5.2, specialized to M = 2. The textbook proof of Theorem 5.9 (displayed equation) uses the chain:

inf_θ̂ sup_θ P_θ(|θ̂−θ|² ≥ r) ≥ inf_ψ max_{j=0,1} P_j(ψ ≠ j) ≥ (1/2)(P₀(ψ=1) + P₁(ψ=0)) = (1/2)(1 − TV(P₀, P₁))

The first inequality follows from Section 5.2 (constructing the argmin test ψ from any estimator θ̂, using the triangle inequality), and the equality uses Proposition-Definition 5.4 (Neyman-Pearson).

Note on threshold: The corrected threshold φ = 2α²σ²/n satisfies the separation constraint |θ₀−θ₁|₂² = 8α²σ²/n = 4φ (Constraint 5.7).

source

theorem estimation_to_testing_sum_bound (gsm : Minimax.GaussianSequenceModel) (θ₀ θ₁ : Fin gsm.d → ℝ) (α : ℝ) (_hα_pos : 0 < α) (_hα_lt : α < 1 / 2) (hDist : Minimax.sqDist θ₀ θ₁ = 8 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n) [MeasureTheory.IsProbabilityMeasure (gsm.P θ₀)] [MeasureTheory.IsProbabilityMeasure (gsm.P θ₁)] (θhat : Minimax.Estimator gsm.d) :

((gsm.P θ₀) {Y : Fin gsm.d → ℝ | Minimax.sqDist (θhat Y) θ₀ ≥ 2 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n}).toReal + ((gsm.P θ₁) {Y : Fin gsm.d → ℝ | Minimax.sqDist (θhat Y) θ₁ ≥ 2 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n}).toReal ≥ 1 - Chapter5.TVNP.tvDist (gsm.P θ₀) (gsm.P θ₁)

Estimation-to-testing reduction (Section 5.2, specialized to M = 2).

For any estimator θ̂ and hypotheses θ₀, θ₁ with squared separation sqDist θ₀ θ₁ = 8α²σ²/n, the sum of error probabilities at threshold φ = 2α²σ²/n satisfies: P_{θ₀}(sqDist(θ̂, θ₀) ≥ φ) + P_{θ₁}(sqDist(θ̂, θ₁) ≥ φ) ≥ 1 - TV(P₀, P₁).

Proof: Define ψ(Y) = decide(sqDist(θ̂(Y), θ₀) ≥ φ). Then: {ψ = true} = {sqDist(θ̂, θ₀) ≥ φ} {ψ = false} = {sqDist(θ̂, θ₀) < φ}

By the triangle inequality, if sqDist(θ̂, θ₀) < φ, then sqDist(θ₀,θ₁) ≤ 2·sqDist(θ̂, θ₀) + 2·sqDist(θ̂, θ₁), giving sqDist(θ̂, θ₁) ≥ sqDist(θ₀,θ₁)/2 − sqDist(θ̂, θ₀) > 4α²σ²/n − φ = φ. Hence {ψ = false} ⊆ {sqDist(θ̂, θ₁) > φ} ⊆ {sqDist(θ̂, θ₁) ≥ φ}.

By Neyman-Pearson (Lemma 5.3): P₀(ψ=true) + P₁(ψ=false) ≥ 1 - TV(P₀, P₁)

Since P₀(ψ=true) = P₀({sqDist(θ̂,θ₀) ≥ φ}) (first term of the conclusion) and P₁(ψ=false) ≤ P₁({sqDist(θ̂,θ₁) ≥ φ}) (second term of the conclusion), we conclude.

source

theorem supProbLargeError_bddAbove_outer (gsm : Minimax.GaussianSequenceModel) (θhat : Minimax.Estimator gsm.d) (φ : ℝ) :

BddAbove (Set.range fun (θ : Fin gsm.d → ℝ) => ⨆ (_ : θ ∈ gsm.Θ), ((gsm.P θ) {Y : Fin gsm.d → ℝ | Minimax.sqDist (θhat Y) θ ≥ φ}).toReal)

Helper: the biSup defining supProbLargeError is bounded above by 1.

source

theorem supProbLargeError_bddAbove_inner (gsm : Minimax.GaussianSequenceModel) (θhat : Minimax.Estimator gsm.d) (φ : ℝ) (θ₀ : Fin gsm.d → ℝ) :

BddAbove (Set.range fun (x : θ₀ ∈ gsm.Θ) => ((gsm.P θ₀) {Y : Fin gsm.d → ℝ | Minimax.sqDist (θhat Y) θ₀ ≥ φ}).toReal)

Helper: the inner biSup is bounded above (trivially, it's over a Prop).

source

theorem two_point_testing_reduction (gsm : Minimax.GaussianSequenceModel) (θ₀ θ₁ : Fin gsm.d → ℝ) (hΘ₀ : θ₀ ∈ gsm.Θ) (hΘ₁ : θ₁ ∈ gsm.Θ) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1 / 2) (hDist : Minimax.sqDist θ₀ θ₁ = 8 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n) [MeasureTheory.IsProbabilityMeasure (gsm.P θ₀)] [MeasureTheory.IsProbabilityMeasure (gsm.P θ₁)] :

Minimax.minimaxProbLargeError gsm (2 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n) ≥ 1 / 2 * (1 - Chapter5.TVNP.tvDist (gsm.P θ₀) (gsm.P θ₁))

source

theorem Minimax.minimaxLowerBound_gaussianSequence (gsm : GaussianSequenceModel) (θ₀ θ₁ : Fin gsm.d → ℝ) (hΘ₀ : θ₀ ∈ gsm.Θ) (hΘ₁ : θ₁ ∈ gsm.Θ) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1 / 2) (hDist : sqDist θ₀ θ₁ = 8 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n) :

minimaxProbLargeError gsm (2 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n) ≥ 1 / 2 - α

Theorem 5.9 (Rigollet, "High Dimensional Statistics", Chapter 5.3). Two-hypothesis minimax lower bound for the Gaussian Sequence Model.

Book statement (verbatim): "Assume that Θ contains two hypotheses θ₀ and θ₁ such that |θ₀ − θ₁|₂² = 8α²σ²/n for some α ∈ (0, 1/2). Then inf_{θ̂} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ 2ασ²/n) ≥ 1/2 − α."

Erratum (typo correction: threshold 2α → 2α²): The book's threshold 2ασ²/n is a typo — it should be 2α²σ²/n. The book's stated threshold is inconsistent with the separation: the testing reduction from Section 5.2 requires separation ≥ 4·threshold, i.e. 8α² ≥ 8α, i.e. α ≥ 1, contradicting α < 1/2. See theorem_5_9_verbatim_is_false for a formal proof that the book's exact triple is inconsistent. With the corrected threshold 2α²σ²/n, we have 8α² = 4 × 2α² as required, and the book's proof chain (1/2)(1 − TV) ≥ (1/2)(1 − 2α) = 1/2 − α is exactly followed.

Proof (Rigollet, Theorem 5.9, displayed equation):

Step 1 (Testing reduction + Neyman-Pearson, Section 5.2 + Prop-Def 5.4): minimaxProbLargeError gsm (2α²σ²/n) ≥ (1/2)(1 − TV(P₀, P₁))

Step 2 (Pinsker's inequality, Lemma 5.8): TV(P₀, P₁) ≤ √(KL(P₁‖P₀))

Step 3 (Gaussian KL, Example 5.7): KL(P₁‖P₀) = n·|θ₁−θ₀|²/(2σ²) = n·8α²σ²/(2σ²·n) = 4α²

Step 4 (Algebraic): minimaxProbLargeError ≥ (1/2)(1 − √(4α²)) = (1/2)(1 − 2α) = 1/2 − α.

source

theorem Minimax.theorem_5_9_rescaled (gsm : GaussianSequenceModel) (θ₀ θ₁ : Fin gsm.d → ℝ) (hΘ₀ : θ₀ ∈ gsm.Θ) (hΘ₁ : θ₁ ∈ gsm.Θ) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1 / 4) (hDist : sqDist θ₀ θ₁ = 8 * α * gsm.σ ^ 2 / ↑gsm.n) :

minimaxProbLargeError gsm (2 * α * gsm.σ ^ 2 / ↑gsm.n) ≥ 1 / 2 - √α

Theorem 5.9 (Rigollet, Section 5.3, rescaled formulation via α ↦ √α).

Equivalent formulation of Theorem 5.9 obtained from Minimax.minimaxLowerBound_gaussianSequence by substituting α ↦ √α. This version uses the book's threshold 2ασ²/n at the cost of changing the separation to 8ασ²/n and the bound to 1/2 - √α.

source

theorem Minimax.theorem_5_9 (gsm : GaussianSequenceModel) (θ₀ θ₁ : Fin gsm.d → ℝ) (hΘ₀ : θ₀ ∈ gsm.Θ) (hΘ₁ : θ₁ ∈ gsm.Θ) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1 / 2) (hDist : sqDist θ₀ θ₁ = 8 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n) :

minimaxProbLargeError gsm (2 * α ^ 2 * gsm.σ ^ 2 / ↑gsm.n) ≥ 1 / 2 - α

Alias for Minimax.minimaxLowerBound_gaussianSequence (Theorem 5.9, corrected).

source

theorem theorem_5_9_verbatim_is_false (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1 / 2) :

8 * α ^ 2 < 4 * (2 * α) ∧ 2 * α ^ 2 < 2 * α

Formal proof that the book's verbatim statement is inconsistent.

The book's Theorem 5.9 uses separation 8α²σ²/n and threshold 2ασ²/n. The testing reduction (Section 5.2) requires separation ≥ 4·threshold, i.e. 8α² ≥ 8α, i.e. α ≥ 1, which contradicts α ∈ (0, 1/2).

Equivalently, a midpoint estimator θ̂ = (θ₀ + θ₁)/2 achieves squared error 2α²σ²/n per hypothesis, which is strictly less than the threshold 2ασ²/n for α < 1. So the error event never occurs and the minimax probability is 0, not ≥ 1/2 − α > 0 as claimed.

This lemma proves the core algebraic facts: for α ∈ (0, 1/2), (1) the separation 8α² is strictly less than 4 times the threshold 2α (so the testing reduction's precondition fails), and (2) the midpoint error 2α² is strictly less than the threshold 2α (so a concrete estimator makes the error event empty).

Documentation