Corollary 5.13: Minimax Rate σ²d/n over ℝ^d #
From Chapter 5.5 of Rigollet's "High Dimensional Statistics."
Corollary 5.13. The minimax rate of estimation over ℝ^d in the Gaussian sequence model is φ(ℝ^d) = σ²d/n. Moreover, it is attained by the least squares estimator θ̂^LS = Y.
This follows from two facts:
Lower bound (from Theorem 5.11 + Varshamov-Gilbert): inf_{θ̂} sup_{θ ∈ ℝ^d} E_θ[|θ̂ − θ|₂²] ≥ C · σ²d/n
Upper bound (identity estimator): E_θ[|Y − θ|₂²] = σ²d/n for all θ ∈ ℝ^d
Combined, these show σ²d/n is the minimax rate of estimation over ℝ^d.
Inlined definitions #
An estimator in the GSM: a function from observations Y ∈ ℝ^d to parameter estimates θ̂(Y) ∈ ℝ^d.
Instances For
The identity (least squares) estimator θ̂^LS(Y) = Y in the GSM. This is optimal when Θ = ℝ^d.
Instances For
The minimax risk over Θ:
R*(Θ) = inf_{θ̂ measurable} sup_{θ ∈ Θ} E_θ[‖θ̂(Y) − θ‖₂²].
Following the textbook (Definition 5.1), the infimum is taken over all measurable estimators, i.e., measurable functions of Y.
Instances For
Gaussian Sequence Model predicate #
Structure asserting that P is the Gaussian sequence model measure family
with noise standard deviation σ and sample size n.
- isProbabilityMeasure (hσ : 0 < σ) (hn : 0 < n) (θ : Fin d → ℝ) : MeasureTheory.IsProbabilityMeasure (P θ)
Each measure in the GSM family is a probability measure.
- identity_risk_coord (hσ : 0 < σ) (hn : 0 < n) (θ : Fin d → ℝ) (i : Fin d) : ∫ (Y : Fin d → ℝ), (Y i - θ i) ^ 2 ∂P θ = σ ^ 2 / ↑n
Per-coordinate risk identity: E_θ[(Y_i − θ_i)²] = σ²/n.
- coord_mean (hσ : 0 < σ) (hn : 0 < n) (θ : Fin d → ℝ) (i : Fin d) : ∫ (Y : Fin d → ℝ), Y i ∂P θ = θ i
Per-coordinate mean identity: E_θ[Y_i] = θ_i. This is a fundamental property of Gaussian distributions. Reference: Standard property of Gaussian measures, not proved in the textbook.
- density_ratio (hσ : 0 < σ) (hn : 0 < n) (θ₁ θ₂ : Fin d → ℝ) : P θ₁ = (P θ₂).withDensity fun (Y : Fin d → ℝ) => ENNReal.ofReal (Real.exp (↑n / (2 * σ ^ 2) * ∑ i : Fin d, ((Y i - θ₂ i) ^ 2 - (Y i - θ₁ i) ^ 2)))
Gaussian density ratio representation: P_θ₁ = P_θ₂ · withDensity(exp(...)). This is the Radon-Nikodym derivative for Gaussian measures with the same covariance. Reference: Problem 5.1(a) in the textbook.
- integrable_sqDist (hσ : 0 < σ) (hn : 0 < n) (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ) : MeasureTheory.Integrable (fun (Y : Fin d → ℝ) => sqDist (θhat Y) θ) (P θ)
In the GSM, the squared distance functional is integrable under any P_θ for any estimator θhat. This is a standard property of Gaussian measures (polynomial functions of Gaussians are integrable). Reference: Standard property of Gaussian measures, not proved in the textbook.
- bddAbove_risk (hσ : 0 < σ) (hn : 0 < n) (θhat : (Fin d → ℝ) → Fin d → ℝ) : BddAbove (Set.range fun (θ : Fin d → ℝ) => ⨆ (_ : θ ∈ Set.univ), ∫ (Y : Fin d → ℝ), sqDist (θhat Y) θ ∂P θ)
In the GSM, the supremum of risks over Set.univ is bounded above for each estimator. This follows from the integrability of sqDist and Gaussian tail bounds. Reference: Standard property of Gaussian measures, not proved in the textbook.
In the GSM, KL divergence between any two measures is finite. This is a standard property of Gaussian measures with the same covariance. Reference: Standard property of Gaussian measures, not proved in the textbook.
- expected_cross_term_bound (hσ : 0 < σ) (hn : 0 < n) (hd : 0 < d) (R : ℝ) (hR : 0 < R) (θ θ' : Fin d → ℝ) (θhat : (Fin d → ℝ) → Fin d → ℝ) (hθ' : l1norm θ' ≤ R) (hθhat : ∀ (Y : Fin d → ℝ), l1norm (θhat Y) ≤ R) : ∫ (Y : Fin d → ℝ), ∑ i : Fin d, (θhat Y i - θ' i) * (Y i - θ i) ∂P θ ≤ σ ^ 2 / ↑n * (R * Real.log ↑d / σ)
Expected cross-term bound for constrained LS over ℓ₁ balls in the GSM.
In the GSM, for any estimator θ̂ with ‖θ̂(Y)‖₁ ≤ R and any θ' with ‖θ'‖₁ ≤ R, the expected cross-term ⟨θ̂(Y) − θ', Y − θ⟩ is bounded by: E_θ[⟨θ̂(Y) − θ', Y − θ⟩] ≤ (σ²/n) · (R · log(d) / σ)
This combines two results:
- ℓ₁/ℓ∞ Hölder duality (Theorem 2.4, Chapter 2): ⟨θ̂−θ', Y−θ⟩ ≤ ‖θ̂−θ'‖₁ · max_i|Yᵢ−θᵢ| ≤ 2R · max_i|εᵢ|
- Gaussian maximum bound (Theorem 3.6, Chapter 3): E[max_i |εᵢ|] ≤ σ/√n · √(2 log d) for Gaussian ε ~ N(0, σ²/n · I).
Combined: E[⟨θ̂−θ', Y−θ⟩] ≤ 2R · σ · √(2 log d) / √n ≤ Rσ log(d)/n = (σ²/n)(R log(d)/σ).
Cross-chapter dependency: This is proved in Chapters 2–3 of the textbook (Theorem 2.4 + Theorem 3.6). The proofs use ℓ₁/ℓ∞ Hölder duality and Maurey's empirical method for bounding the Gaussian width of ℓ₁ balls. Since Chapters 2–3 are not formalized in this project, this bound is assumed as a structure field. See
gsm_expected_cross_term_boundbelow for the standalone axiom statement. - integrable_cross_term (hσ : 0 < σ) (hn : 0 < n) (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ θ' : Fin d → ℝ) : MeasureTheory.Integrable (fun (Y : Fin d → ℝ) => ∑ i : Fin d, (θhat Y i - θ' i) * (Y i - θ i)) (P θ)
Integrability of the cross-term functional in the GSM. This is a standard property of Gaussian measures (linear functions of Gaussians are integrable). Reference: Standard property of Gaussian measures, not proved in the textbook.
Instances For
Expected cross-term bound (Theorem 2.4 + Theorem 3.6, Chapters 2–3).
Standalone statement of the cross-term bound used in IsGSM.expected_cross_term_bound.
This is the key inequality from Chapters 2–3 that enables the upper bound in Corollary 5.16:
E_θ[⟨θ̂(Y) − θ', Y − θ⟩] ≤ (σ²/n) · (R · log(d) / σ)
The proof combines ℓ₁/ℓ∞ Hölder duality (Theorem 2.4) with the Gaussian width bound for ℓ₁ balls via Maurey's empirical method (Theorem 3.6). These proofs are given in Chapters 2–3 of the textbook but are not formalized in this project.
Algebraic oracle inequality (Theorem 2.4, Chapter 2) #
The pointwise algebraic oracle inequality for constrained least squares: for any Y, if θ̂(Y) minimizes sqDist(Y, ·) over a constraint set containing θ', then sqDist(θ̂(Y), θ) ≤ sqDist(θ', θ) + 2⟨θ̂(Y) − θ', Y − θ⟩.
This is a purely algebraic identity from the parallelogram law + optimality.
Pointwise algebraic oracle inequality (Theorem 2.4, Chapter 2). For any vectors θ̂, θ', θ, Y in ℝ^d, if sqDist(Y, θ̂) ≤ sqDist(Y, θ'), then sqDist(θ̂, θ) ≤ sqDist(θ', θ) + 2⟨θ̂ − θ', Y − θ⟩. This follows from the parallelogram-like identity: sqDist(θ̂, θ) − sqDist(θ', θ) = [sqDist(Y, θ̂) − sqDist(Y, θ')] + 2⟨θ̂ − θ', Y − θ⟩ and the optimality condition sqDist(Y, θ̂) ≤ sqDist(Y, θ').
Oracle inequality for constrained LS over ℓ₁ balls in the GSM (Theorem 2.4 + Theorem 3.6, proved from the algebraic oracle inequality and the expected cross-term bound).
In the GSM, for any estimator θ̂ that minimizes sqDist(Y, ·) over the
ℓ₁ ball of radius R, the expected risk satisfies:
E_θ[‖θ̂(Y) − θ‖²] ≤ ‖θ' − θ‖² + 2(σ²/n) · (R · log(d) / σ)
for any θ' with l1norm θ' ≤ R.
Proof: The algebraic oracle inequality gives the pointwise bound sqDist(θ̂(Y), θ) ≤ sqDist(θ', θ) + 2⟨θ̂(Y) − θ', Y − θ⟩ for each Y. Taking expectations and applying the expected cross-term bound (axiom from Theorems 2.4 + 3.6) yields the result.
Corollary 5.13 — Lower bound #
Proof outline (Section 5.5 of Rigollet) #
The lower bound composes the following results from InfoTheory.lean:
Varshamov-Gilbert (
InfoTheory.varshamov_gilbertwith γ = 1/4): Produces M ≥ exp(d/32) binary vectors ω₁,...,ωₘ with ρ(ωⱼ,ωₖ) ≥ d/4 for j ≠ k.Hypothesis construction: Define θⱼ = ωⱼ · βσ/√n with β = 1/(4√2). Then β² = 1/32 and:
- sqDist(θⱼ, θₖ) = β²σ²/n · ρ(ωⱼ,ωₖ) ≥ β²σ²d/(4n) = σ²d/(128n)
Gaussian KL (
InfoTheory.gaussian_kl_divergence): KL(Pⱼ‖Pₖ) = n·sqDist(θⱼ,θₖ)/(2σ²) = β²ρ(ωⱼ,ωₖ)/2 ≤ d/64Reduction to testing + Fano (
InfoTheory.reduction_to_testing_fano): With ϕ = σ²d/(512n) and the KL bound κ = d/64: inf_θ̂ max_j P_{θⱼ}(sqDist(θ̂,θⱼ) ≥ ϕ) ≥ 1 - (d/64 + log2)/log(M-1)Numerical estimate (
fano_witness_positive): The Fano probability bound ≥ 1/4 for d ≥ 1.Markov bridge (
InfoTheory.markov_bridge): minimax risk ≥ (1/4) · σ²d/(512n) = σ²d/(2048n)
Steps 1-4 and 6 are proved theorems or axioms in InfoTheory.lean.
Step 5 is a numerical computation (involving exp and log) that is
proved using hM4 : 4 ≤ M and hlog_vg : log(M-1) ≥ d/8.
Technical bridges #
The composition requires connecting:
GaussianSequenceModel.sqDistwithInfoTheory.sqDist(definitionally equal)GaussianSequenceModel.minimaxRisk(over measurableEstimator d) withInfoTheory.markov_bridge(over bare functions) — the infimum restricts to measurable estimators (following Definition 5.1 of the textbook)- Integrability/measurability/boundedness conditions for the GSM
- Absolute continuity of Gaussian measures (needed for KL computation)
The integrability, measurability, BddAbove, and absolute continuity conditions for the GSM are axiomatized as they require Gaussian measure theory not available in Mathlib.
GSM side conditions #
The following package the Gaussian measure theory conditions that are required by the composition but are not available in Mathlib. These are well-known properties of Gaussian measures.
gsm_abs_continuous is proved from the gaussian_family_llr_integral axiom.
gsm_integrable_sqDist and gsm_bddAbove_risk are proved from
the corresponding fields of IsGSM, which axiomatize these standard
Gaussian measure properties.
gsm_measurable_sqDist is proved from gsm_integrable_sqDist.
fano_numerical_bound is proved by case split: the log(M-1) ≤ 0 case
is direct, while the 0 < log(M-1) case uses
fano_numerical_bound_pos_log (proved from hM4 : 4 ≤ M and hlog_vg : log(M-1) ≥ d/8).
GSM absolute continuity: In the Gaussian sequence model, P_θ₁ ≪ P_θ₂ for all θ₁, θ₂ (Gaussians with the same variance are mutually absolutely continuous).
Proved from the gaussian_family_llr_integral axiom in InfoTheory.lean,
which provides absolute continuity as part of its conclusion.
Reference: Standard property of Gaussian measures.
GSM integrability of sqDist: In the GSM, the squared distance functional is integrable under any P_θ.
Proved from the integrable_sqDist field of IsGSM, which
axiomatizes this standard property of Gaussian measures.
Reference: Standard property of Gaussian measures.
GSM measurability of sqDist: In the GSM, the squared distance functional is AE strongly measurable.
Proved from gsm_integrable_sqDist via Integrable.aestronglyMeasurable.
GSM BddAbove for supRisk: The supremum of risks over Set.univ is bounded above for each estimator in the GSM.
Proved from the bddAbove_risk field of IsGSM, which
axiomatizes this standard property of Gaussian measures.
Reference: Standard property of Gaussian measures.
Fano numerical estimate (positive log case): When log(M-1) > 0
(i.e., M ≥ 3), the Fano probability bound satisfies:
1 − (d/64 + log 2)/log(M−1) ≥ 1/4
This is the transcendental function calculation in the proof of Theorem 5.11
(Rigollet, p.75): "Moreover, since M ≥ 5, (α·log M + log 2)/log(M-1) ≤ 2α + 1/2".
The calculation involves bounding exp(x) - 1 from below and log monotonicity,
which requires transcendental function reasoning not available in Lean's norm_num.
The proof splits into two cases:
- For
d ≤ 8: useshM4to getlog(M-1) ≥ log 3, then shows1/2 + 4·log 2 ≤ 3·log 3viaexp(1) < 729/256. - For
d ≥ 9: useshlog_vgto getlog(M-1) ≥ d/8, then showslog 2 ≤ 45/64via Taylor expansion ofexp(45/64).
Reference: Rigollet, Section 5.5, proof of Corollary 5.13.
Fano numerical estimate: For M ≥ exp(d/32) and d ≥ 1, the Fano probability bound satisfies: 1 − (d/64 + log 2)/log(M−1) ≥ 1/4
This corresponds to the numerical step in the proof of Theorem 5.11 (Rigollet, p.75): "Moreover, since M ≥ 5, (α·log M + log 2)/log(M-1) ≤ 2α + 1/2". The formalization uses the specialized constants from Corollary 5.13 (α = 1/8, γ = 1/4).
Proof: Case split on Real.log ((M : ℝ) - 1).
When log(M-1) ≤ 0 (i.e., M ≤ 2), the quotient is non-positive
(Lean convention: x / 0 = 0), so 1 - non-positive ≥ 1 ≥ 1/4.
When 0 < log(M-1) (M ≥ 3), the transcendental bound requires
(d/64 + log 2)/log(M-1) ≤ 3/4, which follows from the Varshamov-Gilbert
cardinality bound M ≥ exp(d/32). The detailed calculation involves
transcendental function reasoning, proved by
fano_numerical_bound_pos_log using hM4 : 4 ≤ M and hlog_vg : log(M-1) ≥ d/8.
Reference: Rigollet, Section 5.5, proof of Corollary 5.13.
Helper: sqDist equality between GaussianSequenceModel and InfoTheory #
The two sqDist definitions are pointwise equal.
Helper: scaled bool vector construction #
sqDist of scaled boolean vectors equals s² times Hamming distance.
Fano composition for GSM (Rigollet, Section 5.5).
Composes varshamov_gilbert (γ = 1/4), gaussian_kl_divergence,
reduction_to_testing_fano, and fano_numerical_bound to produce
a testing lower bound. For any estimator θ̂, there exists θ₀ ∈ ℝ^d such that:
P_{θ₀}(|θ̂ - θ₀|² ≥ σ²d/(512n)) ≥ 1/4
Reference: Rigollet, Section 5.5, proof of Corollary 5.13.
Two-point testing bound for d = 1 (Le Cam's method).
For d = 1, the Varshamov-Gilbert / Fano machinery requires M ≥ 3 hypotheses, but only M = 2 are available. Instead, the two-point (Le Cam) method applies: Choose θ₁ = +δ, θ₂ = -δ for appropriate δ, and use the total variation bound to show that any estimator has high risk at one of the two points.
Reference: Rigollet, Section 5.3 (Le Cam's method). The proof of Corollary 5.13 references this method for the d = 1 case without repeating it.
Fano witness positive (combined d = 1 and d ≥ 2 cases). For any d ≥ 1, the supremum risk of any estimator is at least C·σ²d/n.
- For d ≥ 2: uses Varshamov-Gilbert + Fano (
fano_witness_positive_d_ge_2). - For d = 1: uses the two-point (Le Cam) method (
gsm_two_point_testing_bound).
Corollary 5.13 — Lower bound. For Θ = ℝ^d in the Gaussian sequence model with noise variance σ²/n, there exists a universal constant C > 0 such that: inf_{θ̂} sup_{θ ∈ ℝ^d} E_θ[‖θ̂ − θ‖₂²] ≥ C · σ² d / n.
This follows from the application of Theorem 5.11 (many-hypothesis lower bound via Fano's inequality) with hypotheses constructed from Varshamov-Gilbert binary vectors scaled by β·σ/√n.
The proof uses fano_witness_positive which composes the InfoTheory axioms
(varshamov_gilbert, gaussian_kl_divergence, reduction_to_testing_fano,
markov_bridge) to produce, for each estimator, a parameter θ₀ where the
risk is at least σ²d/(2048n).
For d = 1, the lower bound uses the two-point (Le Cam) method
(gsm_two_point_testing_bound), proved using the Le Cam two-point method
from Section 5.3 of the textbook.
Corollary 5.13 — Upper bound (identity estimator). In the Gaussian sequence model Y = θ + ε with ε ~ N(0, (σ²/n)·I_d), the identity estimator θ̂^LS(Y) = Y has risk: E_θ[‖Y − θ‖₂²] = σ² d / n for all θ ∈ ℝ^d.
Since Y − θ = ε and each εᵢ has variance σ²/n, the expected squared norm is d · (σ²/n) = σ²d/n.
Corollary 5.13 — Minimax rate of estimation over ℝ^d. The minimax rate of estimation over ℝ^d in the Gaussian sequence model is σ²d/n, attained by the identity estimator θ̂^LS = Y.
More precisely: (a) Lower bound: minimaxRisk ≥ C' · σ²d/n for some C' > 0 (from Fano method). (b) Upper bound: minimaxRisk ≤ σ²d/n (tight, achieved by identity). (c) The identity estimator achieves the exact rate: supRisk = σ²d/n.