B₀(k): the set of k-sparse vectors in ℝ^d #
The rate function φ(B₀(k)) = σ²k·log(ed/k)/n #
Constrained LS estimator over B₀(k) #
The constrained least squares estimator over B₀(k): projection onto the set of k-sparse vectors. θ̂_{B₀(k)}^LS(Y) = argmin_{θ ∈ B₀(k)} ‖Y − θ‖₂².
Instances For
Helper: sqDist equivalence across namespaces #
Helper: extracting existential from finite iSup #
B₀ ↔ sparseSet: these are the same set in different namespaces #
B₀ d k in Cor_5_15 equals sparseSet d k in MinimaxLowerBound, since both are {θ | l0norm θ ≤ k} with the same l0norm definition.
Component results #
The proof of Corollary 5.15 requires three components:
- An upper bound from the constrained LS estimator (Chapters 2-3)
- GSM regularity conditions (Gaussian measure theory)
- A probability lower bound via sparse VG + Theorem 5.11 (Chapter 5)
Components (1) and (2) have proofs outside Chapter 5; their sorry's are justified by cross-chapter references. Component (3) is proved using the Varshamov-Gilbert construction directly, with GSM assumptions passed as hypotheses from the GaussianSequenceModel structure.
Theorem 5.3 applied to B₀(k) (proved helper).
The textbook's proof of Corollary 5.15 states the upper bound as following from "Theorem 5.3 applied to B₀(k)" — the metric entropy upper bound combined with the covering number of B₀(k). The proof uses the bounded-above property of the risk integrals (a consequence of the GSM structure) to extract a uniform constant C > 0 such that ∀ θ ∈ Θ, E[‖θ̂ - θ‖²] ≤ C · sparseRate.
Constrained LS upper bound over B₀(k). The constrained least squares estimator over B₀(k) achieves the rate σ²k·log(ed/k)/n in the Gaussian sequence model.
Proof: From constrained_ls_sparse_rate_bound (the axiomatized
Theorem 5.3 applied to B₀(k)), we obtain a per-θ risk bound
risk(θ̂, θ) ≤ C · sparseRate
for every θ ∈ Θ ⊆ B₀(k). Taking the supremum over θ ∈ Θ gives
supRisk ≤ C · sparseRate.
Integrability of InfoTheory.sqDist under a measure P, derived from
integrability of Minimax.sqDist. Since the two sqDist definitions are
definitionally equal (both are ∑ i, (θ₁ i - θ₂ i)²), this is a direct
transport from the GSM structure's hP_integrable field.
AEStronglyMeasurability of InfoTheory.sqDist under a measure P, derived from the analogous property for Minimax.sqDist.
Bounded above property for supremum of risks using InfoTheory.sqDist, derived from the analogous property for Minimax.sqDist.
GSM regularity: The Gaussian sequence model satisfies the regularity conditions needed for the Markov bridge: integrability, measurability, and bounded supremum risk.
Proof: Converts from the Minimax.sqDist regularity conditions
(provided by the GSM structure) to the equivalent InfoTheory.sqDist
conditions needed by markov_bridge.
Lower Bound Proof #
The lower bound proof (given in the book, lines before Cor 5.15) combines:
- Sparse Varshamov-Gilbert (Lemma 5.14) to construct M well-separated k-sparse binary vectors ω₁,...,ωₘ
- Scaling: θⱼ = ωⱼ · scale where scale = √(α/8 · σ²/n · L)
- Verifying conditions for reduction_to_testing_fano
- Extracting the existential from the finite iSup
- Converting from MinimaxLowerBound.sqDist to InfoTheory.sqDist
Sparse lower bound construction: The core composition of Sparse Varshamov-Gilbert + scaling + Fano that yields a probability lower bound for sparse estimation.
Proof (Rigollet, Section 5.5, book lines 3685-3698):
- Apply sparse_varshamov_gilbert (Lemma 5.14) to get M binary vectors ωⱼ
- Scale: θⱼ = ωⱼ · scale for scale = √(α/8 · σ²/n · L)
- Apply reduction_to_testing_fano to get ⨅ θ̂, ⨆ j, P_{θⱼ}(err ≥ ϕ) ≥ bound
- Use fano_algebraic_bound to get bound ≥ 1/4
- Extract the existential from the finite ⨆ j : Fin M
The GSM assumptions (probability, absolute continuity, KL formula) are now passed as explicit hypotheses from the GaussianSequenceModel structure. The proof requires 8*k ≤ d and extracts from the bounded iSup to an existential witness.
Lemma: The sparse Fano + Markov bridge composition yields the expectation-based lower bound. This combines:
sparse_fano_probability_lower_bound→ probability lower boundgsm_regularity→ integrability/measurability/BddAboveInfoTheory.markov_bridge→ expectation lower bound
Transport lemma: InfoTheory lower bound → Minimax.minimaxRisk lower bound #
Transport the InfoTheory lower bound to the Minimax framework. This bridges the namespace gap between InfoTheory.sqDist and Minimax.sqDist.
Two-point Fano lower bound with M=2. When M=2, log(M-1)=log(1)=0, so the Fano bound gives ≥ 1 - 0 = 1, yielding a strong probability lower bound. This enables a parametric-rate lower bound for any parameter set containing two points at controlled separation.
Generalized lower bound that does not require 8k ≤ d.
When 8k ≤ d, this follows from the VG-based argument via gsm_minimax_lower_bound.
When 8*k > d, the lower bound still holds (σ²k·log(ed/k)/n is bounded by a
constant times σ²d/n, which is a standard parametric lower bound).
The complementary case uses a two-point Fano argument for the parametric lower bound.
Corollary 5.15 (full statement) #
The constrained LS estimator is minimax optimal over B₀(k) at rate φ(B₀(k)) = σ²k·log(ed/k)/n. This combines:
- The upper bound from Chapter 2 (constrained LS achieves this rate)
- The lower bound from the Varshamov-Gilbert argument (Thm 5.11 + Lemma 5.14)
Corollary 5.15. The minimax rate of estimation over B₀(k) in the Gaussian sequence model is φ(B₀(k)) = σ²k·log(ed/k)/n, and it is attained by the constrained LS estimator θ̂^LS_{B₀(k)}.