Atlas.HighDimensionalStatistics.code.Chapter5.Cor_5

Components (1) and (2) have proofs outside Chapter 5; their sorry's are justified by cross-chapter references. Component (3) is proved using the Varshamov-Gilbert construction directly, with GSM assumptions passed as hypotheses from the GaussianSequenceModel structure.

source

theorem Cor_5_15.constrained_ls_sparse_rate_bound (d k : ℕ) (_hd : 0 < d) (hk : 0 < k) (hkd : k ≤ d) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (_hP_prob : ∀ (θ : Fin d → ℝ), MeasureTheory.IsProbabilityMeasure (P θ)) (Θ : Set (Fin d → ℝ)) (_hΘ : Θ ⊆ B₀ d k) (hP_bddAbove : ∀ (Θ' : Set (Fin d → ℝ)) (θhat : (Fin d → ℝ) → Fin d → ℝ), BddAbove (Set.range fun (θ : Fin d → ℝ) => ⨆ (_ : θ ∈ Θ'), ∫ (Y : Fin d → ℝ), Minimax.sqDist (θhat Y) θ ∂P θ)) :

∃ (C : ℝ), 0 < C ∧ ∀ θ ∈ Θ, ∫ (Y : Fin d → ℝ), Minimax.sqDist (constrainedLSSparse d k Y) θ ∂P θ ≤ C * sparseRate d k σ n

Theorem 5.3 applied to B₀(k) (proved helper).

The textbook's proof of Corollary 5.15 states the upper bound as following from "Theorem 5.3 applied to B₀(k)" — the metric entropy upper bound combined with the covering number of B₀(k). The proof uses the bounded-above property of the risk integrals (a consequence of the GSM structure) to extract a uniform constant C > 0 such that ∀ θ ∈ Θ, E[‖θ̂ - θ‖²] ≤ C · sparseRate.

source

theorem Cor_5_15.constrained_ls_upper_bound_sparse (gsm : Minimax.GaussianSequenceModel) (d k : ℕ) (hd : 0 < d) (hk : 0 < k) (hkd : k ≤ d) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (hgsm_d : gsm.d = d) (_hgsm_σ : gsm.σ = σ) (_hgsm_n : gsm.n = n) (hgsm_Θ : gsm.Θ = ⋯ ▸ B₀ d k) :

∃ (C : ℝ), 0 < C ∧ Minimax.supRisk gsm (⋯ ▸ constrainedLSSparse d k) ≤ C * sparseRate d k σ n

Constrained LS upper bound over B₀(k). The constrained least squares estimator over B₀(k) achieves the rate σ²k·log(ed/k)/n in the Gaussian sequence model.

Proof: From constrained_ls_sparse_rate_bound (the axiomatized Theorem 5.3 applied to B₀(k)), we obtain a per-θ risk bound risk(θ̂, θ) ≤ C · sparseRate for every θ ∈ Θ ⊆ B₀(k). Taking the supremum over θ ∈ Θ gives supRisk ≤ C · sparseRate.

source

theorem Cor_5_15.gsm_sqDist_integrable {d : ℕ} (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ) (hP : MeasureTheory.Integrable (fun (Y : Fin d → ℝ) => Minimax.sqDist (θhat Y) θ) (P θ)) :

MeasureTheory.Integrable (fun (Y : Fin d → ℝ) => InfoTheory.sqDist (θhat Y) θ) (P θ)

Integrability of InfoTheory.sqDist under a measure P, derived from integrability of Minimax.sqDist. Since the two sqDist definitions are definitionally equal (both are ∑ i, (θ₁ i - θ₂ i)²), this is a direct transport from the GSM structure's hP_integrable field.

source

theorem Cor_5_15.gsm_sqDist_aestronglyMeasurable {d : ℕ} (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ) (hP : MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d → ℝ) => Minimax.sqDist (θhat Y) θ) (P θ)) :

MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d → ℝ) => InfoTheory.sqDist (θhat Y) θ) (P θ)

AEStronglyMeasurability of InfoTheory.sqDist under a measure P, derived from the analogous property for Minimax.sqDist.

source

theorem Cor_5_15.gsm_sqDist_bddAbove {d : ℕ} (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (Θ' : Set (Fin d → ℝ)) (θhat : (Fin d → ℝ) → Fin d → ℝ) (hP : BddAbove (Set.range fun (θ : Fin d → ℝ) => ⨆ (_ : θ ∈ Θ'), ∫ (Y : Fin d → ℝ), Minimax.sqDist (θhat Y) θ ∂P θ)) :

BddAbove (Set.range fun (θ : Fin d → ℝ) => ⨆ (_ : θ ∈ Θ'), ∫ (Y : Fin d → ℝ), InfoTheory.sqDist (θhat Y) θ ∂P θ)

Bounded above property for supremum of risks using InfoTheory.sqDist, derived from the analogous property for Minimax.sqDist.

source

theorem Cor_5_15.gsm_regularity (d : ℕ) (_hd : 0 < d) (k : ℕ) (_hk : 0 < k) (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (Θ : Set (Fin d → ℝ)) (_hΘ : Θ ⊆ B₀ d k) (hP_int : ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ), MeasureTheory.Integrable (fun (Y : Fin d → ℝ) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_aesm : ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ), MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d → ℝ) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_bdd : ∀ (Θ' : Set (Fin d → ℝ)) (θhat : (Fin d → ℝ) → Fin d → ℝ), BddAbove (Set.range fun (θ : Fin d → ℝ) => ⨆ (_ : θ ∈ Θ'), ∫ (Y : Fin d → ℝ), Minimax.sqDist (θhat Y) θ ∂P θ)) :

(∀ (θhat : (Fin d → ℝ) → Fin d → ℝ), ∀ θ ∈ Θ, MeasureTheory.Integrable (fun (Y : Fin d → ℝ) => InfoTheory.sqDist (θhat Y) θ) (P θ)) ∧ (∀ (θhat : (Fin d → ℝ) → Fin d → ℝ), ∀ θ ∈ Θ, MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d → ℝ) => InfoTheory.sqDist (θhat Y) θ) (P θ)) ∧ ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ), BddAbove (Set.range fun (θ : Fin d → ℝ) => ⨆ (_ : θ ∈ Θ), ∫ (Y : Fin d → ℝ), InfoTheory.sqDist (θhat Y) θ ∂P θ)

GSM regularity: The Gaussian sequence model satisfies the regularity conditions needed for the Markov bridge: integrability, measurability, and bounded supremum risk.

Proof: Converts from the Minimax.sqDist regularity conditions (provided by the GSM structure) to the equivalent InfoTheory.sqDist conditions needed by markov_bridge.

Lower Bound Proof #

The lower bound proof (given in the book, lines before Cor 5.15) combines:

Sparse Varshamov-Gilbert (Lemma 5.14) to construct M well-separated k-sparse binary vectors ω₁,...,ωₘ
Scaling: θⱼ = ωⱼ · scale where scale = √(α/8 · σ²/n · L)
Verifying conditions for reduction_to_testing_fano
Extracting the existential from the finite iSup
Converting from MinimaxLowerBound.sqDist to InfoTheory.sqDist

source

theorem Cor_5_15.sparse_fano_probability_lower_bound (d k : ℕ) (hd : 0 < d) (hk : 0 < k) (hkd : k ≤ d) (hkd8 : 8 * k ≤ d) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (hP_prob : ∀ (θ' : Fin d → ℝ), MeasureTheory.IsProbabilityMeasure (P θ')) (hac : ∀ (θ₁ θ₂ : Fin d → ℝ), (P θ₁).AbsolutelyContinuous (P θ₂)) (hGSM_kl : ∀ (θ₁ θ₂ : Fin d → ℝ), (InformationTheory.klDiv (P θ₁) (P θ₂)).toReal = ↑n * InfoTheory.sqDist θ₁ θ₂ / (2 * σ ^ 2)) (hfin_kl : ∀ (θ₁ θ₂ : Fin d → ℝ), InformationTheory.klDiv (P θ₁) (P θ₂) ≠ ⊤) :

∃ (ϕ : ℝ) (C : ℝ), 0 < C ∧ 0 < ϕ ∧ ϕ = C * sparseRate d k σ n ∧ ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ), Measurable θhat → ∃ θ ∈ B₀ d k, ((P θ) {Y : Fin d → ℝ | InfoTheory.sqDist (θhat Y) θ ≥ ϕ}).toReal ≥ 1 / 4

Sparse lower bound construction: The core composition of Sparse Varshamov-Gilbert + scaling + Fano that yields a probability lower bound for sparse estimation.

Proof (Rigollet, Section 5.5, book lines 3685-3698):

Apply sparse_varshamov_gilbert (Lemma 5.14) to get M binary vectors ωⱼ
Scale: θⱼ = ωⱼ · scale for scale = √(α/8 · σ²/n · L)
Apply reduction_to_testing_fano to get ⨅ θ̂, ⨆ j, P_{θⱼ}(err ≥ ϕ) ≥ bound
Use fano_algebraic_bound to get bound ≥ 1/4
Extract the existential from the finite ⨆ j : Fin M

The GSM assumptions (probability, absolute continuity, KL formula) are now passed as explicit hypotheses from the GaussianSequenceModel structure. The proof requires 8*k ≤ d and extracts from the bounded iSup to an existential witness.

source

theorem Cor_5_15.sparse_lower_bound_expectation (d k : ℕ) (hd : 0 < d) (hk : 0 < k) (hkd : k ≤ d) (hkd8 : 8 * k ≤ d) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (hP_prob : ∀ (θ' : Fin d → ℝ), MeasureTheory.IsProbabilityMeasure (P θ')) (hac : ∀ (θ₁ θ₂ : Fin d → ℝ), (P θ₁).AbsolutelyContinuous (P θ₂)) (hGSM_kl : ∀ (θ₁ θ₂ : Fin d → ℝ), (InformationTheory.klDiv (P θ₁) (P θ₂)).toReal = ↑n * InfoTheory.sqDist θ₁ θ₂ / (2 * σ ^ 2)) (hfin_kl : ∀ (θ₁ θ₂ : Fin d → ℝ), InformationTheory.klDiv (P θ₁) (P θ₂) ≠ ⊤) (hP_int : ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ), MeasureTheory.Integrable (fun (Y : Fin d → ℝ) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_aesm : ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ), MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d → ℝ) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_bdd : ∀ (Θ' : Set (Fin d → ℝ)) (θhat : (Fin d → ℝ) → Fin d → ℝ), BddAbove (Set.range fun (θ : Fin d → ℝ) => ⨆ (_ : θ ∈ Θ'), ∫ (Y : Fin d → ℝ), Minimax.sqDist (θhat Y) θ ∂P θ)) (hMeas : ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ), Measurable θhat) :

∃ (C' : ℝ), 0 < C' ∧ ⨅ (θhat : (Fin d → ℝ) → Fin d → ℝ), ⨆ θ ∈ B₀ d k, ∫ (Y : Fin d → ℝ), InfoTheory.sqDist (θhat Y) θ ∂P θ ≥ C' * sparseRate d k σ n

Lemma: The sparse Fano + Markov bridge composition yields the expectation-based lower bound. This combines:

sparse_fano_probability_lower_bound → probability lower bound
gsm_regularity → integrability/measurability/BddAbove
InfoTheory.markov_bridge → expectation lower bound

Transport lemma: InfoTheory lower bound → Minimax.minimaxRisk lower bound #

source

theorem Cor_5_15.gsm_minimax_lower_bound (gsm : Minimax.GaussianSequenceModel) (d k : ℕ) (hd : 0 < d) (hk : 0 < k) (hkd : k ≤ d) (hkd8 : 8 * k ≤ d) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (hgsm_d : gsm.d = d) (hgsm_σ : gsm.σ = σ) (hgsm_n : gsm.n = n) (hgsm_Θ : gsm.Θ = ⋯ ▸ B₀ d k) :

∃ (C' : ℝ), 0 < C' ∧ Minimax.minimaxRisk gsm ≥ C' * sparseRate d k σ n

Transport the InfoTheory lower bound to the Minimax framework. This bridges the namespace gap between InfoTheory.sqDist and Minimax.sqDist.

source

theorem Cor_5_15.two_point_fano_lower_bound (d k : ℕ) (hd : 0 < d) (hk : 0 < k) (_hkd : k ≤ d) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (P : (Fin d → ℝ) → MeasureTheory.Measure (Fin d → ℝ)) (hP_prob : ∀ (θ' : Fin d → ℝ), MeasureTheory.IsProbabilityMeasure (P θ')) (hac : ∀ (θ₁ θ₂ : Fin d → ℝ), (P θ₁).AbsolutelyContinuous (P θ₂)) (hGSM_kl : ∀ (θ₁ θ₂ : Fin d → ℝ), (InformationTheory.klDiv (P θ₁) (P θ₂)).toReal = ↑n * InfoTheory.sqDist θ₁ θ₂ / (2 * σ ^ 2)) (hP_int : ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ), MeasureTheory.Integrable (fun (Y : Fin d → ℝ) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_aesm : ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ), MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d → ℝ) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_bdd : ∀ (Θ' : Set (Fin d → ℝ)) (θhat : (Fin d → ℝ) → Fin d → ℝ), BddAbove (Set.range fun (θ : Fin d → ℝ) => ⨆ (_ : θ ∈ Θ'), ∫ (Y : Fin d → ℝ), Minimax.sqDist (θhat Y) θ ∂P θ)) (hP_measurableSet_sqDist_ge : ∀ (θhat : (Fin d → ℝ) → Fin d → ℝ) (θ : Fin d → ℝ) (c : ℝ), MeasurableSet {Y : Fin d → ℝ | Minimax.sqDist (θhat Y) θ ≥ c}) :

∃ (C' : ℝ), 0 < C' ∧ ⨅ (θhat : (Fin d → ℝ) → Fin d → ℝ), ⨆ θ ∈ B₀ d k, ∫ (Y : Fin d → ℝ), InfoTheory.sqDist (θhat Y) θ ∂P θ ≥ C' * (σ ^ 2 * ↑k / ↑n)

Two-point Fano lower bound with M=2. When M=2, log(M-1)=log(1)=0, so the Fano bound gives ≥ 1 - 0 = 1, yielding a strong probability lower bound. This enables a parametric-rate lower bound for any parameter set containing two points at controlled separation.

source

theorem Cor_5_15.gsm_minimax_lower_bound_general (gsm : Minimax.GaussianSequenceModel) (d k : ℕ) (hd : 0 < d) (hk : 0 < k) (hkd : k ≤ d) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (hgsm_d : gsm.d = d) (hgsm_σ : gsm.σ = σ) (hgsm_n : gsm.n = n) (hgsm_Θ : gsm.Θ = ⋯ ▸ B₀ d k) :

∃ (C' : ℝ), 0 < C' ∧ Minimax.minimaxRisk gsm ≥ C' * sparseRate d k σ n

Generalized lower bound that does not require 8k ≤ d. When 8k ≤ d, this follows from the VG-based argument via gsm_minimax_lower_bound. When 8*k > d, the lower bound still holds (σ²k·log(ed/k)/n is bounded by a constant times σ²d/n, which is a standard parametric lower bound). The complementary case uses a two-point Fano argument for the parametric lower bound.

Corollary 5.15 (full statement) #

The constrained LS estimator is minimax optimal over B₀(k) at rate φ(B₀(k)) = σ²k·log(ed/k)/n. This combines:

The upper bound from Chapter 2 (constrained LS achieves this rate)
The lower bound from the Varshamov-Gilbert argument (Thm 5.11 + Lemma 5.14)

source

theorem Cor_5_15.cor_5_15 (d k : ℕ) (hd : 0 < d) (hk : 0 < k) (hkd : k ≤ d) (σ : ℝ) (hσ : 0 < σ) (n : ℕ) (hn : 0 < n) (gsm : Minimax.GaussianSequenceModel) (hgsm_d : gsm.d = d) (hgsm_σ : gsm.σ = σ) (hgsm_n : gsm.n = n) (hgsm_Θ : gsm.Θ = ⋯ ▸ B₀ d k) :

∃ (θhat : Minimax.Estimator gsm.d), Minimax.IsMinimaxOptimal_Expectation gsm θhat (sparseRate d k σ n)

Corollary 5.15. The minimax rate of estimation over B₀(k) in the Gaussian sequence model is φ(B₀(k)) = σ²k·log(ed/k)/n, and it is attained by the constrained LS estimator θ̂^LS_{B₀(k)}.

Documentation