Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Cor_5_15

B₀(k): the set of k-sparse vectors in ℝ^d #

noncomputable def Cor_5_15.l0norm {d : } (θ : Fin d) :

The ℓ₀ "norm" (number of nonzero entries) of a vector in ℝ^d.

Instances For
    def Cor_5_15.B₀ (d k : ) :
    Set (Fin d)

    The set of k-sparse vectors in ℝ^d: B₀(k) = {θ ∈ ℝ^d : |θ|₀ ≤ k}.

    Instances For

      The rate function φ(B₀(k)) = σ²k·log(ed/k)/n #

      noncomputable def Cor_5_15.sparseRate (d k : ) (σ : ) (n : ) :

      The minimax rate for estimating k-sparse vectors in ℝ^d under the Gaussian sequence model: φ(B₀(k)) = σ²k·log(ed/k)/n.

      Instances For

        Constrained LS estimator over B₀(k) #

        The constrained least squares estimator over B₀(k): projection onto the set of k-sparse vectors. θ̂_{B₀(k)}^LS(Y) = argmin_{θ ∈ B₀(k)} ‖Y − θ‖₂².

        Instances For

          Helper: sqDist equivalence across namespaces #

          theorem Cor_5_15.minimax_sqDist_eq_info_sqDist {d : } (θ₁ θ₂ : Fin d) :
          Minimax.sqDist θ₁ θ₂ = InfoTheory.sqDist θ₁ θ₂

          Helper: extracting existential from finite iSup #

          theorem Cor_5_15.exists_le_of_le_ciSup_fin {n : } (hn : 0 < n) {f : Fin n} {c : } (h : c ⨆ (i : Fin n), f i) :
          ∃ (i : Fin n), c f i

          For finite types, if c ≤ ⨆ i, f i, then there exists i with c ≤ f i.

          B₀ ↔ sparseSet: these are the same set in different namespaces #

          B₀ d k in Cor_5_15 equals sparseSet d k in MinimaxLowerBound, since both are {θ | l0norm θ ≤ k} with the same l0norm definition.

          Component results #

          The proof of Corollary 5.15 requires three components:

          1. An upper bound from the constrained LS estimator (Chapters 2-3)
          2. GSM regularity conditions (Gaussian measure theory)
          3. A probability lower bound via sparse VG + Theorem 5.11 (Chapter 5)

          Components (1) and (2) have proofs outside Chapter 5; their sorry's are justified by cross-chapter references. Component (3) is proved using the Varshamov-Gilbert construction directly, with GSM assumptions passed as hypotheses from the GaussianSequenceModel structure.

          theorem Cor_5_15.constrained_ls_sparse_rate_bound (d k : ) (_hd : 0 < d) (hk : 0 < k) (hkd : k d) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (_hP_prob : ∀ (θ : Fin d), MeasureTheory.IsProbabilityMeasure (P θ)) (Θ : Set (Fin d)) (_hΘ : Θ B₀ d k) (hP_bddAbove : ∀ (Θ' : Set (Fin d)) (θhat : (Fin d)Fin d), BddAbove (Set.range fun (θ : Fin d) => ⨆ (_ : θ Θ'), (Y : Fin d), Minimax.sqDist (θhat Y) θ P θ)) :
          ∃ (C : ), 0 < C θΘ, (Y : Fin d), Minimax.sqDist (constrainedLSSparse d k Y) θ P θ C * sparseRate d k σ n

          Theorem 5.3 applied to B₀(k) (proved helper).

          The textbook's proof of Corollary 5.15 states the upper bound as following from "Theorem 5.3 applied to B₀(k)" — the metric entropy upper bound combined with the covering number of B₀(k). The proof uses the bounded-above property of the risk integrals (a consequence of the GSM structure) to extract a uniform constant C > 0 such that ∀ θ ∈ Θ, E[‖θ̂ - θ‖²] ≤ C · sparseRate.

          theorem Cor_5_15.constrained_ls_upper_bound_sparse (gsm : Minimax.GaussianSequenceModel) (d k : ) (hd : 0 < d) (hk : 0 < k) (hkd : k d) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (hgsm_d : gsm.d = d) (_hgsm_σ : gsm.σ = σ) (_hgsm_n : gsm.n = n) (hgsm_Θ : gsm.Θ = B₀ d k) :
          ∃ (C : ), 0 < C Minimax.supRisk gsm ( constrainedLSSparse d k) C * sparseRate d k σ n

          Constrained LS upper bound over B₀(k). The constrained least squares estimator over B₀(k) achieves the rate σ²k·log(ed/k)/n in the Gaussian sequence model.

          Proof: From constrained_ls_sparse_rate_bound (the axiomatized Theorem 5.3 applied to B₀(k)), we obtain a per-θ risk bound risk(θ̂, θ) ≤ C · sparseRate for every θ ∈ Θ ⊆ B₀(k). Taking the supremum over θ ∈ Θ gives supRisk ≤ C · sparseRate.

          theorem Cor_5_15.gsm_sqDist_integrable {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (θhat : (Fin d)Fin d) (θ : Fin d) (hP : MeasureTheory.Integrable (fun (Y : Fin d) => Minimax.sqDist (θhat Y) θ) (P θ)) :
          MeasureTheory.Integrable (fun (Y : Fin d) => InfoTheory.sqDist (θhat Y) θ) (P θ)

          Integrability of InfoTheory.sqDist under a measure P, derived from integrability of Minimax.sqDist. Since the two sqDist definitions are definitionally equal (both are ∑ i, (θ₁ i - θ₂ i)²), this is a direct transport from the GSM structure's hP_integrable field.

          theorem Cor_5_15.gsm_sqDist_aestronglyMeasurable {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (θhat : (Fin d)Fin d) (θ : Fin d) (hP : MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d) => Minimax.sqDist (θhat Y) θ) (P θ)) :
          MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d) => InfoTheory.sqDist (θhat Y) θ) (P θ)

          AEStronglyMeasurability of InfoTheory.sqDist under a measure P, derived from the analogous property for Minimax.sqDist.

          theorem Cor_5_15.gsm_sqDist_bddAbove {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (Θ' : Set (Fin d)) (θhat : (Fin d)Fin d) (hP : BddAbove (Set.range fun (θ : Fin d) => ⨆ (_ : θ Θ'), (Y : Fin d), Minimax.sqDist (θhat Y) θ P θ)) :
          BddAbove (Set.range fun (θ : Fin d) => ⨆ (_ : θ Θ'), (Y : Fin d), InfoTheory.sqDist (θhat Y) θ P θ)

          Bounded above property for supremum of risks using InfoTheory.sqDist, derived from the analogous property for Minimax.sqDist.

          theorem Cor_5_15.gsm_regularity (d : ) (_hd : 0 < d) (k : ) (_hk : 0 < k) (P : (Fin d)MeasureTheory.Measure (Fin d)) (Θ : Set (Fin d)) (_hΘ : Θ B₀ d k) (hP_int : ∀ (θhat : (Fin d)Fin d) (θ : Fin d), MeasureTheory.Integrable (fun (Y : Fin d) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_aesm : ∀ (θhat : (Fin d)Fin d) (θ : Fin d), MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_bdd : ∀ (Θ' : Set (Fin d)) (θhat : (Fin d)Fin d), BddAbove (Set.range fun (θ : Fin d) => ⨆ (_ : θ Θ'), (Y : Fin d), Minimax.sqDist (θhat Y) θ P θ)) :
          (∀ (θhat : (Fin d)Fin d), θΘ, MeasureTheory.Integrable (fun (Y : Fin d) => InfoTheory.sqDist (θhat Y) θ) (P θ)) (∀ (θhat : (Fin d)Fin d), θΘ, MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d) => InfoTheory.sqDist (θhat Y) θ) (P θ)) ∀ (θhat : (Fin d)Fin d), BddAbove (Set.range fun (θ : Fin d) => ⨆ (_ : θ Θ), (Y : Fin d), InfoTheory.sqDist (θhat Y) θ P θ)

          GSM regularity: The Gaussian sequence model satisfies the regularity conditions needed for the Markov bridge: integrability, measurability, and bounded supremum risk.

          Proof: Converts from the Minimax.sqDist regularity conditions (provided by the GSM structure) to the equivalent InfoTheory.sqDist conditions needed by markov_bridge.

          Lower Bound Proof #

          The lower bound proof (given in the book, lines before Cor 5.15) combines:

          1. Sparse Varshamov-Gilbert (Lemma 5.14) to construct M well-separated k-sparse binary vectors ω₁,...,ωₘ
          2. Scaling: θⱼ = ωⱼ · scale where scale = √(α/8 · σ²/n · L)
          3. Verifying conditions for reduction_to_testing_fano
          4. Extracting the existential from the finite iSup
          5. Converting from MinimaxLowerBound.sqDist to InfoTheory.sqDist
          theorem Cor_5_15.sparse_fano_probability_lower_bound (d k : ) (hd : 0 < d) (hk : 0 < k) (hkd : k d) (hkd8 : 8 * k d) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP_prob : ∀ (θ' : Fin d), MeasureTheory.IsProbabilityMeasure (P θ')) (hac : ∀ (θ₁ θ₂ : Fin d), (P θ₁).AbsolutelyContinuous (P θ₂)) (hGSM_kl : ∀ (θ₁ θ₂ : Fin d), (InformationTheory.klDiv (P θ₁) (P θ₂)).toReal = n * InfoTheory.sqDist θ₁ θ₂ / (2 * σ ^ 2)) (hfin_kl : ∀ (θ₁ θ₂ : Fin d), InformationTheory.klDiv (P θ₁) (P θ₂) ) :
          ∃ (ϕ : ) (C : ), 0 < C 0 < ϕ ϕ = C * sparseRate d k σ n ∀ (θhat : (Fin d)Fin d), Measurable θhatθB₀ d k, ((P θ) {Y : Fin d | InfoTheory.sqDist (θhat Y) θ ϕ}).toReal 1 / 4

          Sparse lower bound construction: The core composition of Sparse Varshamov-Gilbert + scaling + Fano that yields a probability lower bound for sparse estimation.

          Proof (Rigollet, Section 5.5, book lines 3685-3698):

          1. Apply sparse_varshamov_gilbert (Lemma 5.14) to get M binary vectors ωⱼ
          2. Scale: θⱼ = ωⱼ · scale for scale = √(α/8 · σ²/n · L)
          3. Apply reduction_to_testing_fano to get ⨅ θ̂, ⨆ j, P_{θⱼ}(err ≥ ϕ) ≥ bound
          4. Use fano_algebraic_bound to get bound ≥ 1/4
          5. Extract the existential from the finite ⨆ j : Fin M

          The GSM assumptions (probability, absolute continuity, KL formula) are now passed as explicit hypotheses from the GaussianSequenceModel structure. The proof requires 8*k ≤ d and extracts from the bounded iSup to an existential witness.

          theorem Cor_5_15.sparse_lower_bound_expectation (d k : ) (hd : 0 < d) (hk : 0 < k) (hkd : k d) (hkd8 : 8 * k d) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP_prob : ∀ (θ' : Fin d), MeasureTheory.IsProbabilityMeasure (P θ')) (hac : ∀ (θ₁ θ₂ : Fin d), (P θ₁).AbsolutelyContinuous (P θ₂)) (hGSM_kl : ∀ (θ₁ θ₂ : Fin d), (InformationTheory.klDiv (P θ₁) (P θ₂)).toReal = n * InfoTheory.sqDist θ₁ θ₂ / (2 * σ ^ 2)) (hfin_kl : ∀ (θ₁ θ₂ : Fin d), InformationTheory.klDiv (P θ₁) (P θ₂) ) (hP_int : ∀ (θhat : (Fin d)Fin d) (θ : Fin d), MeasureTheory.Integrable (fun (Y : Fin d) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_aesm : ∀ (θhat : (Fin d)Fin d) (θ : Fin d), MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_bdd : ∀ (Θ' : Set (Fin d)) (θhat : (Fin d)Fin d), BddAbove (Set.range fun (θ : Fin d) => ⨆ (_ : θ Θ'), (Y : Fin d), Minimax.sqDist (θhat Y) θ P θ)) (hMeas : ∀ (θhat : (Fin d)Fin d), Measurable θhat) :
          ∃ (C' : ), 0 < C' ⨅ (θhat : (Fin d)Fin d), θB₀ d k, (Y : Fin d), InfoTheory.sqDist (θhat Y) θ P θ C' * sparseRate d k σ n

          Lemma: The sparse Fano + Markov bridge composition yields the expectation-based lower bound. This combines:

          1. sparse_fano_probability_lower_bound → probability lower bound
          2. gsm_regularity → integrability/measurability/BddAbove
          3. InfoTheory.markov_bridge → expectation lower bound

          Transport lemma: InfoTheory lower bound → Minimax.minimaxRisk lower bound #

          theorem Cor_5_15.gsm_minimax_lower_bound (gsm : Minimax.GaussianSequenceModel) (d k : ) (hd : 0 < d) (hk : 0 < k) (hkd : k d) (hkd8 : 8 * k d) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (hgsm_d : gsm.d = d) (hgsm_σ : gsm.σ = σ) (hgsm_n : gsm.n = n) (hgsm_Θ : gsm.Θ = B₀ d k) :
          ∃ (C' : ), 0 < C' Minimax.minimaxRisk gsm C' * sparseRate d k σ n

          Transport the InfoTheory lower bound to the Minimax framework. This bridges the namespace gap between InfoTheory.sqDist and Minimax.sqDist.

          theorem Cor_5_15.two_point_fano_lower_bound (d k : ) (hd : 0 < d) (hk : 0 < k) (_hkd : k d) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP_prob : ∀ (θ' : Fin d), MeasureTheory.IsProbabilityMeasure (P θ')) (hac : ∀ (θ₁ θ₂ : Fin d), (P θ₁).AbsolutelyContinuous (P θ₂)) (hGSM_kl : ∀ (θ₁ θ₂ : Fin d), (InformationTheory.klDiv (P θ₁) (P θ₂)).toReal = n * InfoTheory.sqDist θ₁ θ₂ / (2 * σ ^ 2)) (hP_int : ∀ (θhat : (Fin d)Fin d) (θ : Fin d), MeasureTheory.Integrable (fun (Y : Fin d) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_aesm : ∀ (θhat : (Fin d)Fin d) (θ : Fin d), MeasureTheory.AEStronglyMeasurable (fun (Y : Fin d) => Minimax.sqDist (θhat Y) θ) (P θ)) (hP_bdd : ∀ (Θ' : Set (Fin d)) (θhat : (Fin d)Fin d), BddAbove (Set.range fun (θ : Fin d) => ⨆ (_ : θ Θ'), (Y : Fin d), Minimax.sqDist (θhat Y) θ P θ)) (hP_measurableSet_sqDist_ge : ∀ (θhat : (Fin d)Fin d) (θ : Fin d) (c : ), MeasurableSet {Y : Fin d | Minimax.sqDist (θhat Y) θ c}) :
          ∃ (C' : ), 0 < C' ⨅ (θhat : (Fin d)Fin d), θB₀ d k, (Y : Fin d), InfoTheory.sqDist (θhat Y) θ P θ C' * (σ ^ 2 * k / n)

          Two-point Fano lower bound with M=2. When M=2, log(M-1)=log(1)=0, so the Fano bound gives ≥ 1 - 0 = 1, yielding a strong probability lower bound. This enables a parametric-rate lower bound for any parameter set containing two points at controlled separation.

          theorem Cor_5_15.gsm_minimax_lower_bound_general (gsm : Minimax.GaussianSequenceModel) (d k : ) (hd : 0 < d) (hk : 0 < k) (hkd : k d) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (hgsm_d : gsm.d = d) (hgsm_σ : gsm.σ = σ) (hgsm_n : gsm.n = n) (hgsm_Θ : gsm.Θ = B₀ d k) :
          ∃ (C' : ), 0 < C' Minimax.minimaxRisk gsm C' * sparseRate d k σ n

          Generalized lower bound that does not require 8k ≤ d. When 8k ≤ d, this follows from the VG-based argument via gsm_minimax_lower_bound. When 8*k > d, the lower bound still holds (σ²k·log(ed/k)/n is bounded by a constant times σ²d/n, which is a standard parametric lower bound). The complementary case uses a two-point Fano argument for the parametric lower bound.

          Corollary 5.15 (full statement) #

          The constrained LS estimator is minimax optimal over B₀(k) at rate φ(B₀(k)) = σ²k·log(ed/k)/n. This combines:

          theorem Cor_5_15.cor_5_15 (d k : ) (hd : 0 < d) (hk : 0 < k) (hkd : k d) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (gsm : Minimax.GaussianSequenceModel) (hgsm_d : gsm.d = d) (hgsm_σ : gsm.σ = σ) (hgsm_n : gsm.n = n) (hgsm_Θ : gsm.Θ = B₀ d k) :
          ∃ (θhat : Minimax.Estimator gsm.d), Minimax.IsMinimaxOptimal_Expectation gsm θhat (sparseRate d k σ n)

          Corollary 5.15. The minimax rate of estimation over B₀(k) in the Gaussian sequence model is φ(B₀(k)) = σ²k·log(ed/k)/n, and it is attained by the constrained LS estimator θ̂^LS_{B₀(k)}.