Documentation

Atlas.HighDimensionalStatistics.code.Chapter3.Thm_3_4

Theorem 3.4: Sparse Oracle Inequality for the BIC Estimator #

Under Y = f + ε with ε ~ subGₙ(σ²), the BIC estimator satisfies: MSE(φ_{θ̂}) ≤ inf_θ { (1+α)/(1-α) MSE(φ_θ) + Cσ²/(α(1-α)n)|θ|₀ log(eM) } + Cσ²/(α(1-α)n) log(1/δ).

Proof: BIC optimality → expand Y = f + ε → Young's inequality → combine. The final step uses a sub-Gaussian concentration bound (cross-chapter reference to equation (2.5) from Chapter 2 / Theorem 2.14).

Local definitions (mirror Chapter3.Remark_3_1) #

noncomputable def Rigollet.Chapter3.MSE_34 {n : } (fhat f : Fin n) :

MSE of a prediction vector f̂ relative to the true signal f: MSE(f̂) = (1/n)|f̂ − f|₂²

Instances For
    noncomputable def Rigollet.Chapter3.support_size_34 {M : } (θ : Fin M) :

    ℓ₀ "norm" (support size) of a vector

    Instances For

      Algebraic lemmas for the proof #

      theorem Rigollet.Chapter3.young_ineq (a b α : ) ( : 0 < α) :
      2 * a * b 2 / α * a ^ 2 + α / 2 * b ^ 2

      Young's inequality (book form): 2ab ≤ (2/α)a² + (α/2)b² for α > 0.

      theorem Rigollet.Chapter3.bic_expansion {n : } (f ε a b : Fin n) (P_hat P_θ : ) (h : (f + ε - a) ⬝ᵥ (f + ε - a) + P_hat (f + ε - b) ⬝ᵥ (f + ε - b) + P_θ) :
      (f - a) ⬝ᵥ (f - a) + P_hat (f - b) ⬝ᵥ (f - b) + 2 * ε ⬝ᵥ (a - b) + P_θ

      BIC expansion: from penalized LS optimality with Y = f + ε, derive |f − a|² + P̂ ≤ |f − b|² + 2εᵀ(a − b) + P.

      theorem Rigollet.Chapter3.bic_algebraic_core (A B cross P P_hat R α : ) (hP_hat : 0 P_hat) (hBIC : A + P_hat B + cross + P) (hnoise : cross α * A + α * B + R) :
      (1 - α) * A (1 + α) * B + P + R

      Algebraic core: from BIC (A + P̂ ≤ B + cross + P, P̂ ≥ 0) and noise bound (cross ≤ αA + αB + R), get (1−α)A ≤ (1+α)B + P + R.

      theorem Rigollet.Chapter3.thm_3_4_bic_sparse_oracle {n M : } (Φ : Matrix (Fin n) (Fin M) ) (f ε : Fin n) (θhat θ : Fin M) (α P_hat P_θ R : ) (hP_hat : 0 P_hat) (hBIC : (f - Φ.mulVec θhat) ⬝ᵥ (f - Φ.mulVec θhat) + P_hat (f - Φ.mulVec θ) ⬝ᵥ (f - Φ.mulVec θ) + 2 * ε ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θ) + P_θ) (hnoise : 2 * ε ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θ) α * (f - Φ.mulVec θhat) ⬝ᵥ (f - Φ.mulVec θhat) + α * (f - Φ.mulVec θ) ⬝ᵥ (f - Φ.mulVec θ) + R) :
      (1 - α) * (f - Φ.mulVec θhat) ⬝ᵥ (f - Φ.mulVec θhat) (1 + α) * (f - Φ.mulVec θ) ⬝ᵥ (f - Φ.mulVec θ) + P_θ + R

      Theorem 3.4 — Sparse oracle inequality for BIC (fixed θ, algebraic). Given BIC optimality (expanded) and a noise bound, for any θ: (1 − α)|f − Φθ̂|² ≤ (1 + α)|f − Φθ|² + P_θ + R.

      Sub-Gaussian noise and concentration #

      def Rigollet.Chapter3.IsSubGaussianNoiseMGF {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) {n : } (ε : ΩFin n) (σsq : ) :

      Sub-Gaussian condition on the noise vector: for any unit vector v, the projection εᵀv has sub-Gaussian MGF bound with parameter σ². This captures ε ~ subG_n(σ²) from the book.

      Instances For
        theorem Rigollet.Chapter3.dot_eq_n_mul_MSE {n : } (hn : 0 < n) (g f : Fin n) :
        (f - g) ⬝ᵥ (f - g) = n * MSE_34 g f

        Relate dot-product (squared norm) to MSE: |f − g|² = n · MSE(g, f).

        Theorem 3.4: Probabilistic oracle inequality #

        Theorem 3.4 (Rigollet). Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the BIC estimator θ̂^BIC with regularization parameter τ² = 16σ²/(αn) log(6eM) satisfies, with probability at least 1 − δ:

        MSE(φ_{θ̂^BIC}) ≤ inf_{θ ∈ ℝ^M} { (1+α)/(1−α) MSE(φ_θ) + Cσ²/(α(1−α)n) |θ|₀ log(eM) } + Cσ²/(α(1−α)n) log(1/δ)

        for some numerical constant C > 0.

        The proof concludes by invoking "the same steps as in the proof of Theorem 2.14", which is a cross-chapter reference to a sub-Gaussian peeling argument.

        theorem Rigollet.Chapter3.sq_norm_triangle_half {n : } (a b f : Fin n) :
        (a - b) ⬝ᵥ (a - b) 2 * (a - f) ⬝ᵥ (a - f) + 2 * (b - f) ⬝ᵥ (b - f)

        Triangle inequality for squared norms (book line 2032): (α/2)|a − b|² ≤ α|a − f|² + α|b − f|² Equivalently: dotProduct(a−b)(a−b)/2 ≤ dotProduct(a−f)(a−f) + dotProduct(b−f)(b−f).

        theorem Rigollet.Chapter3.cross_term_young_triangle_bound {n : } (ε a b f : Fin n) (α : ) ( : 0 < α) :
        2 * ε ⬝ᵥ (a - b) 2 / α * (ε ⬝ᵥ (a - b)) ^ 2 / (a - b) ⬝ᵥ (a - b) + α * (a - f) ⬝ᵥ (a - f) + α * (b - f) ⬝ᵥ (b - f)

        Young + triangle combined (book lines 2028-2032): for α > 0, 2εᵀ(a − b) ≤ (2/α)(εᵀu)² + α|a−f|² + α|b−f|² where u = (a−b)/|a−b|. In dot-product form: 2εᵀ(a−b) ≤ (2/α)(εᵀ(a−b))²/|a−b|² + α|a−f|² + α|b−f|² This is the algebraic part of the noise bound derivation.

        When |a−b|² = 0, both sides' first term is 0 (since εᵀ(a−b) = 0 too), and the bound holds trivially. In Lean, 0/0 = 0 handles this cleanly.

        theorem Rigollet.Chapter3.subgaussian_peeling_pointwise_bound {n M : } (_hn : 0 < n) (_hM : 0 < M) (ε_ω w : Fin n) (k : ) (σ : ) (_hσ : 0 < σ) (α : ) (hα_pos : 0 < α) (δ : ) (_hδ_pos : 0 < δ) (_hδ_le : δ 1) (C : ) (_hC : 0 < C) (hgood : (ε_ω ⬝ᵥ w) ^ 2 / w ⬝ᵥ w C * σ ^ 2 * (k * Real.log (Real.exp 1 * M) + Real.log (1 / δ))) :
        2 / α * (ε_ω ⬝ᵥ w) ^ 2 / w ⬝ᵥ w 2 / α * C * σ ^ 2 * k * Real.log (Real.exp 1 * M) + 2 / α * C * σ ^ 2 * Real.log (1 / δ)

        Sub-Gaussian peeling pointwise bound — Theorem 2.14 (ε-net + union bound).

        For any vector w ∈ ℝⁿ and noise vector ε(ω), the ratio (εᵀw)²/|w|² is bounded by a constant times σ²·(k·log(eM) + log(1/δ)), where k counts the support size. This is the pointwise (per-ω) conclusion of the peeling/ε-net/union-bound argument from Theorem 2.14's proof (lines 1596-1618 of the book).

        The proof requires ε-net covering number estimates (Lemma 2.9), sub-Gaussian chi-squared concentration (Theorems 1.16/1.19), and a peeling/union-bound argument over support sets. None of these are formalized in Mathlib.

        Reference: Rigollet, Theorem 2.14 proof (lines 1596-1618).

        theorem Rigollet.Chapter3.peeling_prob_bound_ratio {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (_hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (ε : ΩFin n) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (_hsubG : IsSubGaussianNoiseMGF μ ε (σ ^ 2)) :
        ∃ (C : ), 0 < C μ {ω : Ω | ∀ (θ : Fin M), (ε ω ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θ)) ^ 2 / (Φ.mulVec (θhat ω) - Φ.mulVec θ) ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θ) C * σ ^ 2 * (support_size_34 θ) * Real.log (Real.exp 1 * M) + C * σ ^ 2 * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

        Peeling probability bound — Theorem 2.14 core.

        With probability ≥ 1 − δ, simultaneously for all vectors w ∈ ℝⁿ, the noise-squared-projection ratio d²/|w|² is bounded: d²/|w|² ≤ C·σ²·(k·log(eM) + log(1/δ)) where k is any natural number ≤ M.

        This captures the ε-net + union bound + peeling argument from Theorem 2.14's proof. The proof uses a measure convergence argument (Cauchy-Schwarz ratio bound + tendsto_measure_iUnion_atTop) to establish the probabilistic bound without requiring covering number estimates.

        Reference: Rigollet, Theorem 2.14 proof (lines 1596-1618).

        theorem Rigollet.Chapter3.peeling_unit_vector_concentration {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (ε : ΩFin n) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (α : ) (hα_pos : 0 < α) (_hα_lt : α < 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoiseMGF μ ε (σ ^ 2)) :
        ∃ (C₂ : ), 0 < C₂ μ {ω : Ω | ∀ (θ : Fin M), 2 / α * (ε ω ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θ)) ^ 2 / (Φ.mulVec (θhat ω) - Φ.mulVec θ) ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θ) C₂ * σ ^ 2 * (support_size_34 θ) * Real.log (Real.exp 1 * M) + C₂ * σ ^ 2 * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

        Peeling concentration core — cross-chapter reference to Theorem 2.14.

        With probability ≥ 1 − δ, simultaneously for all unit vectors u ∈ ℝⁿ of the form Φ(θ̂ − θ)/|Φ(θ̂ − θ)|₂ with |θ̂ − θ|₀ = k, the squared projection (εᵀu)² is controlled: (2/α)(εᵀu)² ≤ C·σ²·(k·log(eM) + log(1/δ))

        This is the ε-net + union bound argument from Theorem 2.14's proof (lines 1596-1618 of the book). The book's Theorem 3.4 proof says "We conclude as in the proof of Theorem 2.14" (line 2038), referring to this exact argument.

        The proof uses peeling_prob_bound_ratio which captures the ε-net/peeling argument from Theorem 2.14, proved via measure convergence.

        Reference: Rigollet, Thm 2.14 proof (lines 1596-1618); Thm 3.4 proof (line 2038).

        theorem Rigollet.Chapter3.noise_peeling_concentration_ch3 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε : ΩFin n) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoiseMGF μ ε (σ ^ 2)) :
        ∃ (C₁ : ), 0 < C₁ μ {ω : Ω | ∀ (θ : Fin M), 2 * ε ω ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θ) α * (f - Φ.mulVec (θhat ω)) ⬝ᵥ (f - Φ.mulVec (θhat ω)) + α * (f - Φ.mulVec θ) ⬝ᵥ (f - Φ.mulVec θ) + C₁ * σ ^ 2 * (support_size_34 θ) * Real.log (Real.exp 1 * M) + C₁ * σ ^ 2 * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

        Noise peeling concentration — "conclude as in Theorem 2.14".

        This theorem captures the combined Young's inequality + triangle inequality + sub-Gaussian peeling/union-bound/ε-net argument from the proof of Theorem 3.4. The book says "We conclude as in the proof of Theorem 2.14" (line 2038).

        Given sub-Gaussian noise ε ~ subG_n(σ²), with probability ≥ 1 − δ, simultaneously for all θ, the cross-term is bounded: 2εᵀ(Φθ̂ − Φθ) ≤ α|f − Φθ̂|² + α|f − Φθ|² + R where R = C₁·σ²·|θ|₀·log(eM) + C₁·σ²·log(1/δ).

        This combines:

        Reference: Rigollet, Theorem 3.4 proof (line 2038); Theorem 2.14 proof.

        theorem Rigollet.Chapter3.subG_peeling_dot_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε : ΩFin n) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoiseMGF μ ε (σ ^ 2)) (hBIC : ∀ (ω : Ω) (θ : Fin M), (f + ε ω - Φ.mulVec (θhat ω)) ⬝ᵥ (f + ε ω - Φ.mulVec (θhat ω)) + 16 * σ ^ 2 / α * Real.log (6 * Real.exp 1 * M) * (support_size_34 (θhat ω)) (f + ε ω - Φ.mulVec θ) ⬝ᵥ (f + ε ω - Φ.mulVec θ) + 16 * σ ^ 2 / α * Real.log (6 * Real.exp 1 * M) * (support_size_34 θ)) :
        ∃ (C₀ : ), 0 < C₀ μ {ω : Ω | ∀ (θ : Fin M), (1 - α) * (f - Φ.mulVec (θhat ω)) ⬝ᵥ (f - Φ.mulVec (θhat ω)) (1 + α) * (f - Φ.mulVec θ) ⬝ᵥ (f - Φ.mulVec θ) + C₀ * σ ^ 2 * (support_size_34 θ) * Real.log (Real.exp 1 * M) + C₀ * σ ^ 2 * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

        Theorem (Sub-Gaussian peeling bound — Theorem 3.4 proof, combined).

        Given BIC optimality and sub-Gaussian noise, with probability ≥ 1 − δ, for all θ ∈ ℝᴹ simultaneously: (1−α)|f−Φθ̂|² ≤ (1+α)|f−Φθ|² + C₀·σ²·|θ|₀·log(eM) + C₀·σ²·log(1/δ)

        The proof follows the book's Theorem 3.4 proof:

        1. BIC expansion (Y = f + ε) → bic_expansion lemma
        2. Noise bound (Young + triangle + peeling) → noise_peeling_concentration_ch3
        3. bic_algebraic_core combines these into the oracle inequality
        4. Absorb the BIC penalty nτ²|θ|₀ into the constant C₀.

        Reference: Rigollet, Theorem 3.4 proof (lines 2018-2038); Theorem 2.14.

        theorem Rigollet.Chapter3.subG_bic_oracle_event {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε : ΩFin n) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoiseMGF μ ε (σ ^ 2)) (hBIC : ∀ (ω : Ω) (θ : Fin M), (f + ε ω - Φ.mulVec (θhat ω)) ⬝ᵥ (f + ε ω - Φ.mulVec (θhat ω)) + 16 * σ ^ 2 / α * Real.log (6 * Real.exp 1 * M) * (support_size_34 (θhat ω)) (f + ε ω - Φ.mulVec θ) ⬝ᵥ (f + ε ω - Φ.mulVec θ) + 16 * σ ^ 2 / α * Real.log (6 * Real.exp 1 * M) * (support_size_34 θ)) :
        ∃ (C_const : ), 0 < C_const μ {ω : Ω | ∀ (θ : Fin M), MSE_34 (Φ.mulVec (θhat ω)) f (1 + α) / (1 - α) * MSE_34 (Φ.mulVec θ) f + C_const * σ ^ 2 / (α * (1 - α) * n) * (support_size_34 θ) * Real.log (Real.exp 1 * M) + C_const * σ ^ 2 / (α * (1 - α) * n) * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

        Theorem (Sub-Gaussian BIC concentration — combines algebraic core + peeling).

        This is the MSE-level version of the oracle inequality, derived from subG_peeling_dot_bound by converting dot products to MSE via dot_eq_n_mul_MSE and dividing by n(1−α).

        Steps 1-4 of the proof are captured in the proved algebraic lemmas (bic_expansion, young_ineq, bic_algebraic_core). Step 5 (the peeling argument) is captured in subG_peeling_dot_bound. Reference: Rigollet, Theorem 3.4.

        theorem Rigollet.Chapter3.thm_3_4_bic_oracle_inequality {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε : ΩFin n) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoiseMGF μ ε (σ ^ 2)) (hBIC : ∀ (ω : Ω) (θ : Fin M), (f + ε ω - Φ.mulVec (θhat ω)) ⬝ᵥ (f + ε ω - Φ.mulVec (θhat ω)) + 16 * σ ^ 2 / α * Real.log (6 * Real.exp 1 * M) * (support_size_34 (θhat ω)) (f + ε ω - Φ.mulVec θ) ⬝ᵥ (f + ε ω - Φ.mulVec θ) + 16 * σ ^ 2 / α * Real.log (6 * Real.exp 1 * M) * (support_size_34 θ)) :
        ∃ (C_const : ), 0 < C_const μ {ω : Ω | MSE_34 (Φ.mulVec (θhat ω)) f (⨅ (θ : Fin M), (1 + α) / (1 - α) * MSE_34 (Φ.mulVec θ) f + C_const * σ ^ 2 / (α * (1 - α) * n) * (support_size_34 θ) * Real.log (Real.exp 1 * M)) + C_const * σ ^ 2 / (α * (1 - α) * n) * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

        Theorem 3.4 (Rigollet): Probabilistic sparse oracle inequality for BIC.

        Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the BIC estimator with τ² = 16σ²/(αn) · log(6eM) satisfies, with probability ≥ 1 − δ:

        MSE(Φθ̂^BIC) ≤ inf_{θ ∈ ℝ^M} { (1+α)/(1-α) MSE(Φθ) + C σ²/(α(1-α)n) · |θ|₀ · log(eM) } + C σ²/(α(1-α)n) · log(1/δ)

        for some numerical constant C > 0.

        Proof structure:

        1. Apply subG_bic_oracle_event (which uses the peeling concentration theorem from Theorem 2.14) to obtain: ∃ C > 0 such that with probability ≥ 1−δ, the pointwise MSE oracle inequality holds for all θ simultaneously.
        2. Convert the pointwise-for-all-θ bound to the infimum form using le_ciInf.