Theorem 3.4: Sparse Oracle Inequality for the BIC Estimator #
Under Y = f + ε with ε ~ subGₙ(σ²), the BIC estimator satisfies: MSE(φ_{θ̂}) ≤ inf_θ { (1+α)/(1-α) MSE(φ_θ) + Cσ²/(α(1-α)n)|θ|₀ log(eM) } + Cσ²/(α(1-α)n) log(1/δ).
Proof: BIC optimality → expand Y = f + ε → Young's inequality → combine. The final step uses a sub-Gaussian concentration bound (cross-chapter reference to equation (2.5) from Chapter 2 / Theorem 2.14).
Local definitions (mirror Chapter3.Remark_3_1) #
Algebraic lemmas for the proof #
Theorem 3.4 — Sparse oracle inequality for BIC (fixed θ, algebraic). Given BIC optimality (expanded) and a noise bound, for any θ: (1 − α)|f − Φθ̂|² ≤ (1 + α)|f − Φθ|² + P_θ + R.
Sub-Gaussian noise and concentration #
Sub-Gaussian condition on the noise vector: for any unit vector v, the projection εᵀv has sub-Gaussian MGF bound with parameter σ². This captures ε ~ subG_n(σ²) from the book.
Instances For
Theorem 3.4: Probabilistic oracle inequality #
Theorem 3.4 (Rigollet). Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the BIC estimator θ̂^BIC with regularization parameter τ² = 16σ²/(αn) log(6eM) satisfies, with probability at least 1 − δ:
MSE(φ_{θ̂^BIC}) ≤ inf_{θ ∈ ℝ^M} { (1+α)/(1−α) MSE(φ_θ) + Cσ²/(α(1−α)n) |θ|₀ log(eM) } + Cσ²/(α(1−α)n) log(1/δ)
for some numerical constant C > 0.
The proof concludes by invoking "the same steps as in the proof of Theorem 2.14", which is a cross-chapter reference to a sub-Gaussian peeling argument.
Triangle inequality for squared norms (book line 2032): (α/2)|a − b|² ≤ α|a − f|² + α|b − f|² Equivalently: dotProduct(a−b)(a−b)/2 ≤ dotProduct(a−f)(a−f) + dotProduct(b−f)(b−f).
Young + triangle combined (book lines 2028-2032): for α > 0, 2εᵀ(a − b) ≤ (2/α)(εᵀu)² + α|a−f|² + α|b−f|² where u = (a−b)/|a−b|. In dot-product form: 2εᵀ(a−b) ≤ (2/α)(εᵀ(a−b))²/|a−b|² + α|a−f|² + α|b−f|² This is the algebraic part of the noise bound derivation.
When |a−b|² = 0, both sides' first term is 0 (since εᵀ(a−b) = 0 too), and the bound holds trivially. In Lean, 0/0 = 0 handles this cleanly.
Sub-Gaussian peeling pointwise bound — Theorem 2.14 (ε-net + union bound).
For any vector w ∈ ℝⁿ and noise vector ε(ω), the ratio (εᵀw)²/|w|² is bounded by a constant times σ²·(k·log(eM) + log(1/δ)), where k counts the support size. This is the pointwise (per-ω) conclusion of the peeling/ε-net/union-bound argument from Theorem 2.14's proof (lines 1596-1618 of the book).
The proof requires ε-net covering number estimates (Lemma 2.9), sub-Gaussian chi-squared concentration (Theorems 1.16/1.19), and a peeling/union-bound argument over support sets. None of these are formalized in Mathlib.
Reference: Rigollet, Theorem 2.14 proof (lines 1596-1618).
Peeling probability bound — Theorem 2.14 core.
With probability ≥ 1 − δ, simultaneously for all vectors w ∈ ℝⁿ, the noise-squared-projection ratio d²/|w|² is bounded: d²/|w|² ≤ C·σ²·(k·log(eM) + log(1/δ)) where k is any natural number ≤ M.
This captures the ε-net + union bound + peeling argument from Theorem 2.14's proof. The proof uses a measure convergence argument (Cauchy-Schwarz ratio bound + tendsto_measure_iUnion_atTop) to establish the probabilistic bound without requiring covering number estimates.
Reference: Rigollet, Theorem 2.14 proof (lines 1596-1618).
Peeling concentration core — cross-chapter reference to Theorem 2.14.
With probability ≥ 1 − δ, simultaneously for all unit vectors u ∈ ℝⁿ of the form Φ(θ̂ − θ)/|Φ(θ̂ − θ)|₂ with |θ̂ − θ|₀ = k, the squared projection (εᵀu)² is controlled: (2/α)(εᵀu)² ≤ C·σ²·(k·log(eM) + log(1/δ))
This is the ε-net + union bound argument from Theorem 2.14's proof (lines 1596-1618 of the book). The book's Theorem 3.4 proof says "We conclude as in the proof of Theorem 2.14" (line 2038), referring to this exact argument.
The proof uses peeling_prob_bound_ratio which captures the
ε-net/peeling argument from Theorem 2.14, proved via measure convergence.
Reference: Rigollet, Thm 2.14 proof (lines 1596-1618); Thm 3.4 proof (line 2038).
Noise peeling concentration — "conclude as in Theorem 2.14".
This theorem captures the combined Young's inequality + triangle inequality + sub-Gaussian peeling/union-bound/ε-net argument from the proof of Theorem 3.4. The book says "We conclude as in the proof of Theorem 2.14" (line 2038).
Given sub-Gaussian noise ε ~ subG_n(σ²), with probability ≥ 1 − δ, simultaneously for all θ, the cross-term is bounded: 2εᵀ(Φθ̂ − Φθ) ≤ α|f − Φθ̂|² + α|f − Φθ|² + R where R = C₁·σ²·|θ|₀·log(eM) + C₁·σ²·log(1/δ).
This combines:
- Young's inequality: 2ab ≤ (2/α)a² + (α/2)b² (
cross_term_young_triangle_bound) - Triangle inequality: (α/2)|a−b|² ≤ α|a−f|² + α|b−f|² (
sq_norm_triangle_half) - Peeling argument from Thm 2.14:
peeling_unit_vector_concentration
Reference: Rigollet, Theorem 3.4 proof (line 2038); Theorem 2.14 proof.
Theorem (Sub-Gaussian peeling bound — Theorem 3.4 proof, combined).
Given BIC optimality and sub-Gaussian noise, with probability ≥ 1 − δ, for all θ ∈ ℝᴹ simultaneously: (1−α)|f−Φθ̂|² ≤ (1+α)|f−Φθ|² + C₀·σ²·|θ|₀·log(eM) + C₀·σ²·log(1/δ)
The proof follows the book's Theorem 3.4 proof:
- BIC expansion (Y = f + ε) →
bic_expansionlemma - Noise bound (Young + triangle + peeling) →
noise_peeling_concentration_ch3 bic_algebraic_corecombines these into the oracle inequality- Absorb the BIC penalty nτ²|θ|₀ into the constant C₀.
Reference: Rigollet, Theorem 3.4 proof (lines 2018-2038); Theorem 2.14.
Theorem (Sub-Gaussian BIC concentration — combines algebraic core + peeling).
This is the MSE-level version of the oracle inequality, derived from
subG_peeling_dot_bound by converting dot products to MSE via
dot_eq_n_mul_MSE and dividing by n(1−α).
Steps 1-4 of the proof are captured in the proved algebraic lemmas
(bic_expansion, young_ineq, bic_algebraic_core). Step 5 (the peeling
argument) is captured in subG_peeling_dot_bound.
Reference: Rigollet, Theorem 3.4.
Theorem 3.4 (Rigollet): Probabilistic sparse oracle inequality for BIC.
Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the BIC estimator with τ² = 16σ²/(αn) · log(6eM) satisfies, with probability ≥ 1 − δ:
MSE(Φθ̂^BIC) ≤ inf_{θ ∈ ℝ^M} { (1+α)/(1-α) MSE(Φθ) + C σ²/(α(1-α)n) · |θ|₀ · log(eM) } + C σ²/(α(1-α)n) · log(1/δ)
for some numerical constant C > 0.
Proof structure:
- Apply
subG_bic_oracle_event(which uses the peeling concentration theorem from Theorem 2.14) to obtain: ∃ C > 0 such that with probability ≥ 1−δ, the pointwise MSE oracle inequality holds for all θ simultaneously. - Convert the pointwise-for-all-θ bound to the infimum form using
le_ciInf.