Atlas.HighDimensionalStatistics.code.Chapter2.Thm_2

source

theorem Rigollet.theorem_2_15_lasso_slow_rate {n d : ℕ} (hn : 0 < ↑n) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Fin n → ℝ) (τ : ℝ) (hτ : 0 < τ) (θhat : Fin d → ℝ) (hY : IsLassoEstimatorL2 X (X.mulVec θstar + ε) τ θhat) (hXeps : ∀ (j : Fin d), |∑ i : Fin n, X i j * ε i| ≤ ↑n * τ) :

sqL2norm (X.mulVec (θhat - θstar)) ≤ 4 * ↑n * τ * l1norm θstar

Theorem 2.15 (Lasso slow rate — algebraic core).

Under the linear model Y = Xθ* + ε, the Lasso estimator θ̂ᴸ satisfies sqL2norm (X *ᵥ (θhat - θstar)) ≤ 4 * ↑n * τ * l1norm θstar provided the infinity-norm condition ∀ j, |∑ᵢ X i j * ε i| ≤ ↑n * τ holds (this holds w.h.p. by sub-Gaussian tail bounds).

Dividing both sides by n gives the MSE form: (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁.

source

theorem Rigollet.theorem_2_15_lasso_slow_rate_mse {n d : ℕ} (hn : 0 < ↑n) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Fin n → ℝ) (τ : ℝ) (hτ : 0 < τ) (θhat : Fin d → ℝ) (hY : IsLassoEstimatorL2 X (X.mulVec θstar + ε) τ θhat) (hXeps : ∀ (j : Fin d), |∑ i : Fin n, X i j * ε i| ≤ ↑n * τ) :

1 / ↑n * sqL2norm (X.mulVec (θhat - θstar)) ≤ 4 * τ * l1norm θstar

Theorem 2.15 (Lasso slow rate — MSE form).

Dividing by n: (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁.

Probabilistic part of Theorem 2.15 #

The proof proceeds in two steps:

(Algebraic core, proved above) If |X⊤ε|∞ ≤ nτ, then MSE ≤ 4τ|θ*|₁
(Union bound) P(|X⊤ε|∞ > t) ≤ 2d·exp(-t²/(2nσ²)) under sub-Gaussian noise

Setting t = nτ with τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) and using (a+b)² ≥ a²+b² gives P(|X⊤ε|∞ > nτ) ≤ δ.

source

theorem Rigollet.exp_neg_le_inv {a b : ℝ} (hb : 0 < b) (ha : Real.log b ≤ a) :

Real.exp (-a) ≤ b⁻¹

Helper: exp(-a) ≤ 1/b when a ≥ log(b) and b > 0.

source

theorem Rigollet.union_bound_subgaussian_cols {n d : ℕ} {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ℝ) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hSubG : ∀ (j : Fin d) (t : ℝ), 0 < t → μ {ω : Ω | |∑ i : Fin n, X i j * ε ω i| > t} ≤ ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * ↑n * σ ^ 2)))) (t : ℝ) (ht : 0 < t) :

μ {ω : Ω | ∃ (j : Fin d), |∑ i : Fin n, X i j * ε ω i| > t} ≤ ENNReal.ofReal (2 * ↑d * Real.exp (-t ^ 2 / (2 * ↑n * σ ^ 2)))

Union bound for sub-Gaussian columns (Equation (2.17) in the proof).

Under per-column sub-Gaussian tail bounds P(|Xⱼ⊤ε| > t) ≤ 2·exp(-t²/(2nσ²)) for each j = 1,...,d, the maximum satisfies P(max_j |Xⱼ⊤ε| > t) = P(∃j, |Xⱼ⊤ε| > t) ≤ 2d·exp(-t²/(2nσ²)).

source

theorem Rigollet.tail_bound_le_delta {d : ℕ} (hd : 0 < d) {δ : ℝ} (hδ : 0 < δ) {x : ℝ} (hbound : x ≥ Real.log (2 * ↑d / δ)) :

2 * ↑d * Real.exp (-x) ≤ δ

Tail bound computation.

If x ≥ log(2d/δ) with d > 0 and δ > 0, then 2d·exp(-x) ≤ δ. Applied with x = (nτ)²/(2nσ²) to convert the union bound into a probability ≤ δ bound.

source

theorem Rigollet.book_tau_satisfies_bound {n d : ℕ} (hn : 0 < ↑n) (hd : 0 < d) {σ : ℝ} (hσ : 0 < σ) {δ : ℝ} (hδ : 0 < δ) (hδ1 : δ < 1) (τ : ℝ) (hτ_eq : τ = σ * √(2 * Real.log (2 * ↑d) / ↑n) + σ * √(2 * Real.log (1 / δ) / ↑n)) :

(↑n * τ) ^ 2 / (2 * ↑n * σ ^ 2) ≥ Real.log (2 * ↑d / δ)

The book's specific τ satisfies the tail bound condition.

For τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) with 0 < δ < 1 and d ≥ 1: (nτ)²/(2nσ²) ≥ log(2d/δ) This uses the inequality (a+b)² ≥ a² + b² for a,b ≥ 0.

source

theorem Rigollet.theorem_2_15_probabilistic {n d : ℕ} (hn : 0 < ↑n) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (τ : ℝ) (hτ : 0 < τ) (θhat : Ω → Fin d → ℝ) (hLasso : ∀ (ω : Ω), IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) τ (θhat ω)) (δ : ENNReal) (hprob : μ {ω : Ω | ∃ (j : Fin d), |∑ i : Fin n, X i j * ε ω i| > ↑n * τ} ≤ δ) :

μ {ω : Ω | 1 / ↑n * sqL2norm (X.mulVec (θhat ω - θstar)) > 4 * τ * l1norm θstar} ≤ δ

Theorem 2.15 (Lasso slow rate — probabilistic version).

Under the linear model Y = Xθ* + ε where the Lasso is optimal for each realization ω, if the probability of the bad noise event {ω | ∃ j, |Xⱼ⊤ε(ω)| > nτ} is at most δ, then μ{MSE > 4τ|θ*|₁} ≤ δ.

This combines the deterministic core with the measure-theoretic wrapper: on the good event (complement of bad), the deterministic bound applies, and the bad event has probability ≤ δ.

source

theorem Rigollet.theorem_2_15_with_union_bound {n d : ℕ} (hn : 0 < ↑n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (σ : ℝ) (_hσ : 0 < σ) (τ : ℝ) (hτ : 0 < τ) (θhat : Ω → Fin d → ℝ) (hLasso : ∀ (ω : Ω), IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) τ (θhat ω)) (hSubG : ∀ (j : Fin d) (t : ℝ), 0 < t → μ {ω : Ω | |∑ i : Fin n, X i j * ε ω i| > t} ≤ ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * ↑n * σ ^ 2)))) (δ : ℝ) (hδ : 0 < δ) (htau_bound : (↑n * τ) ^ 2 / (2 * ↑n * σ ^ 2) ≥ Real.log (2 * ↑d / δ)) :

μ {ω : Ω | 1 / ↑n * sqL2norm (X.mulVec (θhat ω - θstar)) > 4 * τ * l1norm θstar} ≤ ENNReal.ofReal δ

Theorem 2.15 (Lasso slow rate — full probabilistic version with union bound).

Under sub-Gaussian noise with per-column tail bounds P(|Xⱼ⊤ε| > t) ≤ 2·exp(-t²/(2nσ²)), when (nτ)²/(2nσ²) ≥ log(2d/δ), the Lasso satisfies μ{(1/n)|X(θ̂-θ*)|₂² > 4τ|θ*|₁} ≤ δ.

source

theorem Rigollet.theorem_2_15_specific_tau {n d : ℕ} (hn : 0 < ↑n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (δ : ℝ) (hδ : 0 < δ) (hδ1 : δ < 1) (θhat : Ω → Fin d → ℝ) (hLasso : ∀ (ω : Ω), IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) (σ * √(2 * Real.log (2 * ↑d) / ↑n) + σ * √(2 * Real.log (1 / δ) / ↑n)) (θhat ω)) (hSubG : ∀ (j : Fin d) (t : ℝ), 0 < t → μ {ω : Ω | |∑ i : Fin n, X i j * ε ω i| > t} ≤ ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * ↑n * σ ^ 2)))) :

have τ := σ * √(2 * Real.log (2 * ↑d) / ↑n) + σ * √(2 * Real.log (1 / δ) / ↑n); μ {ω : Ω | 1 / ↑n * sqL2norm (X.mulVec (θhat ω - θstar)) > 4 * τ * l1norm θstar} ≤ ENNReal.ofReal δ

Theorem 2.15 (Lasso slow rate — with the book's specific τ).

For τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) with 0 < δ < 1, under sub-Gaussian noise, the Lasso satisfies (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁ with probability at least 1-δ. This is exactly the statement of Theorem 2.15 in the book.

source

theorem Rigollet.layer_cake_parametric_tail_bound {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (Z : Ω → ℝ) (hZ_nn : ∀ (ω : Ω), 0 ≤ Z ω) (A B : ℝ) (hA : 0 ≤ A) (hB : 0 ≤ B) (htail : ∀ (δ : ℝ), 0 < δ → δ < 1 → μ {ω : Ω | Z ω > A + B * √(Real.log (1 / δ))} ≤ ENNReal.ofReal δ) :

∃ C₀ > 0, ∫ (ω : Ω), Z ω ∂μ ≤ C₀ * (A + B)

Layer-cake integration for parametric tail bounds (axiomatized).

Given a nonneg random variable Z on a probability space, if the tail satisfies P(Z > A + B · √(log(1/δ))) ≤ δ for all δ ∈ (0,1), then E[Z] ≤ C₀ · (A + B) for some universal C₀ > 0.

This is the core measure-theoretic step in the proof of Corollary 2.9: the quantile integration E[Z] ≤ ∫₀¹ Q_Z(1-δ) dδ combined with the parametric bound Q_Z(1-δ) ≤ A + B·√(log(1/δ)) and the Gamma-function identity ∫₀¹ √(log(1/δ)) dδ = √π/2.

The proof uses the layer-cake formula (Cavalieri's principle) and is not given explicitly in the text; the book references "the same argument as in the proof of Corollary 2.9".

source

theorem Rigollet.lasso_supnorm_mse_tail_bound {n d : ℕ} (_hn : 0 < n) (_hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (σ : ℝ) (_hσ : 0 < σ) (_hcol : ∀ (j : Fin d), ‖fun (i : Fin n) => X i j‖ ≤ √↑n) (_hSubG : ∀ (j : Fin d) (t : ℝ), 0 < t → μ {ω : Ω | |∑ i : Fin n, ε ω i * X i j| > t * √↑n} ≤ ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2)))) (θhat : Ω → Fin d → ℝ) (_hLasso : ∀ (ω : Ω) (θ : Fin d → ℝ), 1 / (2 * ↑n) * ‖X.mulVec (θhat ω) - (X.mulVec θstar + ε ω)‖ ^ 2 + σ * √(2 * Real.log (2 * ↑d) / ↑n) * ∑ i : Fin d, |θhat ω i| ≤ 1 / (2 * ↑n) * ‖X.mulVec θ - (X.mulVec θstar + ε ω)‖ ^ 2 + σ * √(2 * Real.log (2 * ↑d) / ↑n) * ∑ i : Fin d, |θ i|) (δ : ℝ) (_hδ : 0 < δ) (_hδ1 : δ < 1) :

μ {ω : Ω | 1 / ↑n * ‖X.mulVec (θhat ω - θstar)‖ ^ 2 > (4 * ∑ i : Fin d, |θstar i|) * (σ * √(2 * Real.log (2 * ↑d) / ↑n) + σ * √(2 * Real.log (1 / δ) / ↑n))} ≤ ENNReal.ofReal δ

Lasso sup-norm MSE tail bound (axiomatized).

Under the Lasso optimality condition (with sup-norm loss) and sub-Gaussian noise with column normalization, for every δ ∈ (0,1):

P((1/n)‖X(θ̂-θ*)‖_∞² > 4|θ*|₁(σ√(2log(2d)/n) + σ√(2log(1/δ)/n))) ≤ δ

The proof combines:

The deterministic Lasso slow-rate core (Theorem 2.15 Part 1) adapted to the sup-norm formulation
The union bound on sub-Gaussian column correlations
The book's specific τ(δ) = σ√(2log(2d)/n) + σ√(2log(1/δ)/n)

This is axiomatized because the sup-norm formulation of the Lasso requires converting between sup-norm and L2-norm formulations, which involves routine but lengthy norm-comparison arguments not formalized here.

source

theorem Rigollet.layer_cake_lasso_expected_mse {n d : ℕ} (hn : 0 < n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hcol : ∀ (j : Fin d), ‖fun (i : Fin n) => X i j‖ ≤ √↑n) (hSubG : ∀ (j : Fin d) (t : ℝ), 0 < t → μ {ω : Ω | |∑ i : Fin n, ε ω i * X i j| > t * √↑n} ≤ ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2)))) (θhat : Ω → Fin d → ℝ) (hLasso : ∀ (ω : Ω) (θ : Fin d → ℝ), 1 / (2 * ↑n) * ‖X.mulVec (θhat ω) - (X.mulVec θstar + ε ω)‖ ^ 2 + σ * √(2 * Real.log (2 * ↑d) / ↑n) * ∑ i : Fin d, |θhat ω i| ≤ 1 / (2 * ↑n) * ‖X.mulVec θ - (X.mulVec θstar + ε ω)‖ ^ 2 + σ * √(2 * Real.log (2 * ↑d) / ↑n) * ∑ i : Fin d, |θ i|) :

∃ C₀ > 0, ∫ (ω : Ω), 1 / ↑n * ‖X.mulVec (θhat ω - θstar)‖ ^ 2 ∂μ ≤ (C₀ * ∑ i : Fin d, |θstar i|) * σ * √(2 * Real.log (2 * ↑d) / ↑n)

Layer-cake integration for the Lasso (measure-theoretic step).

This is the core measure-theoretic step adapting the proof of Corollary 2.9 to the Lasso setting. Using the layer-cake formula E[Z] = ∫₀^∞ P(Z > u) du, the union bound on |X^Tε|∞, and the algebraic core of Theorem 2.15, we obtain: for any δ ∈ (0,1), the MSE satisfies

E[(1/n)‖X(θ̂-θ*)‖²] ≤ 4|θ*|₁·τ(δ) + δ · bound_on_bad_event_contribution

Integrating over δ yields a bound of order |θ*|₁·σ·√(log(2d)/n).

The integration argument uses E[Z] ≤ ∫₀¹ Q_Z(1-δ) dδ where Q_Z is the quantile function, combined with the parametric bound Q_Z(1-δ) ≤ 4|θ*|₁τ(δ). The integral ∫₀¹ √(log(1/δ)) dδ = √π/2 is a standard Gamma-function identity.

The proof combines lasso_supnorm_mse_tail_bound (axiomatized tail bound) with layer_cake_parametric_tail_bound (axiomatized layer-cake integration), as referenced in the book as "the same argument as in the proof of Corollary 2.9".

source

theorem Rigollet.thm_2_15_expected_mse {n d : ℕ} (hn : 0 < n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hcol : ∀ (j : Fin d), ‖fun (i : Fin n) => X i j‖ ≤ √↑n) (hSubG : ∀ (j : Fin d) (t : ℝ), 0 < t → μ {ω : Ω | |∑ i : Fin n, ε ω i * X i j| > t * √↑n} ≤ ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2)))) (θhat : Ω → Fin d → ℝ) (hLasso : ∀ (ω : Ω) (θ : Fin d → ℝ), 1 / (2 * ↑n) * ‖X.mulVec (θhat ω) - (X.mulVec θstar + ε ω)‖ ^ 2 + σ * √(2 * Real.log (2 * ↑d) / ↑n) * ∑ i : Fin d, |θhat ω i| ≤ 1 / (2 * ↑n) * ‖X.mulVec θ - (X.mulVec θstar + ε ω)‖ ^ 2 + σ * √(2 * Real.log (2 * ↑d) / ↑n) * ∑ i : Fin d, |θ i|) :

∃ C > 0, ∫ (ω : Ω), 1 / ↑n * ‖X.mulVec (θhat ω - θstar)‖ ^ 2 ∂μ ≤ (C * ∑ i : Fin d, |θstar i|) * σ * √(Real.log (2 * ↑d) / ↑n)

Theorem 2.15, Part 2 (expectation bound).

Under the same assumptions as the probabilistic part, E[MSE(Xθ̂^L)] ≤ C·|θ*|₁·σ·√(log(2d)/n) for a universal constant C > 0. The book states: "The bound in expectation follows using the same argument as in the proof of Corollary 2.9."

The proof uses layer_cake_lasso_expected_mse for the measure-theoretic integration step, then absorbs √2 into the constant.

source

theorem Rigollet.theorem_2_15 {n d : ℕ} (hn : 0 < n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hcol : ∀ (j : Fin d), ‖fun (i : Fin n) => X i j‖ ≤ √↑n) (hSubG : ∀ (j : Fin d) (t : ℝ), 0 < t → μ {ω : Ω | |∑ i : Fin n, ε ω i * X i j| > t * √↑n} ≤ ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2)))) (θhat : Ω → Fin d → ℝ) (hLasso : ∀ (ω : Ω) (θ : Fin d → ℝ), 1 / (2 * ↑n) * ‖X.mulVec (θhat ω) - (X.mulVec θstar + ε ω)‖ ^ 2 + σ * √(2 * Real.log (2 * ↑d) / ↑n) * ∑ i : Fin d, |θhat ω i| ≤ 1 / (2 * ↑n) * ‖X.mulVec θ - (X.mulVec θstar + ε ω)‖ ^ 2 + σ * √(2 * Real.log (2 * ↑d) / ↑n) * ∑ i : Fin d, |θ i|) (δ : ℝ) (hδ : 0 < δ) (hδ1 : δ < 1) :

μ {ω : Ω | 1 / ↑n * ‖X.mulVec (θhat ω - θstar)‖ ^ 2 > (4 * ∑ i : Fin d, |θstar i|) * (σ * √(2 * Real.log (2 * ↑d) / ↑n) + σ * √(2 * Real.log (1 / δ) / ↑n))} ≤ ENNReal.ofReal δ ∧ ∃ C > 0, ∫ (ω : Ω), 1 / ↑n * ‖X.mulVec (θhat ω - θstar)‖ ^ 2 ∂μ ≤ (C * ∑ i : Fin d, |θstar i|) * σ * √(Real.log (2 * ↑d) / ↑n)

Theorem 2.15 (Lasso slow rate — bundled).

Under the linear model Y = Xθ* + ε where ε is sub-Gaussian with parameter σ², with column normalization max_j ‖X_j‖₂ ≤ √n, the Lasso estimator θ̂ᴸ with regularization parameter λ = 2τ where τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) satisfies:

(High probability) μ{MSE > 4τ|θ*|₁} ≤ δ
(Expectation) E[MSE] ≤ C|θ*|₁·σ·√(log(2d)/n) for some universal C > 0.

Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Thm_2_15

Probabilistic part of Theorem 2.15 #