Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Thm_2_15

theorem Rigollet.theorem_2_15_lasso_slow_rate {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : Fin n) (τ : ) ( : 0 < τ) (θhat : Fin d) (hY : IsLassoEstimatorL2 X (X.mulVec θstar + ε) τ θhat) (hXeps : ∀ (j : Fin d), |i : Fin n, X i j * ε i| n * τ) :
sqL2norm (X.mulVec (θhat - θstar)) 4 * n * τ * l1norm θstar

Theorem 2.15 (Lasso slow rate — algebraic core).

Under the linear model Y = Xθ* + ε, the Lasso estimator θ̂ᴸ satisfies sqL2norm (X *ᵥ (θhat - θstar)) ≤ 4 * ↑n * τ * l1norm θstar provided the infinity-norm condition ∀ j, |∑ᵢ X i j * ε i| ≤ ↑n * τ holds (this holds w.h.p. by sub-Gaussian tail bounds).

Dividing both sides by n gives the MSE form: (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁.

theorem Rigollet.theorem_2_15_lasso_slow_rate_mse {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : Fin n) (τ : ) ( : 0 < τ) (θhat : Fin d) (hY : IsLassoEstimatorL2 X (X.mulVec θstar + ε) τ θhat) (hXeps : ∀ (j : Fin d), |i : Fin n, X i j * ε i| n * τ) :
1 / n * sqL2norm (X.mulVec (θhat - θstar)) 4 * τ * l1norm θstar

Theorem 2.15 (Lasso slow rate — MSE form).

Dividing by n: (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁.

Probabilistic part of Theorem 2.15 #

The proof proceeds in two steps:

  1. (Algebraic core, proved above) If |X⊤ε|∞ ≤ nτ, then MSE ≤ 4τ|θ*|₁
  2. (Union bound) P(|X⊤ε|∞ > t) ≤ 2d·exp(-t²/(2nσ²)) under sub-Gaussian noise

Setting t = nτ with τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) and using (a+b)² ≥ a²+b² gives P(|X⊤ε|∞ > nτ) ≤ δ.

theorem Rigollet.exp_neg_le_inv {a b : } (hb : 0 < b) (ha : Real.log b a) :

Helper: exp(-a) ≤ 1/b when a ≥ log(b) and b > 0.

theorem Rigollet.union_bound_subgaussian_cols {n d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ) (ε : ΩFin n) (σ : ) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) (t : ) (ht : 0 < t) :
μ {ω : Ω | ∃ (j : Fin d), |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * d * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))

Union bound for sub-Gaussian columns (Equation (2.17) in the proof).

Under per-column sub-Gaussian tail bounds P(|Xⱼ⊤ε| > t) ≤ 2·exp(-t²/(2nσ²)) for each j = 1,...,d, the maximum satisfies P(max_j |Xⱼ⊤ε| > t) = P(∃j, |Xⱼ⊤ε| > t) ≤ 2d·exp(-t²/(2nσ²)).

theorem Rigollet.tail_bound_le_delta {d : } (hd : 0 < d) {δ : } ( : 0 < δ) {x : } (hbound : x Real.log (2 * d / δ)) :
2 * d * Real.exp (-x) δ

Tail bound computation.

If x ≥ log(2d/δ) with d > 0 and δ > 0, then 2d·exp(-x) ≤ δ. Applied with x = (nτ)²/(2nσ²) to convert the union bound into a probability ≤ δ bound.

theorem Rigollet.book_tau_satisfies_bound {n d : } (hn : 0 < n) (hd : 0 < d) {σ : } ( : 0 < σ) {δ : } ( : 0 < δ) (hδ1 : δ < 1) (τ : ) (hτ_eq : τ = σ * (2 * Real.log (2 * d) / n) + σ * (2 * Real.log (1 / δ) / n)) :
(n * τ) ^ 2 / (2 * n * σ ^ 2) Real.log (2 * d / δ)

The book's specific τ satisfies the tail bound condition.

For τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) with 0 < δ < 1 and d ≥ 1: (nτ)²/(2nσ²) ≥ log(2d/δ) This uses the inequality (a+b)² ≥ a² + b² for a,b ≥ 0.

theorem Rigollet.theorem_2_15_probabilistic {n d : } (hn : 0 < n) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (τ : ) ( : 0 < τ) (θhat : ΩFin d) (hLasso : ∀ (ω : Ω), IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) τ (θhat ω)) (δ : ENNReal) (hprob : μ {ω : Ω | ∃ (j : Fin d), |i : Fin n, X i j * ε ω i| > n * τ} δ) :
μ {ω : Ω | 1 / n * sqL2norm (X.mulVec (θhat ω - θstar)) > 4 * τ * l1norm θstar} δ

Theorem 2.15 (Lasso slow rate — probabilistic version).

Under the linear model Y = Xθ* + ε where the Lasso is optimal for each realization ω, if the probability of the bad noise event {ω | ∃ j, |Xⱼ⊤ε(ω)| > nτ} is at most δ, then μ{MSE > 4τ|θ*|₁} ≤ δ.

This combines the deterministic core with the measure-theoretic wrapper: on the good event (complement of bad), the deterministic bound applies, and the bad event has probability ≤ δ.

theorem Rigollet.theorem_2_15_with_union_bound {n d : } (hn : 0 < n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) (_hσ : 0 < σ) (τ : ) ( : 0 < τ) (θhat : ΩFin d) (hLasso : ∀ (ω : Ω), IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) τ (θhat ω)) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) (δ : ) ( : 0 < δ) (htau_bound : (n * τ) ^ 2 / (2 * n * σ ^ 2) Real.log (2 * d / δ)) :
μ {ω : Ω | 1 / n * sqL2norm (X.mulVec (θhat ω - θstar)) > 4 * τ * l1norm θstar} ENNReal.ofReal δ

Theorem 2.15 (Lasso slow rate — full probabilistic version with union bound).

Under sub-Gaussian noise with per-column tail bounds P(|Xⱼ⊤ε| > t) ≤ 2·exp(-t²/(2nσ²)), when (nτ)²/(2nσ²) ≥ log(2d/δ), the Lasso satisfies μ{(1/n)|X(θ̂-θ*)|₂² > 4τ|θ*|₁} ≤ δ.

theorem Rigollet.theorem_2_15_specific_tau {n d : } (hn : 0 < n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) (δ : ) ( : 0 < δ) (hδ1 : δ < 1) (θhat : ΩFin d) (hLasso : ∀ (ω : Ω), IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) (σ * (2 * Real.log (2 * d) / n) + σ * (2 * Real.log (1 / δ) / n)) (θhat ω)) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) :
have τ := σ * (2 * Real.log (2 * d) / n) + σ * (2 * Real.log (1 / δ) / n); μ {ω : Ω | 1 / n * sqL2norm (X.mulVec (θhat ω - θstar)) > 4 * τ * l1norm θstar} ENNReal.ofReal δ

Theorem 2.15 (Lasso slow rate — with the book's specific τ).

For τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) with 0 < δ < 1, under sub-Gaussian noise, the Lasso satisfies (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁ with probability at least 1-δ. This is exactly the statement of Theorem 2.15 in the book.

theorem Rigollet.layer_cake_parametric_tail_bound {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (Z : Ω) (hZ_nn : ∀ (ω : Ω), 0 Z ω) (A B : ) (hA : 0 A) (hB : 0 B) (htail : ∀ (δ : ), 0 < δδ < 1μ {ω : Ω | Z ω > A + B * (Real.log (1 / δ))} ENNReal.ofReal δ) :
C₀ > 0, (ω : Ω), Z ω μ C₀ * (A + B)

Layer-cake integration for parametric tail bounds (axiomatized).

Given a nonneg random variable Z on a probability space, if the tail satisfies P(Z > A + B · √(log(1/δ))) ≤ δ for all δ ∈ (0,1), then E[Z] ≤ C₀ · (A + B) for some universal C₀ > 0.

This is the core measure-theoretic step in the proof of Corollary 2.9: the quantile integration E[Z] ≤ ∫₀¹ Q_Z(1-δ) dδ combined with the parametric bound Q_Z(1-δ) ≤ A + B·√(log(1/δ)) and the Gamma-function identity ∫₀¹ √(log(1/δ)) dδ = √π/2.

The proof uses the layer-cake formula (Cavalieri's principle) and is not given explicitly in the text; the book references "the same argument as in the proof of Corollary 2.9".

theorem Rigollet.lasso_supnorm_mse_tail_bound {n d : } (_hn : 0 < n) (_hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) (_hσ : 0 < σ) (_hcol : ∀ (j : Fin d), fun (i : Fin n) => X i j n) (_hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, ε ω i * X i j| > t * n} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2)))) (θhat : ΩFin d) (_hLasso : ∀ (ω : Ω) (θ : Fin d), 1 / (2 * n) * X.mulVec (θhat ω) - (X.mulVec θstar + ε ω) ^ 2 + σ * (2 * Real.log (2 * d) / n) * i : Fin d, |θhat ω i| 1 / (2 * n) * X.mulVec θ - (X.mulVec θstar + ε ω) ^ 2 + σ * (2 * Real.log (2 * d) / n) * i : Fin d, |θ i|) (δ : ) (_hδ : 0 < δ) (_hδ1 : δ < 1) :
μ {ω : Ω | 1 / n * X.mulVec (θhat ω - θstar) ^ 2 > (4 * i : Fin d, |θstar i|) * (σ * (2 * Real.log (2 * d) / n) + σ * (2 * Real.log (1 / δ) / n))} ENNReal.ofReal δ

Lasso sup-norm MSE tail bound (axiomatized).

Under the Lasso optimality condition (with sup-norm loss) and sub-Gaussian noise with column normalization, for every δ ∈ (0,1):

P((1/n)‖X(θ̂-θ*)‖_∞² > 4|θ*|₁(σ√(2log(2d)/n) + σ√(2log(1/δ)/n))) ≤ δ

The proof combines:

  1. The deterministic Lasso slow-rate core (Theorem 2.15 Part 1) adapted to the sup-norm formulation
  2. The union bound on sub-Gaussian column correlations
  3. The book's specific τ(δ) = σ√(2log(2d)/n) + σ√(2log(1/δ)/n)

This is axiomatized because the sup-norm formulation of the Lasso requires converting between sup-norm and L2-norm formulations, which involves routine but lengthy norm-comparison arguments not formalized here.

theorem Rigollet.layer_cake_lasso_expected_mse {n d : } (hn : 0 < n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) (hcol : ∀ (j : Fin d), fun (i : Fin n) => X i j n) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, ε ω i * X i j| > t * n} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2)))) (θhat : ΩFin d) (hLasso : ∀ (ω : Ω) (θ : Fin d), 1 / (2 * n) * X.mulVec (θhat ω) - (X.mulVec θstar + ε ω) ^ 2 + σ * (2 * Real.log (2 * d) / n) * i : Fin d, |θhat ω i| 1 / (2 * n) * X.mulVec θ - (X.mulVec θstar + ε ω) ^ 2 + σ * (2 * Real.log (2 * d) / n) * i : Fin d, |θ i|) :
C₀ > 0, (ω : Ω), 1 / n * X.mulVec (θhat ω - θstar) ^ 2 μ (C₀ * i : Fin d, |θstar i|) * σ * (2 * Real.log (2 * d) / n)

Layer-cake integration for the Lasso (measure-theoretic step).

This is the core measure-theoretic step adapting the proof of Corollary 2.9 to the Lasso setting. Using the layer-cake formula E[Z] = ∫₀^∞ P(Z > u) du, the union bound on |X^Tε|∞, and the algebraic core of Theorem 2.15, we obtain: for any δ ∈ (0,1), the MSE satisfies

E[(1/n)‖X(θ̂-θ*)‖²] ≤ 4|θ*|₁·τ(δ) + δ · bound_on_bad_event_contribution

Integrating over δ yields a bound of order |θ*|₁·σ·√(log(2d)/n).

The integration argument uses E[Z] ≤ ∫₀¹ Q_Z(1-δ) dδ where Q_Z is the quantile function, combined with the parametric bound Q_Z(1-δ) ≤ 4|θ*|₁τ(δ). The integral ∫₀¹ √(log(1/δ)) dδ = √π/2 is a standard Gamma-function identity.

The proof combines lasso_supnorm_mse_tail_bound (axiomatized tail bound) with layer_cake_parametric_tail_bound (axiomatized layer-cake integration), as referenced in the book as "the same argument as in the proof of Corollary 2.9".

theorem Rigollet.thm_2_15_expected_mse {n d : } (hn : 0 < n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) (hcol : ∀ (j : Fin d), fun (i : Fin n) => X i j n) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, ε ω i * X i j| > t * n} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2)))) (θhat : ΩFin d) (hLasso : ∀ (ω : Ω) (θ : Fin d), 1 / (2 * n) * X.mulVec (θhat ω) - (X.mulVec θstar + ε ω) ^ 2 + σ * (2 * Real.log (2 * d) / n) * i : Fin d, |θhat ω i| 1 / (2 * n) * X.mulVec θ - (X.mulVec θstar + ε ω) ^ 2 + σ * (2 * Real.log (2 * d) / n) * i : Fin d, |θ i|) :
C > 0, (ω : Ω), 1 / n * X.mulVec (θhat ω - θstar) ^ 2 μ (C * i : Fin d, |θstar i|) * σ * (Real.log (2 * d) / n)

Theorem 2.15, Part 2 (expectation bound).

Under the same assumptions as the probabilistic part, E[MSE(Xθ̂^L)] ≤ C·|θ*|₁·σ·√(log(2d)/n) for a universal constant C > 0. The book states: "The bound in expectation follows using the same argument as in the proof of Corollary 2.9."

The proof uses layer_cake_lasso_expected_mse for the measure-theoretic integration step, then absorbs √2 into the constant.

theorem Rigollet.theorem_2_15 {n d : } (hn : 0 < n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) (hcol : ∀ (j : Fin d), fun (i : Fin n) => X i j n) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, ε ω i * X i j| > t * n} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2)))) (θhat : ΩFin d) (hLasso : ∀ (ω : Ω) (θ : Fin d), 1 / (2 * n) * X.mulVec (θhat ω) - (X.mulVec θstar + ε ω) ^ 2 + σ * (2 * Real.log (2 * d) / n) * i : Fin d, |θhat ω i| 1 / (2 * n) * X.mulVec θ - (X.mulVec θstar + ε ω) ^ 2 + σ * (2 * Real.log (2 * d) / n) * i : Fin d, |θ i|) (δ : ) ( : 0 < δ) (hδ1 : δ < 1) :
μ {ω : Ω | 1 / n * X.mulVec (θhat ω - θstar) ^ 2 > (4 * i : Fin d, |θstar i|) * (σ * (2 * Real.log (2 * d) / n) + σ * (2 * Real.log (1 / δ) / n))} ENNReal.ofReal δ C > 0, (ω : Ω), 1 / n * X.mulVec (θhat ω - θstar) ^ 2 μ (C * i : Fin d, |θstar i|) * σ * (Real.log (2 * d) / n)

Theorem 2.15 (Lasso slow rate — bundled).

Under the linear model Y = Xθ* + ε where ε is sub-Gaussian with parameter σ², with column normalization max_j ‖X_j‖₂ ≤ √n, the Lasso estimator θ̂ᴸ with regularization parameter λ = 2τ where τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) satisfies:

  1. (High probability) μ{MSE > 4τ|θ*|₁} ≤ δ
  2. (Expectation) E[MSE] ≤ C|θ*|₁·σ·√(log(2d)/n) for some universal C > 0.