Theorem 2.15 (Lasso slow rate — algebraic core).
Under the linear model Y = Xθ* + ε, the Lasso estimator θ̂ᴸ satisfies sqL2norm (X *ᵥ (θhat - θstar)) ≤ 4 * ↑n * τ * l1norm θstar provided the infinity-norm condition ∀ j, |∑ᵢ X i j * ε i| ≤ ↑n * τ holds (this holds w.h.p. by sub-Gaussian tail bounds).
Dividing both sides by n gives the MSE form: (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁.
Theorem 2.15 (Lasso slow rate — MSE form).
Dividing by n: (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁.
Probabilistic part of Theorem 2.15 #
The proof proceeds in two steps:
- (Algebraic core, proved above) If |X⊤ε|∞ ≤ nτ, then MSE ≤ 4τ|θ*|₁
- (Union bound) P(|X⊤ε|∞ > t) ≤ 2d·exp(-t²/(2nσ²)) under sub-Gaussian noise
Setting t = nτ with τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) and using (a+b)² ≥ a²+b² gives P(|X⊤ε|∞ > nτ) ≤ δ.
Union bound for sub-Gaussian columns (Equation (2.17) in the proof).
Under per-column sub-Gaussian tail bounds P(|Xⱼ⊤ε| > t) ≤ 2·exp(-t²/(2nσ²)) for each j = 1,...,d, the maximum satisfies P(max_j |Xⱼ⊤ε| > t) = P(∃j, |Xⱼ⊤ε| > t) ≤ 2d·exp(-t²/(2nσ²)).
The book's specific τ satisfies the tail bound condition.
For τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) with 0 < δ < 1 and d ≥ 1: (nτ)²/(2nσ²) ≥ log(2d/δ) This uses the inequality (a+b)² ≥ a² + b² for a,b ≥ 0.
Theorem 2.15 (Lasso slow rate — probabilistic version).
Under the linear model Y = Xθ* + ε where the Lasso is optimal for each realization ω, if the probability of the bad noise event {ω | ∃ j, |Xⱼ⊤ε(ω)| > nτ} is at most δ, then μ{MSE > 4τ|θ*|₁} ≤ δ.
This combines the deterministic core with the measure-theoretic wrapper: on the good event (complement of bad), the deterministic bound applies, and the bad event has probability ≤ δ.
Theorem 2.15 (Lasso slow rate — full probabilistic version with union bound).
Under sub-Gaussian noise with per-column tail bounds P(|Xⱼ⊤ε| > t) ≤ 2·exp(-t²/(2nσ²)), when (nτ)²/(2nσ²) ≥ log(2d/δ), the Lasso satisfies μ{(1/n)|X(θ̂-θ*)|₂² > 4τ|θ*|₁} ≤ δ.
Theorem 2.15 (Lasso slow rate — with the book's specific τ).
For τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) with 0 < δ < 1, under sub-Gaussian noise, the Lasso satisfies (1/n)|X(θ̂-θ*)|₂² ≤ 4τ|θ*|₁ with probability at least 1-δ. This is exactly the statement of Theorem 2.15 in the book.
Layer-cake integration for parametric tail bounds (axiomatized).
Given a nonneg random variable Z on a probability space, if the tail satisfies P(Z > A + B · √(log(1/δ))) ≤ δ for all δ ∈ (0,1), then E[Z] ≤ C₀ · (A + B) for some universal C₀ > 0.
This is the core measure-theoretic step in the proof of Corollary 2.9: the quantile integration E[Z] ≤ ∫₀¹ Q_Z(1-δ) dδ combined with the parametric bound Q_Z(1-δ) ≤ A + B·√(log(1/δ)) and the Gamma-function identity ∫₀¹ √(log(1/δ)) dδ = √π/2.
The proof uses the layer-cake formula (Cavalieri's principle) and is not given explicitly in the text; the book references "the same argument as in the proof of Corollary 2.9".
Lasso sup-norm MSE tail bound (axiomatized).
Under the Lasso optimality condition (with sup-norm loss) and sub-Gaussian noise with column normalization, for every δ ∈ (0,1):
P((1/n)‖X(θ̂-θ*)‖_∞² > 4|θ*|₁(σ√(2log(2d)/n) + σ√(2log(1/δ)/n))) ≤ δ
The proof combines:
- The deterministic Lasso slow-rate core (Theorem 2.15 Part 1) adapted to the sup-norm formulation
- The union bound on sub-Gaussian column correlations
- The book's specific τ(δ) = σ√(2log(2d)/n) + σ√(2log(1/δ)/n)
This is axiomatized because the sup-norm formulation of the Lasso requires converting between sup-norm and L2-norm formulations, which involves routine but lengthy norm-comparison arguments not formalized here.
Layer-cake integration for the Lasso (measure-theoretic step).
This is the core measure-theoretic step adapting the proof of Corollary 2.9 to the Lasso setting. Using the layer-cake formula E[Z] = ∫₀^∞ P(Z > u) du, the union bound on |X^Tε|∞, and the algebraic core of Theorem 2.15, we obtain: for any δ ∈ (0,1), the MSE satisfies
E[(1/n)‖X(θ̂-θ*)‖²] ≤ 4|θ*|₁·τ(δ) + δ · bound_on_bad_event_contribution
Integrating over δ yields a bound of order |θ*|₁·σ·√(log(2d)/n).
The integration argument uses E[Z] ≤ ∫₀¹ Q_Z(1-δ) dδ where Q_Z is the quantile function, combined with the parametric bound Q_Z(1-δ) ≤ 4|θ*|₁τ(δ). The integral ∫₀¹ √(log(1/δ)) dδ = √π/2 is a standard Gamma-function identity.
The proof combines lasso_supnorm_mse_tail_bound (axiomatized tail bound)
with layer_cake_parametric_tail_bound (axiomatized layer-cake integration),
as referenced in the book as "the same argument as in the proof of
Corollary 2.9".
Theorem 2.15, Part 2 (expectation bound).
Under the same assumptions as the probabilistic part, E[MSE(Xθ̂^L)] ≤ C·|θ*|₁·σ·√(log(2d)/n) for a universal constant C > 0. The book states: "The bound in expectation follows using the same argument as in the proof of Corollary 2.9."
The proof uses layer_cake_lasso_expected_mse for the measure-theoretic
integration step, then absorbs √2 into the constant.
Theorem 2.15 (Lasso slow rate — bundled).
Under the linear model Y = Xθ* + ε where ε is sub-Gaussian with parameter σ², with column normalization max_j ‖X_j‖₂ ≤ √n, the Lasso estimator θ̂ᴸ with regularization parameter λ = 2τ where τ = σ√(2log(2d)/n) + σ√(2log(1/δ)/n) satisfies:
- (High probability) μ{MSE > 4τ|θ*|₁} ≤ δ
- (Expectation) E[MSE] ≤ C|θ*|₁·σ·√(log(2d)/n) for some universal C > 0.