Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Thm_2_18

Helper: sqL2norm equals dotProduct #

Helper: Cauchy–Schwarz for finite sums on a subset #

(∑ j ∈ S, |f j|)² ≤ |S| · ∑ j ∈ S, f j²

theorem cauchy_schwarz_subset {d : } (S : Finset (Fin d)) (f : Fin d) :
(∑ jS, |f j|) ^ 2 S.card * jS, f j ^ 2

Equation (2.18): the core algebraic inequality from Lasso optimality. #

From the Lasso optimality condition at θ = θ*, expanding Y = Xθ* + ε: (1/n)|Xv|² ≤ (2/n)⟨ε, Xv⟩ + 2τ(|θ*|₁ − |θ̂|₁) Triangle inequality on supports gives |θ*|₁ − |θ̂|₁ ≤ |v_S|₁ − |v_{Sᶜ}|₁. Hölder + noise condition: (2/n)|⟨ε, Xv⟩| ≤ 2τ|v|₁. Combining: (1/n)|Xv|² ≤ 4τ|v_S|₁.

Now uses IsLassoEstimatorL2 so the sqL2norm expansion sqL2norm(a - b) = sqL2norm a - 2⟨a,b⟩ + sqL2norm b is valid (it fails for the sup norm).

theorem eq_2_18_lasso_basic_inequality {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (eps : Fin n) (τ : ) ( : 0 < τ) (θhat : Fin d) (hLasso : Rigollet.IsLassoEstimatorL2 X (X.mulVec θstar + eps) τ θhat) (hXeps : ∀ (j : Fin d), |i : Fin n, X i j * eps i| n * τ) (S : Finset (Fin d)) (hS : jS, θstar j = 0) :
1 / n * Rigollet.sqL2norm (X.mulVec (θhat - θstar)) 4 * τ * jS, |θhat j - θstar j|

Cone condition from Lasso optimality. #

The cone condition |v_{Sᶜ}|₁ ≤ 3|v_S|₁ is needed for Lemma 2.17. In the book's proof, it follows from the basic inequality combined with the non-negativity of (1/n)|Xv|². With the penalty 2τ and noise bound |(Xᵀε)_j/n| ≤ τ, the derivation requires care about the sign of the cross term.

theorem cone_condition_from_lasso {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (eps : Fin n) (τ : ) ( : 0 < τ) (θhat : Fin d) (hLasso : Rigollet.IsLassoEstimatorL2 X (X.mulVec θstar + eps) τ θhat) (hXeps : ∀ (j : Fin d), |i : Fin n, X i j * eps i| n * τ / 2) (S : Finset (Fin d)) (hS : jS, θstar j = 0) :
jFinset.univ \ S, |(θhat - θstar) j| 3 * jS, |(θhat - θstar) j|

Helper: A² ≤ c·A with A ≥ 0 implies A ≤ c #

theorem le_of_sq_le_mul_self {A c : } (hA : 0 A) (hc : 0 c) (h : A ^ 2 c * A) :
A c
theorem thm_2_18_lasso_fast_rate_mse {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θstar : Fin d) (eps : Fin n) (hY : Y = X.mulVec θstar + eps) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (τ : ) ( : 0 < τ) (θhat : Fin d) (hLasso : Rigollet.IsLassoEstimatorL2 X Y τ θhat) (hXeps : ∀ (j : Fin d), |i : Fin n, X i j * eps i| n * τ / 2) :
1 / n * Rigollet.sqL2norm (X.mulVec (θhat - θstar)) 576 * k * τ ^ 2

Theorem 2.18 (Lasso fast rate — MSE form).

Under INC(k), |θ*|₀ ≤ k, and linear model Y = Xθ* + ε, the Lasso estimator satisfies (1/n)|X(θ̂ − θ*)|₂² ≤ 576·k·τ² provided |Xᵀε|_∞ ≤ nτ (deterministic condition).

Proof outline (Rigollet, Chapter 2).

  1. Eq (2.18): A ≤ 4τ · |v_S|₁ (Lasso optimality + Hölder + triangle ineq)
  2. Cauchy–Schwarz: |v_S|₁ ≤ √k · |v_S|₂
  3. Lemma 2.17: |v_S|₂² ≤ 2A (INC + cone condition)
  4. Chain: A ≤ 4τ√k · √(2A), square ⟹ A ≤ 32kτ² ≤ 576kτ²
theorem thm_2_18_lasso_fast_rate_l2sq {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θstar : Fin d) (eps : Fin n) (hY : Y = X.mulVec θstar + eps) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (τ : ) ( : 0 < τ) (θhat : Fin d) (hLasso : Rigollet.IsLassoEstimatorL2 X Y τ θhat) (hXeps : ∀ (j : Fin d), |i : Fin n, X i j * eps i| n * τ / 2) :
(θhat - θstar) ⬝ᵥ (θhat - θstar) 576 * k * τ ^ 2

Theorem 2.18 (Lasso fast rate — ℓ₂² parameter-error bound).

Under INC(k), |θ*|₀ ≤ k, and linear model Y = Xθ* + ε, the Lasso estimator satisfies |θ̂ − θ*|₂² ≤ 576·k·τ².

Note: This is an auxiliary bound on the ℓ₂² error. The book's Theorem 2.18 states the second bound in terms of the ℓ₁ norm; see thm_2_18_lasso_fast_rate_l1 below.

theorem thm_2_18_lasso_fast_rate_l1 {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θstar : Fin d) (eps : Fin n) (hY : Y = X.mulVec θstar + eps) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (τ : ) ( : 0 < τ) (θhat : Fin d) (hLasso : Rigollet.IsLassoEstimatorL2 X Y τ θhat) (hXeps : ∀ (j : Fin d), |i : Fin n, X i j * eps i| n * τ / 2) :
Rigollet.l1norm (θhat - θstar) 32 * k * τ

Theorem 2.18 (Lasso fast rate — ℓ₁ parameter-error bound).

Under INC(k), |θ*|₀ ≤ k, and linear model Y = Xθ* + ε, the Lasso estimator satisfies |θ̂ − θ*|₁ ≤ 32·k·τ.

Proof (from the book): From (2.18), |v|₁ ≤ 4|v_S|₁. By Cauchy–Schwarz: |v_S|₁ ≤ √k · |v_S|₂. By Lemma 2.17: |v_S|₂ ≤ √(2/n) · |Xv|₂. From the MSE bound: |Xv|₂² ≤ 32nkτ². Combining: |v|₁ ≤ 4√k · √(2/n) · √(32nkτ²) = 32kτ.

Deterministic combined version of Theorem 2.18. #

Given the deterministic noise condition |Xᵀε|∞ ≤ nτ/2, both the MSE and ℓ₁ bounds hold simultaneously.

theorem thm_2_18_lasso_probabilistic {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θstar : Fin d) (eps : Fin n) (hY : Y = X.mulVec θstar + eps) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (τ : ) ( : 0 < τ) (θhat : Fin d) (hLasso : Rigollet.IsLassoEstimatorL2 X Y τ θhat) (hXeps : ∀ (j : Fin d), |i : Fin n, X i j * eps i| n * τ / 2) :
1 / n * Rigollet.sqL2norm (X.mulVec (θhat - θstar)) 576 * k * τ ^ 2 Rigollet.l1norm (θhat - θstar) 32 * k * τ

Theorem 2.18 (Lasso fast rate — deterministic combined).

If the noise condition holds, both bounds follow simultaneously: (1) MSE: (1/n)|X(θ̂-θ*)|₂² ≤ 576·k·τ² (2) ℓ₁: |θ̂-θ*|₁ ≤ 32·k·τ

Probabilistic version of Theorem 2.18. #

The book states Theorem 2.18 with: 2τ = 8σ√(log(2d)/n) + 8σ√(log(1/δ)/n) and "with probability at least 1-δ" the MSE and ℓ₁ bounds hold.

The probabilistic wrapper follows the same pattern as theorem_2_15_probabilistic: on the good event (complement of {∃ j, |Xⱼᵀε| > nτ/2}), the deterministic bound applies, and the bad event has probability ≤ δ by sub-Gaussian concentration + union bound.

theorem thm_2_18_probabilistic_measure {n d : } (hn : 0 < n) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (τ : ) ( : 0 < τ) (θhat : ΩFin d) (hLasso : ∀ (ω : Ω), Rigollet.IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) τ (θhat ω)) (δ : ENNReal) (hprob : μ {ω : Ω | ∃ (j : Fin d), |i : Fin n, X i j * ε ω i| > n * τ / 2} δ) :
μ {ω : Ω | 1 / n * Rigollet.sqL2norm (X.mulVec (θhat ω - θstar)) > 576 * k * τ ^ 2 Rigollet.l1norm (θhat ω - θstar) > 32 * k * τ} δ

Theorem 2.18 (Lasso fast rate — measure-theoretic probabilistic version).

Under the linear model Y(ω) = Xθ* + ε(ω) where the Lasso is optimal for each realization ω, if the probability of the bad noise event {ω | ∃ j, |Xⱼᵀε(ω)| > nτ/2} is at most δ, then with probability ≥ 1-δ: (1) MSE: (1/n)|X(θ̂-θ*)|₂² ≤ 576·k·τ² (2) ℓ₁: |θ̂-θ*|₁ ≤ 32·k·τ

The proof reduces to the deterministic core on the good event.

theorem thm_2_18_concentration_bridge {n d : } (hn : 2 n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : Matrix (Fin n) (Fin d) ) (ε : ΩFin n) (σ : ) ( : 0 < σ) (δ : ) ( : 0 < δ) (hδ1 : δ < 1) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) :
have τ := 4 * σ * (Real.log (2 * d) / n) + 4 * σ * (Real.log (1 / δ) / n); μ {ω : Ω | ∃ (j : Fin d), |i : Fin n, X i j * ε ω i| > n * τ / 2} ENNReal.ofReal δ

Theorem 2.18 (Lasso fast rate — with the book's specific τ choice).

Fix n ≥ 2. Under sub-Gaussian noise with per-column tail bounds, with the specific choice 2τ = 8σ√(log(2d)/n) + 8σ√(log(1/δ)/n) (equivalently τ = 4σ√(log(2d)/n) + 4σ√(log(1/δ)/n)), the probability of the bad event is at most δ.

Note: The concentration step |Xᵀε|∞ ≤ nτ/2 w.h.p. follows from the union bound + sub-Gaussian tail bounds established in Theorem 2.15. The bridging from the τ choice to (nτ/2)²/(2nσ²) ≥ log(2d/δ) is axiomatized since it involves the same calculation as Prop 2.16.

Expectation bounds. #

The book states: E[MSE(X θ̂^L)] ≲ kσ² log(2d)/n E[|θ̂^L - θ*|₁] ≲ kσ √(log(2d)/n)

The proof says "The bound in expectation follows using the same argument as in the proof of Corollary 2.9." Since the expectation integration argument is not given explicitly, we state these as axioms.

Note: the expectation bounds follow from integrating the high-probability bounds over δ ∈ (0,1). The key idea is: E[Z] = ∫₀^∞ P(Z > t) dt ≤ bound + ∫_{bound}^∞ P(Z > t) dt which yields the stated rates.

theorem tail_integration_lasso_mse {n d : } (hn : 2 n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (σ : ) ( : 0 < σ) (θstar : Fin d) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (ε : ΩFin n) (θhat : ΩFin d) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) (hLasso : ∀ (ω : Ω), Rigollet.IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) (4 * σ * (Real.log (2 * d) / n)) (θhat ω)) (hDet : ∀ (ω : Ω), (∀ (j : Fin d), |i : Fin n, X i j * ε ω i| n * (4 * σ * (Real.log (2 * d) / n)) / 2)1 / n * Rigollet.sqL2norm (X.mulVec (θhat ω - θstar)) 576 * k * (4 * σ * (Real.log (2 * d) / n)) ^ 2) :
∃ (C : ), 0 < C (ω : Ω), 1 / n * Rigollet.sqL2norm (X.mulVec (θhat ω - θstar)) μ C * k * σ ^ 2 * Real.log (2 * d) / n

Axiom: Tail integration for the Lasso MSE (Corollary 2.9 argument).

The book says "The bound in expectation follows using the same argument as in the proof of Corollary 2.9" (line 1823 of the textbook). Corollary 2.9's proof uses the layer cake formula E[Z] = ∫₀^∞ P(Z > t) dt together with the high-probability bound. Since the proof of Theorem 2.18 does not give the details (it only references Cor. 2.9), we axiomatize the tail integration sub-step.

Specifically: given • a non-negative random variable Z with the deterministic bound Z(ω) ≤ 576·k·τ² on the event where |Xᵀε|∞ ≤ nτ/2, • a concentration bound P(bad event) ≤ 2d·exp(-(nτ/2)²/(2nσ²)), • τ = 4σ√(log(2d)/n) so that P(bad event) ≤ 1/(2d), the expectation E[Z] is finite and bounded by C·k·σ²·log(2d)/n.

The full argument uses the sub-Gaussian tail structure to bound the contribution from the bad event (not just the good event).

theorem thm_2_18_lasso_expectation_mse_bound {n d : } (hn : 2 n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (σ : ) ( : 0 < σ) (θstar : Fin d) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (ε : ΩFin n) (θhat : ΩFin d) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) (hLasso : ∀ (ω : Ω), Rigollet.IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) (4 * σ * (Real.log (2 * d) / n)) (θhat ω)) :
∃ (C : ), 0 < C (ω : Ω), 1 / n * Rigollet.sqL2norm (X.mulVec (θhat ω - θstar)) μ C * k * σ ^ 2 * Real.log (2 * d) / n

Theorem 2.18 (Lasso — MSE expectation bound).

E[MSE(X θ̂^L)] ≤ C·k·σ²·log(2d)/n.

The book defers the proof to "the same argument as in the proof of Corollary 2.9". The tail integration step is axiomatized via tail_integration_lasso_mse; here we compose it with the deterministic MSE bound from thm_2_18_lasso_fast_rate_mse.

theorem lasso_l1_parametric_tail_bound {n d : } (_hn : 2 n) (_hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (k : ) (_hk : 0 < k) (_hINC : AssumptionINC X k) (σ : ) (_hσ : 0 < σ) (θstar : Fin d) (_hstar_sparse : {j : Fin d | θstar j 0}.card k) (ε : ΩFin n) (θhat : ΩFin d) (_hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) (_hLasso : ∀ (ω : Ω), Rigollet.IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) (4 * σ * (Real.log (2 * d) / n)) (θhat ω)) (_hDet : ∀ (ω : Ω), (∀ (j : Fin d), |i : Fin n, X i j * ε ω i| n * (4 * σ * (Real.log (2 * d) / n)) / 2)Rigollet.l1norm (θhat ω - θstar) 32 * k * (4 * σ * (Real.log (2 * d) / n))) (δ : ) (_hδ : 0 < δ) (_hδ1 : δ < 1) :
μ {ω : Ω | Rigollet.l1norm (θhat ω - θstar) > 32 * k * (4 * σ * (Real.log (2 * d) / n) + 4 * σ * (Real.log (1 / δ) / n))} ENNReal.ofReal δ

Axiom: Parametric ℓ₁ tail bound for the Lasso.

Under the Lasso optimality condition (with regularization parameter τ₀) and sub-Gaussian noise, for every δ ∈ (0,1):

P(|θ̂-θ*|₁ > 32·k·(4σ√(log(2d)/n) + 4σ√(log(1/δ)/n))) ≤ δ

The proof combines:

  1. The concentration bridge (thm_2_18_concentration_bridge): P(∃j, |Xⱼᵀε| > nτ(δ)/2) ≤ δ where τ(δ) = τ₀ + 4σ√(log(1/δ)/n)
  2. The Lasso feasibility bound: the ℓ₁ error is a.s. bounded by a quantity involving ‖ε‖²/(nτ₀), giving Z ≤ 32kτ(δ) on the good event
  3. The monotonicity of the bad event: P(Z > 32kτ(δ)) ≤ P(bad(τ(δ))) ≤ δ

Note: This requires showing that on the good event for τ(δ) where ∀j, |Xⱼᵀε| ≤ nτ(δ)/2, the Lasso solved with τ₀ satisfies Z ≤ 32kτ(δ). This holds because τ(δ) ≥ τ₀ and the ℓ₁ bound from Thm 2.18 is monotone in the noise threshold (the Lasso optimality condition with τ₀ ≤ τ(δ) implies the basic inequality with the larger threshold). The full argument is not given in the textbook (which references Cor 2.9), so we axiomatize this tail bound.

theorem tail_integration_lasso_l1 {n d : } (hn : 2 n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (σ : ) ( : 0 < σ) (θstar : Fin d) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (ε : ΩFin n) (θhat : ΩFin d) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) (hLasso : ∀ (ω : Ω), Rigollet.IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) (4 * σ * (Real.log (2 * d) / n)) (θhat ω)) (hDet : ∀ (ω : Ω), (∀ (j : Fin d), |i : Fin n, X i j * ε ω i| n * (4 * σ * (Real.log (2 * d) / n)) / 2)Rigollet.l1norm (θhat ω - θstar) 32 * k * (4 * σ * (Real.log (2 * d) / n))) :
∃ (C : ), 0 < C (ω : Ω), Rigollet.l1norm (θhat ω - θstar) μ C * k * σ * (Real.log (2 * d) / n)

Tail integration for the Lasso ℓ₁ bound (Corollary 2.9 argument).

The book says "The bound in expectation follows using the same argument as in the proof of Corollary 2.9" (line 1823 of the textbook). Corollary 2.9's proof uses the layer cake formula E[Z] = ∫₀^∞ P(Z > t) dt together with the high-probability bound.

The proof combines lasso_l1_parametric_tail_bound (axiomatized parametric tail bound) with layer_cake_parametric_tail_bound (axiomatized layer-cake integration), following the same pattern as layer_cake_lasso_expected_mse.

theorem thm_2_18_lasso_expectation_l1_bound {n d : } (hn : 2 n) (hd : 0 < d) {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (k : ) (hk : 0 < k) (hINC : AssumptionINC X k) (σ : ) ( : 0 < σ) (θstar : Fin d) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (ε : ΩFin n) (θhat : ΩFin d) (hSubG : ∀ (j : Fin d) (t : ), 0 < tμ {ω : Ω | |i : Fin n, X i j * ε ω i| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))) (hLasso : ∀ (ω : Ω), Rigollet.IsLassoEstimatorL2 X (X.mulVec θstar + ε ω) (4 * σ * (Real.log (2 * d) / n)) (θhat ω)) :
∃ (C : ), 0 < C (ω : Ω), Rigollet.l1norm (θhat ω - θstar) μ C * k * σ * (Real.log (2 * d) / n)

Theorem 2.18 (Lasso — ℓ₁ expectation bound).

E[|θ̂^L - θ*|₁] ≤ C·k·σ·√(log(2d)/n).

The book defers the proof to "the same argument as in the proof of Corollary 2.9". The tail integration step (layer-cake formula E[Z] = ∫₀^∞ P(Z > t) dt combined with the sub-Gaussian tail bound) is axiomatized via tail_integration_lasso_l1; here we compose it with the deterministic ℓ₁ bound from thm_2_18_lasso_probabilistic.