Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Thm_2_4

Theorem 2.4: ℓ₁-Constrained Least Squares MSE Bound #

Main results #

Proof strategy #

The proof of the constrained fundamental inequality is purely algebraic:

  1. Minimality over K at θ = θ* gives |Y - Xθ̂|² ≤ |ε|²
  2. Expanding |Y - Xθ̂|² = |ε - X(θ̂-θ*)|² = |ε|² - 2εᵀX(θ̂-θ*) + |X(θ̂-θ*)|²
  3. Combining: |X(θ̂-θ*)|² ≤ 2εᵀX(θ̂-θ*)

For the ℓ₁ ball, since B₁ is a symmetric convex polytope with 2d vertices ±e_j:

References #

theorem IsSubGaussian.mono_variance {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {f : Ω} {σsq₁ σsq₂ : } (h : IsSubGaussian f σsq₁ μ) (hle : σsq₁ σsq₂) :
IsSubGaussian f σsq₂ μ

Helper: monotonicity of IsSubGaussian in the variance proxy.

Constrained Fundamental Inequality #

theorem Rigollet.Chapter2.constrained_fundamental_inequality {n d : } (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : Fin n) (θhat : Fin d) (K : Set (Fin d)) (hθstar_in_K : θstar K) (hLS : θK, (X.mulVec θstar + ε - X.mulVec θhat) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θhat) (X.mulVec θstar + ε - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θ)) :
X.mulVec (θhat - θstar) ⬝ᵥ X.mulVec (θhat - θstar) 2 * ε ⬝ᵥ X.mulVec (θhat - θstar)

Constrained Fundamental Inequality (algebraic core).

Under the linear model Y = Xθ* + ε, if θ̂ minimizes the squared residual |Y - Xθ|² = (Y - Xθ) ⬝ᵥ (Y - Xθ) over θ ∈ K, and θ* ∈ K, then

|X(θ̂ - θ*)|² ≤ 2 · εᵀ X(θ̂ - θ*)

This generalizes thm_2_2_fundamental_inequality to constrained estimation. The proof is identical: minimality at θ* + expansion + simplification.

theorem Rigollet.Chapter2.constrained_sup_bound {n d : } (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : Fin n) (θhat : Fin d) (K : Set (Fin d)) (hθstar_in_K : θstar K) (hθhat_in_K : θhat K) (hLS : θK, (X.mulVec θstar + ε - X.mulVec θhat) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θhat) (X.mulVec θstar + ε - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θ)) (hbdd : BddAbove ((fun (θ : Fin d) => ε ⬝ᵥ X.mulVec θ) '' (K - K))) :
X.mulVec (θhat - θstar) ⬝ᵥ X.mulVec (θhat - θstar) 2 * sSup ((fun (θ : Fin d) => ε ⬝ᵥ X.mulVec θ) '' (K - K))

Constrained Fundamental Inequality with sup over K-K.

If θ̂ minimizes |Y - Xθ|² over K with θ̂, θ* ∈ K, then |X(θ̂ - θ*)|² ≤ 2 · sup_{θ ∈ K-K} εᵀXθ.

Since θ̂ - θ* ∈ K - K, the inner product εᵀX(θ̂-θ*) is bounded by the supremum over K - K.

Tail bound for ℓ₁-constrained LS via Theorem 1.16 #

For K = B₁ (unit ℓ₁ ball), XK is a polytope with at most 2d vertices ±X_j. If ε subG_n(σ²) and |X_j|₂ ≤ √n, then εᵀX_j subG(nσ²), so by Theorem 1.16:

P(sup_{v ∈ XK} εᵀv > t) ≤ 2d · exp(-t²/(2nσ²))

Setting t = nt'/4, this gives P(MSE > t') ≤ 2d · exp(-nt'²/(32σ²)).

theorem Rigollet.Chapter2.thm_2_4_tail_bound {n : } {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (g : Ω(Fin n) →ₗ[] ) (S : Finset (Fin n)) {σsq : } (hsg : vS, IsSubGaussian (fun (ω : Ω) => (g ω) v) σsq μ) (t : ) (ht : 0 < t) :
μ {ω : Ω | θ(convexHull ) S, (g ω) θ > t} S.card ENNReal.ofReal (Real.exp (-(t ^ 2 / (2 * σsq))))

Theorem 2.4 — Tail bound for ℓ₁-constrained LS (Theorem 1.16 application).

Let S be the finset of signed columns {X_1, -X_1, ..., X_d, -X_d} ⊂ ℝⁿ, which are the (potential) vertices of XB₁. If εᵀv is subG(σ²) for each vertex v ∈ S, then for any t > 0:

P(sup_{θ ∈ conv(S)} εᵀθ > t) ≤ |S| · exp(-t²/(2σ²))

This is a direct application of Theorem 1.16. When |S| = 2d and σ² is replaced by nσ² (from column normalization), it gives the probabilistic component of Theorem 2.4.

theorem Rigollet.Chapter2.thm_2_4_tail_bound' {n : } {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (g : Ω(Fin n) →ₗ[] ) (S : Finset (Fin n)) {σsq : } (hsg : vS, IsSubGaussian (fun (ω : Ω) => (g ω) v) σsq μ) (t : ) (ht : 0 < t) :
μ {ω : Ω | θ(convexHull ) S, (g ω) θ > t} ENNReal.ofReal (S.card * Real.exp (-(t ^ 2 / (2 * σsq))))

Theorem 2.4 — Tail bound (alternative form with ENNReal.ofReal).

P(sup_{θ ∈ conv(S)} εᵀθ > t) ≤ ENNReal.ofReal(|S| · exp(-t²/(2σ²)))

Full Theorem 2.4 statement #

The full theorem combines:

  1. Constrained fundamental inequality: |X(θ̂-θ*)|² ≤ 4 sup_{v ∈ XK} εᵀv
  2. Tail bound: P(sup > nt/4) ≤ 2d exp(-nt²/(32σ²))

We state the combined result: the MSE tail bound P[MSE > t] ≤ 2d exp(-nt²/(32σ²)).

theorem Rigollet.Chapter2.thm_2_4_deterministic_bound {n d : } (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : Fin n) (θhat : Fin d) (K : Set (Fin d)) (hθstar_in_K : θstar K) (hθhat_in_K : θhat K) (hK_symm : ∀ (θ : Fin d), θ K -θ K) (hK_convex : Convex K) (hLS : θK, (X.mulVec θstar + ε - X.mulVec θhat) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θhat) (X.mulVec θstar + ε - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θ)) (hbdd : BddAbove ((fun (θ : Fin d) => ε ⬝ᵥ X.mulVec θ) '' K)) :
X.mulVec (θhat - θstar) ⬝ᵥ X.mulVec (θhat - θstar) 4 * sSup ((fun (θ : Fin d) => ε ⬝ᵥ X.mulVec θ) '' K)

Theorem 2.4 — MSE tail bound for ℓ₁-constrained LS.

Under the linear model Y = Xθ* + ε with:

  • θ̂ minimizing |Y-Xθ|² over B₁ (and θ* ∈ B₁)
  • S = {±X_1, ..., ±X_d} the signed columns of X (vertices of XB₁)
  • εᵀv subG(nσ²) for each vertex v ∈ S (from column normalization |X_j|₂ ≤ √n)

The squared prediction error satisfies: |X(θ̂-θ*)|² ≤ 4 sup_{θ ∈ conv(S)} εᵀθ(ω)

for every outcome ω where θ̂(ω) minimizes over B₁.

This is the deterministic reduction step of Theorem 2.4: it reduces the MSE bound to controlling the supremum of a sub-Gaussian process over the polytope XB₁, which is then handled by thm_2_4_tail_bound.

theorem Rigollet.Chapter2.thm_2_4_l1_constrained_ls_tail {n : } {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (f : Ω) (g : Ω(Fin n) →ₗ[] ) (S : Finset (Fin n)) {σsq : } (hsg : vS, IsSubGaussian (fun (ω : Ω) => (g ω) v) σsq μ) (hdet : ∀ (ω : Ω), θ(convexHull ) S, f ω 4 * (g ω) θ) (t : ) (ht : 0 < t) :
μ {ω : Ω | f ω > t} S.card ENNReal.ofReal (Real.exp (-((t / 4) ^ 2 / (2 * σsq))))

Theorem 2.4 — ℓ₁-constrained LS tail bound (combined).

This combines the deterministic reduction (Theorem 2.4, deterministic step) with the sub-Gaussian tail bound (Theorem 1.16) to obtain the full probabilistic MSE tail bound for ℓ₁-constrained least squares.

From the book (eq. after line 1234): P[|X(θ̂-θ*)|² > t] ≤ P[sup_{v ∈ conv(S)} g(ω)(v) > t/4] ≤ |S| · exp(-(t/4)²/(2σ²))

The proof is:

  1. {ω | f(ω) > t} ⊆ {ω | ∃ θ ∈ conv(S), g(ω)(θ) > t/4} (from deterministic bound)
  2. Measure monotonicity
  3. Application of Theorem 1.16 tail bound

Here f(ω) represents the squared prediction error |X(θ̂(ω)-θ*)|², and g is the linear functional ω ↦ (v ↦ εᵀv). The hypothesis hdet encodes the deterministic step that for each ω, the squared error is at most 4 times the supremum of g over the polytope conv(S).

Concrete Theorem 2.4 with explicit ℓ₁-ball structure #

The following makes the ℓ₁-ball structure, design matrix, column normalization, and specific constants (2d, 32σ²R²) explicit in the theorem statement, faithfully capturing the full content of Theorem 2.4 from Rigollet.

theorem Rigollet.Chapter2.subgaussian_dotProduct_column {n : } {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (σsq : ) (hσsq : 0 σsq) (hmeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) σsq μ) (a : Fin n) (C : ) (hC : i : Fin n, a i ^ 2 C) :
IsSubGaussian (fun (ω : Ω) => ε ω ⬝ᵥ a) (C * σsq) μ

Sub-Gaussian composition (Proposition 1.4 application).

If each coordinate εᵢ is subG(σ²) and the design matrix column X_j satisfies ∑ᵢ (X i j)² ≤ n, then for any vector a ∈ ℝⁿ with ∑ᵢ aᵢ² ≤ C, the random variable εᵀa = ∑ᵢ εᵢaᵢ is subG(C·σ²).

Specifically, for the scaled column R·X_j, we have ∑ᵢ (R·X_{ij})² = R²·∑ᵢ X_{ij}² ≤ R²·n, so εᵀ(R·X_j) ~ subG(R²·n·σ²).

This is Proposition 1.4 from Rigollet (sub-Gaussian linear combination), proved using Theorem 1.6 (sub-Gaussian random vectors).

theorem Rigollet.Chapter2.subgaussian_l1_ball_tail {n d : } (hd : 2 d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (R : ) (hR : 0 < R) (hcol : ∀ (j : Fin d), i : Fin n, X i j ^ 2 n) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (t : ) (ht : 0 < t) :
μ {ω : Ω | ∃ (θ : Fin d), j : Fin d, |θ j| R ε ω ⬝ᵥ X.mulVec θ > t} ENNReal.ofReal (2 * d * Real.exp (-(t ^ 2 / (2 * n * σ ^ 2 * R ^ 2))))

Sub-Gaussian process tail bound over ℓ₁ ball.

If ε ~ subG_n(σ²) (each coordinate is subG(σ²)) and the design matrix X has column normalization ∀ j, ∑ᵢ (X i j)² ≤ n, then for the process εᵀXθ over the ℓ₁ ball {θ : ‖θ‖₁ ≤ R}:

P[sup_{θ : ‖θ‖₁ ≤ R} εᵀXθ > t] ≤ 2d · exp(-t² / (2nσ²R²))

This follows from two ingredients:

  1. Sub-Gaussian composition: εᵢ subG(σ²) and |X_j|₂ ≤ √n implies εᵀ(R·X_j) ~ subG(nσ²R²). This is Proposition 1.4 from Rigollet, formalized as subgaussian_dotProduct_column above.
  2. Theorem 1.16 applied to the polytope conv({±R·X_j : j=1,...,d}).

The book combines both in a single sentence: "since ε subG_n(σ²), then for any column X_j such that |X_j|₂ ≤ √n, the random variable εᵀX_j subG(nσ²). Therefore, applying Theorem 1.16..." (line 1232).

The proof uses subgaussian_dotProduct_column (an axiom for Proposition 1.4, whose proof is given in Chapter 1 and not repeated here) together with Theorem 1.16 to bound the supremum of the sub-Gaussian process over the polytope formed by the scaled columns of X.

theorem Rigollet.Chapter2.subgaussian_l1_ball_expectation {n d : } (hn : 0 < n) (hd : 2 d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (R : ) (hR : 0 < R) (hcol : ∀ (j : Fin d), i : Fin n, X i j ^ 2 n) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hmeas_sup : Measurable fun (ω : Ω) => sSup {v : | ∃ (θ : Fin d), j : Fin d, |θ j| R v = ε ω ⬝ᵥ X.mulVec θ}) :
(ω : Ω), sSup {v : | ∃ (θ : Fin d), j : Fin d, |θ j| R v = ε ω ⬝ᵥ X.mulVec θ} μ σ * R * n * (2 * Real.log (2 * d))

Sub-Gaussian process expectation bound over ℓ₁ ball.

The expectation form of the sub-Gaussian process bound: E[sup_{θ : ‖θ‖₁ ≤ R} εᵀXθ] ≤ σ · R · √(n) · √(2 log(2d))

This is the expectation counterpart of subgaussian_l1_ball_tail, obtained by combining:

  1. Sub-Gaussian composition (Proposition 1.4)
  2. Theorem 1.16 expectation form The book derives the E[MSE] bound directly from Theorem 1.16.
theorem Rigollet.Chapter2.thm_2_4_l1_ball_tail {n d : } (hd : 2 d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (R : ) (hR : 0 < R) (hθstar_l1 : j : Fin d, |θstar j| R) (hcol : ∀ (j : Fin d), i : Fin n, X i j ^ 2 n) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (θhat : ΩFin d) (hθhat_l1 : ∀ (ω : Ω), j : Fin d, |θhat ω j| R) (hLS : ∀ (ω : Ω) (θ : Fin d), j : Fin d, |θ j| R → (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (t : ) (ht : 0 < t) :
μ {ω : Ω | X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / n > t} ENNReal.ofReal (2 * d * Real.exp (-(n * t ^ 2 / (32 * σ ^ 2 * R ^ 2))))

Theorem 2.4 — Concrete MSE tail bound for ℓ₁-constrained LS.

Under the linear model Y = Xθ* + ε with:

  • X : Matrix (Fin n) (Fin d) ℝ the design matrix, d ≥ 2
  • θ* ∈ RB₁ (i.e., ‖θ*‖₁ ≤ R)
  • Column normalization: ∀ j, ∑ᵢ (X i j)² ≤ n
  • ε ~ subG_n(σ²), each coordinate sub-Gaussian
  • θ̂ minimizes ‖Y - Xθ‖² over RB₁

Then for any t > 0: P[‖X(θ̂ - θ*)‖₂² / n > t] ≤ 2d · exp(-n · t² / (32 · σ² · R²))

The proof combines:

The sup is divided by n for the MSE normalization, and we substitute t → nt/4 to obtain the final form.

theorem Rigollet.Chapter2.thm_2_4_l1_ball_expectation {n d : } (hn : 0 < n) (hd : 2 d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (R : ) (hR : 0 < R) (hθstar_l1 : j : Fin d, |θstar j| R) (hcol : ∀ (j : Fin d), i : Fin n, X i j ^ 2 n) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (θhat : ΩFin d) (hθhat_l1 : ∀ (ω : Ω), j : Fin d, |θhat ω j| R) (hLS : ∀ (ω : Ω) (θ : Fin d), j : Fin d, |θ j| R → (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hmeas : Measurable fun (ω : Ω) => X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / n) (hmeas_sup : Measurable fun (ω : Ω) => sSup {v : | ∃ (θ : Fin d), j : Fin d, |θ j| R v = ε ω ⬝ᵥ X.mulVec θ}) (hint_mse : MeasureTheory.Integrable (fun (ω : Ω) => X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar)) μ) (hint_sup : MeasureTheory.Integrable (fun (ω : Ω) => sSup {v : | ∃ (θ : Fin d), j : Fin d, |θ j| R v = ε ω ⬝ᵥ X.mulVec θ}) μ) :
(ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / n μ 4 * σ * R * (2 * Real.log (2 * d) / n)

Theorem 2.4 — E[MSE] bound for ℓ₁-constrained LS.

Under the same assumptions as thm_2_4_l1_ball_tail, the expected MSE satisfies:

E[‖X(θ̂ - θ*)‖₂² / n] ≤ C · σ · R · √(log(2d) / n)

for C = 4√2. The book states this as: E[MSE(X θ̂_{B₁}^LS)] ≲ σ √(log d / n) (for the unit ball R = 1; with general R the bound scales by R).

The proof in the book uses Theorem 1.16 (expectation form): the supremum of the sub-Gaussian process εᵀv over the polytope XK satisfies E[sup_{v ∈ XK} εᵀv] ≤ √(nσ²R²) · √(2 log(2d)) and the deterministic bound |X(θ̂-θ*)|² ≤ 4 · sup_{v ∈ XK} εᵀv gives E[MSE] ≤ 4/(n) · √(nσ²R²) · √(2 log(2d)) = 4σR√(2 log(2d)/n).

The proof combines the deterministic reduction thm_2_4_deterministic_bound with the sub-Gaussian process bound subgaussian_l1_ball_expectation. The sub-Gaussian composition step (εᵢ subG(σ²) → εᵀ(RX_j) subG(nσ²R²)) is encapsulated in subgaussian_l1_ball_expectation (proved using Proposition 1.4 via subgaussian_dotProduct_column and Theorem 1.6).

We additionally need:

  • Integration of the pointwise deterministic bound E[|Xδ|²] ≤ 4 E[sup εᵀXθ] This requires measurability and monotone convergence.
  • Algebraic simplification: 4/n · σR√n · √(2 log(2d)) = 4σR√(2 log(2d)/n)

These are encapsulated together: the deterministic bound gives |X(θ̂-θ*)|²/n ≤ (4/n) · sup_{K} εᵀXθ pointwise, so taking expectations E[MSE] ≤ (4/n) · E[sup] ≤ (4/n) · σR√n · √(2 log(2d)) = 4σR√(2 log(2d)/n).

theorem Rigollet.Chapter2.thm_2_4_l1_ball_mse {n d : } (hn : 0 < n) (hd : 2 d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (R : ) (hR : 0 < R) (hθstar_l1 : j : Fin d, |θstar j| R) (hcol : ∀ (j : Fin d), i : Fin n, X i j ^ 2 n) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (θhat : ΩFin d) (hθhat_l1 : ∀ (ω : Ω), j : Fin d, |θhat ω j| R) (hLS : ∀ (ω : Ω) (θ : Fin d), j : Fin d, |θ j| R → (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hmeas : Measurable fun (ω : Ω) => X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / n) (hmeas_sup : Measurable fun (ω : Ω) => sSup {v : | ∃ (θ : Fin d), j : Fin d, |θ j| R v = ε ω ⬝ᵥ X.mulVec θ}) (hint_mse : MeasureTheory.Integrable (fun (ω : Ω) => X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar)) μ) (hint_sup : MeasureTheory.Integrable (fun (ω : Ω) => sSup {v : | ∃ (θ : Fin d), j : Fin d, |θ j| R v = ε ω ⬝ᵥ X.mulVec θ}) μ) :
(ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / n μ 4 * σ * R * (2 * Real.log (2 * d) / n) ∀ (t : ), 0 < tμ {ω : Ω | X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / n > t} ENNReal.ofReal (2 * d * Real.exp (-(n * t ^ 2 / (32 * σ ^ 2 * R ^ 2))))

Theorem 2.4 — Full statement for ℓ₁-constrained LS (Rigollet, Theorem 2.4).

Combines the tail bound and expectation bound into a single theorem. Under the linear model Y = Xθ* + ε with:

  • X : Matrix (Fin n) (Fin d) ℝ, d ≥ 2
  • θ* ∈ RB₁ (i.e., ‖θ*‖₁ ≤ R)
  • Column normalization: ∀ j, ∑ᵢ (X i j)² ≤ n
  • ε ~ subG_n(σ²)
  • θ̂ = argmin{‖Y - Xθ‖² : ‖θ‖₁ ≤ R}

Conclusions:

  1. E[MSE] ≤ 4σR√(2 log(2d)/n)
  2. P[MSE > t] ≤ 2d · exp(-nt²/(32σ²R²)) for all t > 0