Theorem 2.4: ℓ₁-Constrained Least Squares MSE Bound #

Main results #

thm_2_4_l1_ball_mse: Theorem 2.4 (full concrete statement). Under the linear model Y = Xθ* + ε with X : Matrix (Fin n) (Fin d) ℝ, ‖θ*‖₁ ≤ R, column normalization ∀ j, ∑ᵢ (X i j)² ≤ n, and ε ~ subG_n(σ²):
1. E[MSE] ≤ 4σR√(2 log(2d)/n)
2. P[MSE > t] ≤ 2d · exp(-nt²/(32σ²R²)) for all t > 0
constrained_fundamental_inequality: Under the linear model Y = Xθ* + ε, if θ̂ minimizes |Y - Xθ|² over θ ∈ K (a constraint set) and θ* ∈ K, then |X(θ̂ - θ*)|² ≤ 2 · εᵀX(θ̂ - θ*). This is the constrained version of the fundamental inequality (equation 2.5 of Rigollet).
constrained_sup_bound: Combining the above with the fact that θ̂ - θ* ∈ K - K, we obtain |X(θ̂ - θ*)|² ≤ 2 · sup_{θ ∈ K-K} εᵀXθ.
thm_2_4_tail_bound: For K = B₁ (unit ℓ₁ ball), assuming column normalization max_j |X_j|₂ ≤ √n and ε ~ subG_n(σ²), the tail bound: P[sup_{v ∈ XB₁} εᵀv > t] ≤ 2d · exp(-t²/(2nσ²))
This is the key probabilistic ingredient of Theorem 2.4. Combined with the constrained fundamental inequality, it yields: P[MSE(X θ̂_{B₁}^LS) > t] ≤ 2d · exp(-nt²/(32σ²))

Proof strategy #

The proof of the constrained fundamental inequality is purely algebraic:

Minimality over K at θ = θ* gives |Y - Xθ̂|² ≤ |ε|²
Expanding |Y - Xθ̂|² = |ε - X(θ̂-θ*)|² = |ε|² - 2εᵀX(θ̂-θ*) + |X(θ̂-θ*)|²
Combining: |X(θ̂-θ*)|² ≤ 2εᵀX(θ̂-θ*)

For the ℓ₁ ball, since B₁ is a symmetric convex polytope with 2d vertices ±e_j:

K - K = 2K (symmetry), so sup over K-K = 2 sup over K
XK is a polytope with at most 2d vertices ±X_j
Applying Theorem 1.16 to the polytope XK gives the tail bound

References #

Rigollet, High-Dimensional Statistics, Theorem 2.4

source

theorem IsSubGaussian.mono_variance {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {f : Ω → ℝ} {σsq₁ σsq₂ : ℝ} (h : IsSubGaussian f σsq₁ μ) (hle : σsq₁ ≤ σsq₂) :

IsSubGaussian f σsq₂ μ

Helper: monotonicity of IsSubGaussian in the variance proxy.

Constrained Fundamental Inequality #

source

theorem Rigollet.Chapter2.constrained_fundamental_inequality {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Fin n → ℝ) (θhat : Fin d → ℝ) (K : Set (Fin d → ℝ)) (hθstar_in_K : θstar ∈ K) (hLS : ∀ θ ∈ K, (X.mulVec θstar + ε - X.mulVec θhat) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θhat) ≤ (X.mulVec θstar + ε - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θ)) :

X.mulVec (θhat - θstar) ⬝ᵥ X.mulVec (θhat - θstar) ≤ 2 * ε ⬝ᵥ X.mulVec (θhat - θstar)

Constrained Fundamental Inequality (algebraic core).

Under the linear model Y = Xθ* + ε, if θ̂ minimizes the squared residual |Y - Xθ|² = (Y - Xθ) ⬝ᵥ (Y - Xθ) over θ ∈ K, and θ* ∈ K, then

|X(θ̂ - θ*)|² ≤ 2 · εᵀ X(θ̂ - θ*)

This generalizes thm_2_2_fundamental_inequality to constrained estimation. The proof is identical: minimality at θ* + expansion + simplification.

source

theorem Rigollet.Chapter2.constrained_sup_bound {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Fin n → ℝ) (θhat : Fin d → ℝ) (K : Set (Fin d → ℝ)) (hθstar_in_K : θstar ∈ K) (hθhat_in_K : θhat ∈ K) (hLS : ∀ θ ∈ K, (X.mulVec θstar + ε - X.mulVec θhat) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θhat) ≤ (X.mulVec θstar + ε - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θ)) (hbdd : BddAbove ((fun (θ : Fin d → ℝ) => ε ⬝ᵥ X.mulVec θ) '' (K - K))) :

X.mulVec (θhat - θstar) ⬝ᵥ X.mulVec (θhat - θstar) ≤ 2 * sSup ((fun (θ : Fin d → ℝ) => ε ⬝ᵥ X.mulVec θ) '' (K - K))

Constrained Fundamental Inequality with sup over K-K.

If θ̂ minimizes |Y - Xθ|² over K with θ̂, θ* ∈ K, then |X(θ̂ - θ*)|² ≤ 2 · sup_{θ ∈ K-K} εᵀXθ.

Since θ̂ - θ* ∈ K - K, the inner product εᵀX(θ̂-θ*) is bounded by the supremum over K - K.

Tail bound for ℓ₁-constrained LS via Theorem 1.16 #

For K = B₁ (unit ℓ₁ ball), XK is a polytope with at most 2d vertices ±X_j. If ε ~~subG_n(σ²) and |X_j|₂ ≤ √n, then εᵀX_j~~ subG(nσ²), so by Theorem 1.16:

P(sup_{v ∈ XK} εᵀv > t) ≤ 2d · exp(-t²/(2nσ²))

Setting t = nt'/4, this gives P(MSE > t') ≤ 2d · exp(-nt'²/(32σ²)).

source

theorem Rigollet.Chapter2.thm_2_4_tail_bound {n : ℕ} {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (g : Ω → (Fin n → ℝ) →ₗ[ℝ ] ℝ) (S : Finset (Fin n → ℝ)) {σsq : ℝ} (hsg : ∀ v ∈ S, IsSubGaussian (fun (ω : Ω) => (g ω) v) σsq μ) (t : ℝ) (ht : 0 < t) :

μ {ω : Ω | ∃ θ ∈ (convexHull ℝ) ↑S, (g ω) θ > t} ≤ S.card • ENNReal.ofReal (Real.exp (-(t ^ 2 / (2 * σsq))))

Theorem 2.4 — Tail bound for ℓ₁-constrained LS (Theorem 1.16 application).

Let S be the finset of signed columns {X_1, -X_1, ..., X_d, -X_d} ⊂ ℝⁿ, which are the (potential) vertices of XB₁. If εᵀv is subG(σ²) for each vertex v ∈ S, then for any t > 0:

P(sup_{θ ∈ conv(S)} εᵀθ > t) ≤ |S| · exp(-t²/(2σ²))

This is a direct application of Theorem 1.16. When |S| = 2d and σ² is replaced by nσ² (from column normalization), it gives the probabilistic component of Theorem 2.4.

source

theorem Rigollet.Chapter2.thm_2_4_tail_bound' {n : ℕ} {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (g : Ω → (Fin n → ℝ) →ₗ[ℝ ] ℝ) (S : Finset (Fin n → ℝ)) {σsq : ℝ} (hsg : ∀ v ∈ S, IsSubGaussian (fun (ω : Ω) => (g ω) v) σsq μ) (t : ℝ) (ht : 0 < t) :

μ {ω : Ω | ∃ θ ∈ (convexHull ℝ) ↑S, (g ω) θ > t} ≤ ENNReal.ofReal (↑S.card * Real.exp (-(t ^ 2 / (2 * σsq))))

Theorem 2.4 — Tail bound (alternative form with ENNReal.ofReal).

P(sup_{θ ∈ conv(S)} εᵀθ > t) ≤ ENNReal.ofReal(|S| · exp(-t²/(2σ²)))

Full Theorem 2.4 statement #

The full theorem combines:

Constrained fundamental inequality: |X(θ̂-θ*)|² ≤ 4 sup_{v ∈ XK} εᵀv
Tail bound: P(sup > nt/4) ≤ 2d exp(-nt²/(32σ²))

We state the combined result: the MSE tail bound P[MSE > t] ≤ 2d exp(-nt²/(32σ²)).

source

theorem Rigollet.Chapter2.thm_2_4_deterministic_bound {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Fin n → ℝ) (θhat : Fin d → ℝ) (K : Set (Fin d → ℝ)) (hθstar_in_K : θstar ∈ K) (hθhat_in_K : θhat ∈ K) (hK_symm : ∀ (θ : Fin d → ℝ), θ ∈ K ↔ -θ ∈ K) (hK_convex : Convex ℝ K) (hLS : ∀ θ ∈ K, (X.mulVec θstar + ε - X.mulVec θhat) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θhat) ≤ (X.mulVec θstar + ε - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θ)) (hbdd : BddAbove ((fun (θ : Fin d → ℝ) => ε ⬝ᵥ X.mulVec θ) '' K)) :

X.mulVec (θhat - θstar) ⬝ᵥ X.mulVec (θhat - θstar) ≤ 4 * sSup ((fun (θ : Fin d → ℝ) => ε ⬝ᵥ X.mulVec θ) '' K)

Theorem 2.4 — MSE tail bound for ℓ₁-constrained LS.

Under the linear model Y = Xθ* + ε with:

θ̂ minimizing |Y-Xθ|² over B₁ (and θ* ∈ B₁)
S = {±X_1, ..., ±X_d} the signed columns of X (vertices of XB₁)
εᵀv subG(nσ²) for each vertex v ∈ S (from column normalization |X_j|₂ ≤ √n)

The squared prediction error satisfies: |X(θ̂-θ*)|² ≤ 4 sup_{θ ∈ conv(S)} εᵀθ(ω)

for every outcome ω where θ̂(ω) minimizes over B₁.

This is the deterministic reduction step of Theorem 2.4: it reduces the MSE bound to controlling the supremum of a sub-Gaussian process over the polytope XB₁, which is then handled by thm_2_4_tail_bound.

source

theorem Rigollet.Chapter2.thm_2_4_l1_constrained_ls_tail {n : ℕ} {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (f : Ω → ℝ) (g : Ω → (Fin n → ℝ) →ₗ[ℝ ] ℝ) (S : Finset (Fin n → ℝ)) {σsq : ℝ} (hsg : ∀ v ∈ S, IsSubGaussian (fun (ω : Ω) => (g ω) v) σsq μ) (hdet : ∀ (ω : Ω), ∃ θ ∈ (convexHull ℝ) ↑S, f ω ≤ 4 * (g ω) θ) (t : ℝ) (ht : 0 < t) :

μ {ω : Ω | f ω > t} ≤ S.card • ENNReal.ofReal (Real.exp (-((t / 4) ^ 2 / (2 * σsq))))

Theorem 2.4 — ℓ₁-constrained LS tail bound (combined).

This combines the deterministic reduction (Theorem 2.4, deterministic step) with the sub-Gaussian tail bound (Theorem 1.16) to obtain the full probabilistic MSE tail bound for ℓ₁-constrained least squares.

From the book (eq. after line 1234): P[|X(θ̂-θ*)|² > t] ≤ P[sup_{v ∈ conv(S)} g(ω)(v) > t/4] ≤ |S| · exp(-(t/4)²/(2σ²))

The proof is:

{ω | f(ω) > t} ⊆ {ω | ∃ θ ∈ conv(S), g(ω)(θ) > t/4} (from deterministic bound)
Measure monotonicity
Application of Theorem 1.16 tail bound

Here f(ω) represents the squared prediction error |X(θ̂(ω)-θ*)|², and g is the linear functional ω ↦ (v ↦ εᵀv). The hypothesis hdet encodes the deterministic step that for each ω, the squared error is at most 4 times the supremum of g over the polytope conv(S).

Concrete Theorem 2.4 with explicit ℓ₁-ball structure #

The following makes the ℓ₁-ball structure, design matrix, column normalization, and specific constants (2d, 32σ²R²) explicit in the theorem statement, faithfully capturing the full content of Theorem 2.4 from Rigollet.

source

theorem Rigollet.Chapter2.subgaussian_dotProduct_column {n : ℕ} {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (ε : Ω → Fin n → ℝ) (σsq : ℝ) (hσsq : 0 ≤ σsq) (hmeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) σsq μ) (a : Fin n → ℝ) (C : ℝ) (hC : ∑ i : Fin n, a i ^ 2 ≤ C) :

IsSubGaussian (fun (ω : Ω) => ε ω ⬝ᵥ a) (C * σsq) μ

Sub-Gaussian composition (Proposition 1.4 application).

If each coordinate εᵢ is subG(σ²) and the design matrix column X_j satisfies ∑ᵢ (X i j)² ≤ n, then for any vector a ∈ ℝⁿ with ∑ᵢ aᵢ² ≤ C, the random variable εᵀa = ∑ᵢ εᵢaᵢ is subG(C·σ²).

Specifically, for the scaled column R·X_j, we have ∑ᵢ (R·X_{ij})² = R²·∑ᵢ X_{ij}² ≤ R²·n, so εᵀ(R·X_j) ~ subG(R²·n·σ²).

This is Proposition 1.4 from Rigollet (sub-Gaussian linear combination), proved using Theorem 1.6 (sub-Gaussian random vectors).

source

theorem Rigollet.Chapter2.subgaussian_l1_ball_tail {n d : ℕ} (hd : 2 ≤ d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (R : ℝ) (hR : 0 < R) (hcol : ∀ (j : Fin d), ∑ i : Fin n, X i j ^ 2 ≤ ↑n) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hε : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (t : ℝ) (ht : 0 < t) :

μ {ω : Ω | ∃ (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R ∧ ε ω ⬝ᵥ X.mulVec θ > t} ≤ ENNReal.ofReal (2 * ↑d * Real.exp (-(t ^ 2 / (2 * ↑n * σ ^ 2 * R ^ 2))))

Sub-Gaussian process tail bound over ℓ₁ ball.

If ε ~ subG_n(σ²) (each coordinate is subG(σ²)) and the design matrix X has column normalization ∀ j, ∑ᵢ (X i j)² ≤ n, then for the process εᵀXθ over the ℓ₁ ball {θ : ‖θ‖₁ ≤ R}:

P[sup_{θ : ‖θ‖₁ ≤ R} εᵀXθ > t] ≤ 2d · exp(-t² / (2nσ²R²))

This follows from two ingredients:

Sub-Gaussian composition: εᵢ subG(σ²) and |X_j|₂ ≤ √n implies εᵀ(R·X_j) ~ subG(nσ²R²). This is Proposition 1.4 from Rigollet, formalized as subgaussian_dotProduct_column above.
Theorem 1.16 applied to the polytope conv({±R·X_j : j=1,...,d}).

The book combines both in a single sentence: "since ε ~~subG_n(σ²), then for any column X_j such that |X_j|₂ ≤ √n, the random variable εᵀX_j~~ subG(nσ²). Therefore, applying Theorem 1.16..." (line 1232).

The proof uses subgaussian_dotProduct_column (an axiom for Proposition 1.4, whose proof is given in Chapter 1 and not repeated here) together with Theorem 1.16 to bound the supremum of the sub-Gaussian process over the polytope formed by the scaled columns of X.

source

theorem Rigollet.Chapter2.subgaussian_l1_ball_expectation {n d : ℕ} (hn : 0 < n) (hd : 2 ≤ d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (R : ℝ) (hR : 0 < R) (hcol : ∀ (j : Fin d), ∑ i : Fin n, X i j ^ 2 ≤ ↑n) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hε : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hmeas_sup : Measurable fun (ω : Ω) => sSup {v : ℝ | ∃ (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R ∧ v = ε ω ⬝ᵥ X.mulVec θ}) :

∫ (ω : Ω), sSup {v : ℝ | ∃ (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R ∧ v = ε ω ⬝ᵥ X.mulVec θ} ∂μ ≤ σ * R * √↑n * √(2 * Real.log (2 * ↑d))

Sub-Gaussian process expectation bound over ℓ₁ ball.

The expectation form of the sub-Gaussian process bound: E[sup_{θ : ‖θ‖₁ ≤ R} εᵀXθ] ≤ σ · R · √(n) · √(2 log(2d))

This is the expectation counterpart of subgaussian_l1_ball_tail, obtained by combining:

Sub-Gaussian composition (Proposition 1.4)
Theorem 1.16 expectation form The book derives the E[MSE] bound directly from Theorem 1.16.

source

theorem Rigollet.Chapter2.thm_2_4_l1_ball_tail {n d : ℕ} (hd : 2 ≤ d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (R : ℝ) (hR : 0 < R) (hθstar_l1 : ∑ j : Fin d, |θstar j| ≤ R) (hcol : ∀ (j : Fin d), ∑ i : Fin n, X i j ^ 2 ≤ ↑n) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hε : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (θhat : Ω → Fin d → ℝ) (hθhat_l1 : ∀ (ω : Ω), ∑ j : Fin d, |θhat ω j| ≤ R) (hLS : ∀ (ω : Ω) (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R → (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ≤ (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (t : ℝ) (ht : 0 < t) :

μ {ω : Ω | X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / ↑n > t} ≤ ENNReal.ofReal (2 * ↑d * Real.exp (-(↑n * t ^ 2 / (32 * σ ^ 2 * R ^ 2))))

Theorem 2.4 — Concrete MSE tail bound for ℓ₁-constrained LS.

Under the linear model Y = Xθ* + ε with:

X : Matrix (Fin n) (Fin d) ℝ the design matrix, d ≥ 2
θ* ∈ RB₁ (i.e., ‖θ*‖₁ ≤ R)
Column normalization: ∀ j, ∑ᵢ (X i j)² ≤ n
ε ~ subG_n(σ²), each coordinate sub-Gaussian
θ̂ minimizes ‖Y - Xθ‖² over RB₁

Then for any t > 0: P[‖X(θ̂ - θ*)‖₂² / n > t] ≤ 2d · exp(-n · t² / (32 · σ² · R²))

The proof combines:

Deterministic: |X(θ̂-θ*)|² ≤ 4 sup_{θ∈K} εᵀXθ (from thm_2_4_deterministic_bound)
Probabilistic: P[sup > nt/4] ≤ 2d exp(-nt²/(32σ²R²)) (from subgaussian_l1_ball_tail)

The sup is divided by n for the MSE normalization, and we substitute t → nt/4 to obtain the final form.

source

theorem Rigollet.Chapter2.thm_2_4_l1_ball_expectation {n d : ℕ} (hn : 0 < n) (hd : 2 ≤ d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (R : ℝ) (hR : 0 < R) (hθstar_l1 : ∑ j : Fin d, |θstar j| ≤ R) (hcol : ∀ (j : Fin d), ∑ i : Fin n, X i j ^ 2 ≤ ↑n) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hε : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (θhat : Ω → Fin d → ℝ) (hθhat_l1 : ∀ (ω : Ω), ∑ j : Fin d, |θhat ω j| ≤ R) (hLS : ∀ (ω : Ω) (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R → (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ≤ (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hmeas : Measurable fun (ω : Ω) => X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / ↑n) (hmeas_sup : Measurable fun (ω : Ω) => sSup {v : ℝ | ∃ (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R ∧ v = ε ω ⬝ᵥ X.mulVec θ}) (hint_mse : MeasureTheory.Integrable (fun (ω : Ω) => X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar)) μ) (hint_sup : MeasureTheory.Integrable (fun (ω : Ω) => sSup {v : ℝ | ∃ (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R ∧ v = ε ω ⬝ᵥ X.mulVec θ}) μ) :

∫ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / ↑n ∂μ ≤ 4 * σ * R * √(2 * Real.log (2 * ↑d) / ↑n)

Theorem 2.4 — E[MSE] bound for ℓ₁-constrained LS.

Under the same assumptions as thm_2_4_l1_ball_tail, the expected MSE satisfies:

E[‖X(θ̂ - θ*)‖₂² / n] ≤ C · σ · R · √(log(2d) / n)

for C = 4√2. The book states this as: E[MSE(X θ̂_{B₁}^LS)] ≲ σ √(log d / n) (for the unit ball R = 1; with general R the bound scales by R).

The proof in the book uses Theorem 1.16 (expectation form): the supremum of the sub-Gaussian process εᵀv over the polytope XK satisfies E[sup_{v ∈ XK} εᵀv] ≤ √(nσ²R²) · √(2 log(2d)) and the deterministic bound |X(θ̂-θ*)|² ≤ 4 · sup_{v ∈ XK} εᵀv gives E[MSE] ≤ 4/(n) · √(nσ²R²) · √(2 log(2d)) = 4σR√(2 log(2d)/n).

The proof combines the deterministic reduction thm_2_4_deterministic_bound with the sub-Gaussian process bound subgaussian_l1_ball_expectation. The sub-Gaussian composition step (εᵢ subG(σ²) → εᵀ(RX_j) subG(nσ²R²)) is encapsulated in subgaussian_l1_ball_expectation (proved using Proposition 1.4 via subgaussian_dotProduct_column and Theorem 1.6).

We additionally need:

Integration of the pointwise deterministic bound E[|Xδ|²] ≤ 4 E[sup εᵀXθ] This requires measurability and monotone convergence.
Algebraic simplification: 4/n · σR√n · √(2 log(2d)) = 4σR√(2 log(2d)/n)

These are encapsulated together: the deterministic bound gives |X(θ̂-θ*)|²/n ≤ (4/n) · sup_{K} εᵀXθ pointwise, so taking expectations E[MSE] ≤ (4/n) · E[sup] ≤ (4/n) · σR√n · √(2 log(2d)) = 4σR√(2 log(2d)/n).

source

theorem Rigollet.Chapter2.thm_2_4_l1_ball_mse {n d : ℕ} (hn : 0 < n) (hd : 2 ≤ d) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (R : ℝ) (hR : 0 < R) (hθstar_l1 : ∑ j : Fin d, |θstar j| ≤ R) (hcol : ∀ (j : Fin d), ∑ i : Fin n, X i j ^ 2 ≤ ↑n) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hε : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hmeas_ε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hindep_ε : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (θhat : Ω → Fin d → ℝ) (hθhat_l1 : ∀ (ω : Ω), ∑ j : Fin d, |θhat ω j| ≤ R) (hLS : ∀ (ω : Ω) (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R → (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ≤ (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hmeas : Measurable fun (ω : Ω) => X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / ↑n) (hmeas_sup : Measurable fun (ω : Ω) => sSup {v : ℝ | ∃ (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R ∧ v = ε ω ⬝ᵥ X.mulVec θ}) (hint_mse : MeasureTheory.Integrable (fun (ω : Ω) => X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar)) μ) (hint_sup : MeasureTheory.Integrable (fun (ω : Ω) => sSup {v : ℝ | ∃ (θ : Fin d → ℝ), ∑ j : Fin d, |θ j| ≤ R ∧ v = ε ω ⬝ᵥ X.mulVec θ}) μ) :

∫ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / ↑n ∂μ ≤ 4 * σ * R * √(2 * Real.log (2 * ↑d) / ↑n) ∧ ∀ (t : ℝ), 0 < t → μ {ω : Ω | X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) / ↑n > t} ≤ ENNReal.ofReal (2 * ↑d * Real.exp (-(↑n * t ^ 2 / (32 * σ ^ 2 * R ^ 2))))

Theorem 2.4 — Full statement for ℓ₁-constrained LS (Rigollet, Theorem 2.4).

Combines the tail bound and expectation bound into a single theorem. Under the linear model Y = Xθ* + ε with:

X : Matrix (Fin n) (Fin d) ℝ, d ≥ 2
θ* ∈ RB₁ (i.e., ‖θ*‖₁ ≤ R)
Column normalization: ∀ j, ∑ᵢ (X i j)² ≤ n
ε ~ subG_n(σ²)
θ̂ = argmin{‖Y - Xθ‖² : ‖θ‖₁ ≤ R}

Conclusions:

E[MSE] ≤ 4σR√(2 log(2d)/n)
P[MSE > t] ≤ 2d · exp(-nt²/(32σ²R²)) for all t > 0

Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Thm_2_4

Theorem 2.4: ℓ₁-Constrained Least Squares MSE Bound #

Main results #

Proof strategy #

References #

Constrained Fundamental Inequality #

Tail bound for ℓ₁-constrained LS via Theorem 1.16 #

Full Theorem 2.4 statement #

Concrete Theorem 2.4 with explicit ℓ₁-ball structure #