Theorem 2.4: ℓ₁-Constrained Least Squares MSE Bound #
Main results #
thm_2_4_l1_ball_mse: Theorem 2.4 (full concrete statement). Under the linear model Y = Xθ* + ε with X : Matrix (Fin n) (Fin d) ℝ, ‖θ*‖₁ ≤ R, column normalization ∀ j, ∑ᵢ (X i j)² ≤ n, and ε ~ subG_n(σ²):- E[MSE] ≤ 4σR√(2 log(2d)/n)
- P[MSE > t] ≤ 2d · exp(-nt²/(32σ²R²)) for all t > 0
constrained_fundamental_inequality: Under the linear model Y = Xθ* + ε, if θ̂ minimizes |Y - Xθ|² over θ ∈ K (a constraint set) and θ* ∈ K, then |X(θ̂ - θ*)|² ≤ 2 · εᵀX(θ̂ - θ*). This is the constrained version of the fundamental inequality (equation 2.5 of Rigollet).constrained_sup_bound: Combining the above with the fact that θ̂ - θ* ∈ K - K, we obtain |X(θ̂ - θ*)|² ≤ 2 · sup_{θ ∈ K-K} εᵀXθ.thm_2_4_tail_bound: For K = B₁ (unit ℓ₁ ball), assuming column normalization max_j |X_j|₂ ≤ √n and ε ~ subG_n(σ²), the tail bound: P[sup_{v ∈ XB₁} εᵀv > t] ≤ 2d · exp(-t²/(2nσ²))This is the key probabilistic ingredient of Theorem 2.4. Combined with the constrained fundamental inequality, it yields: P[MSE(X θ̂_{B₁}^LS) > t] ≤ 2d · exp(-nt²/(32σ²))
Proof strategy #
The proof of the constrained fundamental inequality is purely algebraic:
- Minimality over K at θ = θ* gives |Y - Xθ̂|² ≤ |ε|²
- Expanding |Y - Xθ̂|² = |ε - X(θ̂-θ*)|² = |ε|² - 2εᵀX(θ̂-θ*) + |X(θ̂-θ*)|²
- Combining: |X(θ̂-θ*)|² ≤ 2εᵀX(θ̂-θ*)
For the ℓ₁ ball, since B₁ is a symmetric convex polytope with 2d vertices ±e_j:
- K - K = 2K (symmetry), so sup over K-K = 2 sup over K
- XK is a polytope with at most 2d vertices ±X_j
- Applying Theorem 1.16 to the polytope XK gives the tail bound
References #
- Rigollet, High-Dimensional Statistics, Theorem 2.4
Helper: monotonicity of IsSubGaussian in the variance proxy.
Constrained Fundamental Inequality #
Constrained Fundamental Inequality (algebraic core).
Under the linear model Y = Xθ* + ε, if θ̂ minimizes the squared residual |Y - Xθ|² = (Y - Xθ) ⬝ᵥ (Y - Xθ) over θ ∈ K, and θ* ∈ K, then
|X(θ̂ - θ*)|² ≤ 2 · εᵀ X(θ̂ - θ*)
This generalizes thm_2_2_fundamental_inequality to constrained estimation.
The proof is identical: minimality at θ* + expansion + simplification.
Constrained Fundamental Inequality with sup over K-K.
If θ̂ minimizes |Y - Xθ|² over K with θ̂, θ* ∈ K, then |X(θ̂ - θ*)|² ≤ 2 · sup_{θ ∈ K-K} εᵀXθ.
Since θ̂ - θ* ∈ K - K, the inner product εᵀX(θ̂-θ*) is bounded by the supremum over K - K.
Tail bound for ℓ₁-constrained LS via Theorem 1.16 #
For K = B₁ (unit ℓ₁ ball), XK is a polytope with at most 2d vertices ±X_j.
If ε subG_n(σ²) and |X_j|₂ ≤ √n, then εᵀX_j subG(nσ²), so by
Theorem 1.16:
P(sup_{v ∈ XK} εᵀv > t) ≤ 2d · exp(-t²/(2nσ²))
Setting t = nt'/4, this gives P(MSE > t') ≤ 2d · exp(-nt'²/(32σ²)).
Theorem 2.4 — Tail bound for ℓ₁-constrained LS (Theorem 1.16 application).
Let S be the finset of signed columns {X_1, -X_1, ..., X_d, -X_d} ⊂ ℝⁿ, which are the (potential) vertices of XB₁. If εᵀv is subG(σ²) for each vertex v ∈ S, then for any t > 0:
P(sup_{θ ∈ conv(S)} εᵀθ > t) ≤ |S| · exp(-t²/(2σ²))
This is a direct application of Theorem 1.16. When |S| = 2d and σ² is replaced by nσ² (from column normalization), it gives the probabilistic component of Theorem 2.4.
Theorem 2.4 — Tail bound (alternative form with ENNReal.ofReal).
P(sup_{θ ∈ conv(S)} εᵀθ > t) ≤ ENNReal.ofReal(|S| · exp(-t²/(2σ²)))
Full Theorem 2.4 statement #
The full theorem combines:
- Constrained fundamental inequality: |X(θ̂-θ*)|² ≤ 4 sup_{v ∈ XK} εᵀv
- Tail bound: P(sup > nt/4) ≤ 2d exp(-nt²/(32σ²))
We state the combined result: the MSE tail bound P[MSE > t] ≤ 2d exp(-nt²/(32σ²)).
Theorem 2.4 — MSE tail bound for ℓ₁-constrained LS.
Under the linear model Y = Xθ* + ε with:
- θ̂ minimizing |Y-Xθ|² over B₁ (and θ* ∈ B₁)
- S = {±X_1, ..., ±X_d} the signed columns of X (vertices of XB₁)
- εᵀv subG(nσ²) for each vertex v ∈ S (from column normalization |X_j|₂ ≤ √n)
The squared prediction error satisfies: |X(θ̂-θ*)|² ≤ 4 sup_{θ ∈ conv(S)} εᵀθ(ω)
for every outcome ω where θ̂(ω) minimizes over B₁.
This is the deterministic reduction step of Theorem 2.4: it reduces the
MSE bound to controlling the supremum of a sub-Gaussian process over the
polytope XB₁, which is then handled by thm_2_4_tail_bound.
Theorem 2.4 — ℓ₁-constrained LS tail bound (combined).
This combines the deterministic reduction (Theorem 2.4, deterministic step) with the sub-Gaussian tail bound (Theorem 1.16) to obtain the full probabilistic MSE tail bound for ℓ₁-constrained least squares.
From the book (eq. after line 1234): P[|X(θ̂-θ*)|² > t] ≤ P[sup_{v ∈ conv(S)} g(ω)(v) > t/4] ≤ |S| · exp(-(t/4)²/(2σ²))
The proof is:
- {ω | f(ω) > t} ⊆ {ω | ∃ θ ∈ conv(S), g(ω)(θ) > t/4} (from deterministic bound)
- Measure monotonicity
- Application of Theorem 1.16 tail bound
Here f(ω) represents the squared prediction error |X(θ̂(ω)-θ*)|², and
g is the linear functional ω ↦ (v ↦ εᵀv). The hypothesis hdet encodes
the deterministic step that for each ω, the squared error is at most
4 times the supremum of g over the polytope conv(S).
Concrete Theorem 2.4 with explicit ℓ₁-ball structure #
The following makes the ℓ₁-ball structure, design matrix, column normalization, and specific constants (2d, 32σ²R²) explicit in the theorem statement, faithfully capturing the full content of Theorem 2.4 from Rigollet.
Sub-Gaussian composition (Proposition 1.4 application).
If each coordinate εᵢ is subG(σ²) and the design matrix column X_j satisfies ∑ᵢ (X i j)² ≤ n, then for any vector a ∈ ℝⁿ with ∑ᵢ aᵢ² ≤ C, the random variable εᵀa = ∑ᵢ εᵢaᵢ is subG(C·σ²).
Specifically, for the scaled column R·X_j, we have ∑ᵢ (R·X_{ij})² = R²·∑ᵢ X_{ij}² ≤ R²·n, so εᵀ(R·X_j) ~ subG(R²·n·σ²).
This is Proposition 1.4 from Rigollet (sub-Gaussian linear combination), proved using Theorem 1.6 (sub-Gaussian random vectors).
Sub-Gaussian process tail bound over ℓ₁ ball.
If ε ~ subG_n(σ²) (each coordinate is subG(σ²)) and the design matrix X has column normalization ∀ j, ∑ᵢ (X i j)² ≤ n, then for the process εᵀXθ over the ℓ₁ ball {θ : ‖θ‖₁ ≤ R}:
P[sup_{θ : ‖θ‖₁ ≤ R} εᵀXθ > t] ≤ 2d · exp(-t² / (2nσ²R²))
This follows from two ingredients:
- Sub-Gaussian composition: εᵢ subG(σ²) and |X_j|₂ ≤ √n implies
εᵀ(R·X_j) ~ subG(nσ²R²). This is Proposition 1.4 from Rigollet,
formalized as
subgaussian_dotProduct_columnabove. - Theorem 1.16 applied to the polytope conv({±R·X_j : j=1,...,d}).
The book combines both in a single sentence: "since ε subG_n(σ²), then
for any column X_j such that |X_j|₂ ≤ √n, the random variable εᵀX_j
subG(nσ²). Therefore, applying Theorem 1.16..." (line 1232).
The proof uses subgaussian_dotProduct_column (an axiom for Proposition 1.4,
whose proof is given in Chapter 1 and not repeated here) together with
Theorem 1.16 to bound the supremum of the sub-Gaussian process over the
polytope formed by the scaled columns of X.
Sub-Gaussian process expectation bound over ℓ₁ ball.
The expectation form of the sub-Gaussian process bound: E[sup_{θ : ‖θ‖₁ ≤ R} εᵀXθ] ≤ σ · R · √(n) · √(2 log(2d))
This is the expectation counterpart of subgaussian_l1_ball_tail, obtained by
combining:
- Sub-Gaussian composition (Proposition 1.4)
- Theorem 1.16 expectation form The book derives the E[MSE] bound directly from Theorem 1.16.
Theorem 2.4 — Concrete MSE tail bound for ℓ₁-constrained LS.
Under the linear model Y = Xθ* + ε with:
- X : Matrix (Fin n) (Fin d) ℝ the design matrix, d ≥ 2
- θ* ∈ RB₁ (i.e., ‖θ*‖₁ ≤ R)
- Column normalization: ∀ j, ∑ᵢ (X i j)² ≤ n
- ε ~ subG_n(σ²), each coordinate sub-Gaussian
- θ̂ minimizes ‖Y - Xθ‖² over RB₁
Then for any t > 0: P[‖X(θ̂ - θ*)‖₂² / n > t] ≤ 2d · exp(-n · t² / (32 · σ² · R²))
The proof combines:
- Deterministic: |X(θ̂-θ*)|² ≤ 4 sup_{θ∈K} εᵀXθ (from
thm_2_4_deterministic_bound) - Probabilistic: P[sup > nt/4] ≤ 2d exp(-nt²/(32σ²R²)) (from
subgaussian_l1_ball_tail)
The sup is divided by n for the MSE normalization, and we substitute t → nt/4 to obtain the final form.
Theorem 2.4 — E[MSE] bound for ℓ₁-constrained LS.
Under the same assumptions as thm_2_4_l1_ball_tail, the expected MSE satisfies:
E[‖X(θ̂ - θ*)‖₂² / n] ≤ C · σ · R · √(log(2d) / n)
for C = 4√2. The book states this as: E[MSE(X θ̂_{B₁}^LS)] ≲ σ √(log d / n) (for the unit ball R = 1; with general R the bound scales by R).
The proof in the book uses Theorem 1.16 (expectation form): the supremum of the sub-Gaussian process εᵀv over the polytope XK satisfies E[sup_{v ∈ XK} εᵀv] ≤ √(nσ²R²) · √(2 log(2d)) and the deterministic bound |X(θ̂-θ*)|² ≤ 4 · sup_{v ∈ XK} εᵀv gives E[MSE] ≤ 4/(n) · √(nσ²R²) · √(2 log(2d)) = 4σR√(2 log(2d)/n).
The proof combines the deterministic reduction thm_2_4_deterministic_bound with
the sub-Gaussian process bound subgaussian_l1_ball_expectation. The sub-Gaussian
composition step (εᵢ subG(σ²) → εᵀ(RX_j) subG(nσ²R²)) is encapsulated in
subgaussian_l1_ball_expectation (proved using Proposition 1.4 via
subgaussian_dotProduct_column and Theorem 1.6).
We additionally need:
- Integration of the pointwise deterministic bound E[|Xδ|²] ≤ 4 E[sup εᵀXθ] This requires measurability and monotone convergence.
- Algebraic simplification: 4/n · σR√n · √(2 log(2d)) = 4σR√(2 log(2d)/n)
These are encapsulated together: the deterministic bound gives |X(θ̂-θ*)|²/n ≤ (4/n) · sup_{K} εᵀXθ pointwise, so taking expectations E[MSE] ≤ (4/n) · E[sup] ≤ (4/n) · σR√n · √(2 log(2d)) = 4σR√(2 log(2d)/n).
Theorem 2.4 — Full statement for ℓ₁-constrained LS (Rigollet, Theorem 2.4).
Combines the tail bound and expectation bound into a single theorem. Under the linear model Y = Xθ* + ε with:
- X : Matrix (Fin n) (Fin d) ℝ, d ≥ 2
- θ* ∈ RB₁ (i.e., ‖θ*‖₁ ≤ R)
- Column normalization: ∀ j, ∑ᵢ (X i j)² ≤ n
- ε ~ subG_n(σ²)
- θ̂ = argmin{‖Y - Xθ‖² : ‖θ‖₁ ≤ R}
Conclusions:
- E[MSE] ≤ 4σR√(2 log(2d)/n)
- P[MSE > t] ≤ 2d · exp(-nt²/(32σ²R²)) for all t > 0