Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Cor_2_8

Corollary 2.8: Sparse MSE bound for ℓ₀-constrained LS #

From Rigollet, Chapter 2, Corollary 2.8:

Under the assumptions of Theorem 2.6, for any δ > 0, with probability 1 - δ, it holds

$$\mathsf{MSE}(\mathbb{X}\hat{\theta}_{\mathcal{B}_0(k)}^{\mathrm{LS}}) \lesssim \frac{\sigma^2 k}{n} \log\left(\frac{ed}{2k}\right) + \frac{\sigma^2 k}{n} \log(6) + \frac{\sigma^2}{n} \log(1/\delta) \,.$$

Proof #

This follows immediately from the proof of Theorem 2.6 by applying Lemma 2.7 to bound log C(d,2k) ≤ 2k · log(ed/(2k)). Specifically, the tail bound (2.6)

P(|X(θ̂-θ*)|² > 4t) ≤ C(d,2k) · 6^{2k} · exp(-t/(8σ²))

requires t ≥ Cσ² {log C(d,2k) + k·log(6) + log(1/δ)} for the RHS ≤ δ. By Lemma 2.7, C(d,2k) ≤ (ed/(2k))^{2k}, so log C(d,2k) ≤ 2k·log(ed/(2k)). Plugging in t = 8σ²·L with L = 2k·log(ed/(2k)) + 2k·log(6) + log(1/δ), we get |X(θ̂-θ*)|² ≤ 4t = 32σ²·L, hence dividing by n gives the MSE bound.

theorem cor_2_8_mse_prob {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (θhat : ΩFin d) (k : ) (hk : 1 k) (hkd : 2 * k d) (σ : ) ( : 0 < σ) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hhat_sparse : ∀ (ω : Ω), {j : Fin d | θhat ω j 0}.card k) (hstar_sparse : {j : Fin d | θstar j 0}.card k) (hLS : ∀ (ω : Ω) (θ : Fin d), {j : Fin d | θ j 0}.card k → (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hsubG : ∀ (v : Fin n), v ⬝ᵥ v 1∀ (s : ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) μ Real.exp (s ^ 2 * σ ^ 2 / 2)) :
μ {ω : Ω | 1 / n * X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) 32 * σ ^ 2 / n * (2 * k * Real.log (Real.exp 1 * d / (2 * k)) + 2 * k * Real.log 6 + Real.log (1 / δ))} ENNReal.ofReal (1 - δ)

Corollary 2.8 (Rigollet): Sparse MSE bound for ℓ₀-constrained LS with Lemma 2.7 applied.

Under the linear model Y = Xθ* + ε where ε ~ subG_n(σ²), with θ̂ the ℓ₀-constrained LS estimator over k-sparse vectors and θ* ∈ B₀(k), for any δ ∈ (0,1], with probability at least 1 - δ:

MSE(Xθ̂) = (1/n)|Xθ̂ - Xθ*|₂² ≤ (32σ²/n)(2k·log(ed/(2k)) + 2k·log 6 + log(1/δ))

This is Corollary 2.8 of Rigollet's textbook, which applies Lemma 2.7 (binomial coefficient bound C(d,2k) ≤ (ed/(2k))^{2k}) to simplify the tail bound from Theorem 2.6 equation (2.6). The key observation is that log C(d,2k) ≤ 2k·log(ed/(2k)), yielding the rate:

MSE ≲ σ²k/n · log(ed/(2k)) + σ²k/n · log(6) + σ²/n · log(1/δ)

which for fixed δ gives the minimax-optimal rate σ²k·log(d/k)/n.

theorem cor_2_8_sparse_rate (n d k : ) (hn : 0 < n) (hk : 1 k) (hkd : 2 * k d) (σsq δ : ) ( : 0 < σsq) ( : 0 < δ) (sq_pred_err : ) (hbound : sq_pred_err 4 * (8 * σsq * (2 * k * Real.log (Real.exp 1 * d / (2 * k)) + 2 * k * Real.log 6 + Real.log (1 / δ)))) :
sq_pred_err / n 32 * σsq / n * (2 * k * Real.log (Real.exp 1 * d / (2 * k)) + 2 * k * Real.log 6 + Real.log (1 / δ))

Corollary 2.8, deterministic form (algebraic step).

If the squared prediction error satisfies the Thm 2.6 bound with threshold determined by σ², d, k, δ, then dividing by n gives the MSE bound. This is the pure arithmetic step within the Corollary 2.8 derivation.