Corollary 2.8: Sparse MSE bound for ℓ₀-constrained LS #

From Rigollet, Chapter 2, Corollary 2.8:

Under the assumptions of Theorem 2.6, for any δ > 0, with probability 1 - δ, it holds

$$\mathsf{MSE}(\mathbb{X}\hat{\theta}_{\mathcal{B}_0(k)}^{\mathrm{LS}}) \lesssim \frac{\sigma^2 k}{n} \log\left(\frac{ed}{2k}\right) + \frac{\sigma^2 k}{n} \log(6) + \frac{\sigma^2}{n} \log(1/\delta) \,.$$

Proof #

This follows immediately from the proof of Theorem 2.6 by applying Lemma 2.7 to bound log C(d,2k) ≤ 2k · log(ed/(2k)). Specifically, the tail bound (2.6)

P(|X(θ̂-θ*)|² > 4t) ≤ C(d,2k) · 6^{2k} · exp(-t/(8σ²))

requires t ≥ Cσ² {log C(d,2k) + k·log(6) + log(1/δ)} for the RHS ≤ δ. By Lemma 2.7, C(d,2k) ≤ (ed/(2k))^{2k}, so log C(d,2k) ≤ 2k·log(ed/(2k)). Plugging in t = 8σ²·L with L = 2k·log(ed/(2k)) + 2k·log(6) + log(1/δ), we get |X(θ̂-θ*)|² ≤ 4t = 32σ²·L, hence dividing by n gives the MSE bound.

source

theorem cor_2_8_mse_prob {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (k : ℕ) (hk : 1 ≤ k) (hkd : 2 * k ≤ d) (σ : ℝ) (hσ : 0 < σ) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hhat_sparse : ∀ (ω : Ω), {j : Fin d | θhat ω j ≠ 0}.card ≤ k) (hstar_sparse : {j : Fin d | θstar j ≠ 0}.card ≤ k) (hLS : ∀ (ω : Ω) (θ : Fin d → ℝ), {j : Fin d | θ j ≠ 0}.card ≤ k → (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ≤ (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ ∧ ∫ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) ∂μ ≤ Real.exp (s ^ 2 * σ ^ 2 / 2)) :

μ {ω : Ω | 1 / ↑n * X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ≤ 32 * σ ^ 2 / ↑n * (2 * ↑k * Real.log (Real.exp 1 * ↑d / (2 * ↑k)) + 2 * ↑k * Real.log 6 + Real.log (1 / δ))} ≥ ENNReal.ofReal (1 - δ)

Corollary 2.8 (Rigollet): Sparse MSE bound for ℓ₀-constrained LS with Lemma 2.7 applied.

Under the linear model Y = Xθ* + ε where ε ~ subG_n(σ²), with θ̂ the ℓ₀-constrained LS estimator over k-sparse vectors and θ* ∈ B₀(k), for any δ ∈ (0,1], with probability at least 1 - δ:

MSE(Xθ̂) = (1/n)|Xθ̂ - Xθ*|₂² ≤ (32σ²/n)(2k·log(ed/(2k)) + 2k·log 6 + log(1/δ))

This is Corollary 2.8 of Rigollet's textbook, which applies Lemma 2.7 (binomial coefficient bound C(d,2k) ≤ (ed/(2k))^{2k}) to simplify the tail bound from Theorem 2.6 equation (2.6). The key observation is that log C(d,2k) ≤ 2k·log(ed/(2k)), yielding the rate:

MSE ≲ σ²k/n · log(ed/(2k)) + σ²k/n · log(6) + σ²/n · log(1/δ)

which for fixed δ gives the minimax-optimal rate σ²k·log(d/k)/n.

source

theorem cor_2_8_sparse_rate (n d k : ℕ) (hn : 0 < ↑n) (hk : 1 ≤ k) (hkd : 2 * k ≤ d) (σsq δ : ℝ) (hσ : 0 < σsq) (hδ : 0 < δ) (sq_pred_err : ℝ) (hbound : sq_pred_err ≤ 4 * (8 * σsq * (2 * ↑k * Real.log (Real.exp 1 * ↑d / (2 * ↑k)) + 2 * ↑k * Real.log 6 + Real.log (1 / δ)))) :

sq_pred_err / ↑n ≤ 32 * σsq / ↑n * (2 * ↑k * Real.log (Real.exp 1 * ↑d / (2 * ↑k)) + 2 * ↑k * Real.log 6 + Real.log (1 / δ))

Corollary 2.8, deterministic form (algebraic step).

If the squared prediction error satisfies the Thm 2.6 bound with threshold determined by σ², d, k, δ, then dividing by n gives the MSE bound. This is the pure arithmetic step within the Corollary 2.8 derivation.

Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Cor_2_8

Corollary 2.8: Sparse MSE bound for ℓ₀-constrained LS #

Proof #