Corollary 2.8: Sparse MSE bound for ℓ₀-constrained LS #
From Rigollet, Chapter 2, Corollary 2.8:
Under the assumptions of Theorem 2.6, for any δ > 0, with probability 1 - δ, it holds
$$\mathsf{MSE}(\mathbb{X}\hat{\theta}_{\mathcal{B}_0(k)}^{\mathrm{LS}}) \lesssim \frac{\sigma^2 k}{n} \log\left(\frac{ed}{2k}\right) + \frac{\sigma^2 k}{n} \log(6) + \frac{\sigma^2}{n} \log(1/\delta) \,.$$
Proof #
This follows immediately from the proof of Theorem 2.6 by applying Lemma 2.7 to bound log C(d,2k) ≤ 2k · log(ed/(2k)). Specifically, the tail bound (2.6)
P(|X(θ̂-θ*)|² > 4t) ≤ C(d,2k) · 6^{2k} · exp(-t/(8σ²))
requires t ≥ Cσ² {log C(d,2k) + k·log(6) + log(1/δ)} for the RHS ≤ δ. By Lemma 2.7, C(d,2k) ≤ (ed/(2k))^{2k}, so log C(d,2k) ≤ 2k·log(ed/(2k)). Plugging in t = 8σ²·L with L = 2k·log(ed/(2k)) + 2k·log(6) + log(1/δ), we get |X(θ̂-θ*)|² ≤ 4t = 32σ²·L, hence dividing by n gives the MSE bound.
Corollary 2.8 (Rigollet): Sparse MSE bound for ℓ₀-constrained LS with Lemma 2.7 applied.
Under the linear model Y = Xθ* + ε where ε ~ subG_n(σ²), with θ̂ the ℓ₀-constrained LS estimator over k-sparse vectors and θ* ∈ B₀(k), for any δ ∈ (0,1], with probability at least 1 - δ:
MSE(Xθ̂) = (1/n)|Xθ̂ - Xθ*|₂² ≤ (32σ²/n)(2k·log(ed/(2k)) + 2k·log 6 + log(1/δ))
This is Corollary 2.8 of Rigollet's textbook, which applies Lemma 2.7 (binomial coefficient bound C(d,2k) ≤ (ed/(2k))^{2k}) to simplify the tail bound from Theorem 2.6 equation (2.6). The key observation is that log C(d,2k) ≤ 2k·log(ed/(2k)), yielding the rate:
MSE ≲ σ²k/n · log(ed/(2k)) + σ²k/n · log(6) + σ²/n · log(1/δ)
which for fixed δ gives the minimax-optimal rate σ²k·log(d/k)/n.
Corollary 2.8, deterministic form (algebraic step).
If the squared prediction error satisfies the Thm 2.6 bound with threshold determined by σ², d, k, δ, then dividing by n gives the MSE bound. This is the pure arithmetic step within the Corollary 2.8 derivation.