Theorem 3.3: Oracle Inequality for the LS Estimator (Misspecified Model) #
Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the least squares estimator θ̂^LS satisfies, with probability at least 1 − δ:
MSE(φ_{θ̂^LS}) ≤ inf_θ MSE(φ_θ) + C σ² M / n · log(1/δ)
Proof structure #
- Basic inequality: |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄) for any θ̄.
- Pythagorean step: if Φθ̄ is the projection of f onto col(Φ), then |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).
- Sub-Gaussian concentration (equation 2.5): with probability ≥ 1 − δ, |Φθ̂ - Φθ̄|² ≲ σ²M log(1/δ)/n.
- Oracle inequality: conditioning on the concentration event and applying the algebraic pointwise bound yields the result.
Theorem 3.3 — Basic oracle inequality (algebraic core). If θ̂ minimizes |Y - Φθ|² and Y = f + ε, then for any θ̄: |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄).
Theorem 3.3 — Pythagorean step. If additionally Φθ̄ is the orthogonal projection of f onto col(Φ), then |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).
Helper lemmas for the oracle inequality #
Pointwise bound: for each θ, |f - Φθ̂|² ≤ |f - Φθ|² + |Φθ̂ - Φθ̄|² This is the KEY algebraic step combining projection optimality + Pythagoras
Sub-Gaussian concentration bound (axiomatized) #
The book's proof of Theorem 3.3 invokes "the same steps as the ones following equation (2.5)" from Chapter 2 to bound the estimation error |Φθ̂ − Φθ̄|² ≲ σ²M log(1/δ)/n with probability 1 − δ. Since this is a cross-chapter reference to a sub-Gaussian max inequality, we axiomatize the resulting concentration event as a hypothesis of the main theorem.
Theorem 3.3: Probabilistic oracle inequality #
The main theorem. Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the LS estimator satisfies with probability ≥ 1 − δ:
MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + C σ² M log(1/δ) / n
We formalize this as follows:
- (Ω, μ) is a probability space
- ε : Ω → Fin n → ℝ is the random noise
- θ̂(ω) is the LS estimator given noise ε(ω)
- The sub-Gaussian concentration bound is taken as hypothesis
- The conclusion is a measure-theoretic probability bound
Deterministic oracle inequality (pointwise version): For any θ, MSE(Φθ̂, f) ≤ MSE(Φθ, f) + (1/n) · R where R bounds |Φθ̂ − Φθ̄|².
Oracle inequality with infimum (deterministic version): MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + (1/n) · R
Theorem 3.3 (Rigollet): Probabilistic oracle inequality for the LS estimator.
Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the least squares estimator θ̂^LS satisfies, with probability at least 1 − δ:
MSE(φ_{θ̂^LS}) ≤ inf_{θ ∈ ℝ^M} MSE(φ_θ) + C σ² M log(1/δ) / n
The proof conditions on the sub-Gaussian concentration event (established via "the same steps as equation (2.5)" in the book — a cross-chapter reference), then applies the deterministic algebraic bound.
Parameters:
(Ω, μ): probability spaceΦ: design matrix (n × M)f: true regression functionε: random sub-Gaussian noise vectorθhat(ω): LS estimator (depends on noise realization)θbar: projection parameter (Φθbar is projection of f onto col(Φ))C: numerical constant from sub-Gaussian boundσ: sub-Gaussian parameterδ: confidence parameter
Theorem 3.3 (Rigollet, High-Dimensional Statistics, 2023).
Under the general regression model Y = f + ε with ε ~ subGₙ(σ²), the least squares estimator θ̂ satisfies:
- Basic inequality: For each realization, |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄).
- Pythagorean step: |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).
- Oracle inequality: With probability at least 1 − δ, MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + C σ² M log(1/δ) / n
where C > 0 is a universal constant. The bound uses M * log(1/δ) as in the book's statement (Theorem 3.3), which follows from rank(ΦᵀΦ) ≤ M and log(1/δ) ≥ 1 (ensured by δ ≤ e⁻¹).