Theorem 3.3: Oracle Inequality for the LS Estimator (Misspecified Model) #

Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the least squares estimator θ̂^LS satisfies, with probability at least 1 − δ:

MSE(φ_{θ̂^LS}) ≤ inf_θ MSE(φ_θ) + C σ² M / n · log(1/δ)

Proof structure #

Basic inequality: |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄) for any θ̄.
Pythagorean step: if Φθ̄ is the projection of f onto col(Φ), then |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).
Sub-Gaussian concentration (equation 2.5): with probability ≥ 1 − δ, |Φθ̂ - Φθ̄|² ≲ σ²M log(1/δ)/n.
Oracle inequality: conditioning on the concentration event and applying the algebraic pointwise bound yields the result.

source

theorem Rigollet.Chapter3.dotProduct_add_expand {n : ℕ} (a b : Fin n → ℝ) :

(a + b) ⬝ᵥ (a + b) = a ⬝ᵥ a + 2 * a ⬝ᵥ b + b ⬝ᵥ b

Norm-squared expansion: |a + b|² = |a|² + 2⟨a,b⟩ + |b|².

source

theorem Rigollet.Chapter3.pythagoras_dotProduct {n : ℕ} (a b : Fin n → ℝ) (h : a ⬝ᵥ b = 0) :

(a + b) ⬝ᵥ (a + b) = a ⬝ᵥ a + b ⬝ᵥ b

Pythagoras: if ⟨a, b⟩ = 0 then |a + b|² = |a|² + |b|².

source

theorem Rigollet.Chapter3.thm_3_3_basic_inequality {n M : ℕ} (Φ : Matrix (Fin n) (Fin M) ℝ) (f ε : Fin n → ℝ) (θhat θbar : Fin M → ℝ) (hLS : ∀ (θ : Fin M → ℝ), (f + ε - Φ.mulVec θhat) ⬝ᵥ (f + ε - Φ.mulVec θhat) ≤ (f + ε - Φ.mulVec θ) ⬝ᵥ (f + ε - Φ.mulVec θ)) :

(f - Φ.mulVec θhat) ⬝ᵥ (f - Φ.mulVec θhat) ≤ (f - Φ.mulVec θbar) ⬝ᵥ (f - Φ.mulVec θbar) + 2 * ε ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar)

Theorem 3.3 — Basic oracle inequality (algebraic core). If θ̂ minimizes |Y - Φθ|² and Y = f + ε, then for any θ̄: |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄).

source

theorem Rigollet.Chapter3.thm_3_3_pythagorean_step {n M : ℕ} (Φ : Matrix (Fin n) (Fin M) ℝ) (f ε : Fin n → ℝ) (θhat θbar : Fin M → ℝ) (hLS : ∀ (θ : Fin M → ℝ), (f + ε - Φ.mulVec θhat) ⬝ᵥ (f + ε - Φ.mulVec θhat) ≤ (f + ε - Φ.mulVec θ) ⬝ᵥ (f + ε - Φ.mulVec θ)) (hProj : ∀ (v : Fin M → ℝ), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) :

(Φ.mulVec θhat - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar) ≤ 2 * ε ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar)

Theorem 3.3 — Pythagorean step. If additionally Φθ̄ is the orthogonal projection of f onto col(Φ), then |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).

Helper lemmas for the oracle inequality #

source

noncomputable def Rigollet.Chapter3.MSE' {n : ℕ} (fhat f : Fin n → ℝ) :

ℝ

Local MSE definition: MSE(fhat, f) = (1/n) Σᵢ (fhat_i - f_i)²

Instances For

source

theorem Rigollet.Chapter3.MSE'_eq_dotProduct {n : ℕ} (v f : Fin n → ℝ) :

MSE' v f = 1 / ↑n * (v - f) ⬝ᵥ (v - f)

MSE = (1/n) * dotProduct(v-f, v-f)

source

theorem Rigollet.Chapter3.dotProduct_sub_comm' {n : ℕ} (v f : Fin n → ℝ) :

(v - f) ⬝ᵥ (v - f) = (f - v) ⬝ᵥ (f - v)

Squared norm is symmetric: |v-f|² = |f-v|²

source

theorem Rigollet.Chapter3.dotProduct_self_nonneg' {n : ℕ} (v : Fin n → ℝ) :

0 ≤ v ⬝ᵥ v

Squared norm is nonneg

source

theorem Rigollet.Chapter3.pointwise_bound {n M : ℕ} (Φ : Matrix (Fin n) (Fin M) ℝ) (f : Fin n → ℝ) (θhat θbar : Fin M → ℝ) (hProj : ∀ (v : Fin M → ℝ), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (θ : Fin M → ℝ) :

(f - Φ.mulVec θhat) ⬝ᵥ (f - Φ.mulVec θhat) ≤ (f - Φ.mulVec θ) ⬝ᵥ (f - Φ.mulVec θ) + (Φ.mulVec θhat - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar)

Pointwise bound: for each θ, |f - Φθ̂|² ≤ |f - Φθ|² + |Φθ̂ - Φθ̄|² This is the KEY algebraic step combining projection optimality + Pythagoras

Sub-Gaussian concentration bound (axiomatized) #

The book's proof of Theorem 3.3 invokes "the same steps as the ones following equation (2.5)" from Chapter 2 to bound the estimation error |Φθ̂ − Φθ̄|² ≲ σ²M log(1/δ)/n with probability 1 − δ. Since this is a cross-chapter reference to a sub-Gaussian max inequality, we axiomatize the resulting concentration event as a hypothesis of the main theorem.

Theorem 3.3: Probabilistic oracle inequality #

The main theorem. Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the LS estimator satisfies with probability ≥ 1 − δ:

MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + C σ² M log(1/δ) / n

We formalize this as follows:

(Ω, μ) is a probability space
ε : Ω → Fin n → ℝ is the random noise
θ̂(ω) is the LS estimator given noise ε(ω)
The sub-Gaussian concentration bound is taken as hypothesis
The conclusion is a measure-theoretic probability bound

source

theorem Rigollet.Chapter3.thm_3_3_oracle_inequality_pointwise {n M : ℕ} (hn : 0 < ↑n) (Φ : Matrix (Fin n) (Fin M) ℝ) (f : Fin n → ℝ) (θhat θbar : Fin M → ℝ) (R : ℝ) (hProj : ∀ (v : Fin M → ℝ), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (hConc : (Φ.mulVec θhat - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar) ≤ R) (θ : Fin M → ℝ) :

MSE' (Φ.mulVec θhat) f ≤ MSE' (Φ.mulVec θ) f + 1 / ↑n * R

Deterministic oracle inequality (pointwise version): For any θ, MSE(Φθ̂, f) ≤ MSE(Φθ, f) + (1/n) · R where R bounds |Φθ̂ − Φθ̄|².

source

theorem Rigollet.Chapter3.thm_3_3_oracle_inequality_det {n M : ℕ} (hn : 0 < ↑n) (Φ : Matrix (Fin n) (Fin M) ℝ) (f : Fin n → ℝ) (θhat θbar : Fin M → ℝ) (R : ℝ) (hProj : ∀ (v : Fin M → ℝ), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (hConc : (Φ.mulVec θhat - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar) ≤ R) :

MSE' (Φ.mulVec θhat) f ≤ (⨅ (θ : Fin M → ℝ), MSE' (Φ.mulVec θ) f) + 1 / ↑n * R

Oracle inequality with infimum (deterministic version): MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + (1/n) · R

source

theorem Rigollet.Chapter3.thm_3_3_oracle_inequality {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : ℕ} (hn : 0 < ↑n) (Φ : Matrix (Fin n) (Fin M) ℝ) (f : Fin n → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin M → ℝ) (θbar : Fin M → ℝ) (hProj : ∀ (v : Fin M → ℝ), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (_hLS : ∀ (ω : Ω) (θ : Fin M → ℝ), (f + ε ω - Φ.mulVec (θhat ω)) ⬝ᵥ (f + ε ω - Φ.mulVec (θhat ω)) ≤ (f + ε ω - Φ.mulVec θ) ⬝ᵥ (f + ε ω - Φ.mulVec θ)) (C_const : ℝ) (_hC : 0 < C_const) (σ : ℝ) (_hσ : 0 < σ) (δ : ℝ) (_hδ_pos : 0 < δ) (_hδ_le : δ ≤ 1) (hConcentration : μ {ω : Ω | (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ≤ C_const * σ ^ 2 * ↑M * Real.log (1 / δ)} ≥ ENNReal.ofReal (1 - δ)) (_hMeasConc : MeasurableSet {ω : Ω | (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ≤ C_const * σ ^ 2 * ↑M * Real.log (1 / δ)}) :

μ {ω : Ω | MSE' (Φ.mulVec (θhat ω)) f ≤ (⨅ (θ : Fin M → ℝ), MSE' (Φ.mulVec θ) f) + C_const * σ ^ 2 * ↑M * Real.log (1 / δ) / ↑n} ≥ ENNReal.ofReal (1 - δ)

Theorem 3.3 (Rigollet): Probabilistic oracle inequality for the LS estimator.

Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the least squares estimator θ̂^LS satisfies, with probability at least 1 − δ:

MSE(φ_{θ̂^LS}) ≤ inf_{θ ∈ ℝ^M} MSE(φ_θ) + C σ² M log(1/δ) / n

The proof conditions on the sub-Gaussian concentration event (established via "the same steps as equation (2.5)" in the book — a cross-chapter reference), then applies the deterministic algebraic bound.

Parameters:

(Ω, μ): probability space
Φ: design matrix (n × M)
f: true regression function
ε: random sub-Gaussian noise vector
θhat(ω): LS estimator (depends on noise realization)
θbar: projection parameter (Φθbar is projection of f onto col(Φ))
C: numerical constant from sub-Gaussian bound
σ: sub-Gaussian parameter
δ: confidence parameter

source

theorem Rigollet.Chapter3.thm_3_3 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : ℕ} (hn : 0 < ↑n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ℝ) (f : Fin n → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin M → ℝ) (θbar : Fin M → ℝ) (hProj : ∀ (v : Fin M → ℝ), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (hLS : ∀ (ω : Ω) (θ : Fin M → ℝ), (f + ε ω - Φ.mulVec (θhat ω)) ⬝ᵥ (f + ε ω - Φ.mulVec (θhat ω)) ≤ (f + ε ω - Φ.mulVec θ) ⬝ᵥ (f + ε ω - Φ.mulVec θ)) (σ : ℝ) (hσ : 0 < σ) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * ε ω ⬝ᵥ v)) ∂μ ≤ ENNReal.ofReal (Real.exp (s ^ 2 * σ ^ 2 / 2))) (hε_meas : ∀ (j : Fin n), Measurable fun (ω : Ω) => ε ω j) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hδ_small : δ ≤ Real.exp (-1)) :

(∀ (ω : Ω) (θ' : Fin M → ℝ), (f - Φ.mulVec (θhat ω)) ⬝ᵥ (f - Φ.mulVec (θhat ω)) ≤ (f - Φ.mulVec θ') ⬝ᵥ (f - Φ.mulVec θ') + 2 * ε ω ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θ')) ∧ (∀ (ω : Ω), (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ≤ 2 * ε ω ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θbar)) ∧ ∃ (C : ℝ), 0 < C ∧ ∀ (θ : Fin M → ℝ), μ {ω : Ω | MSE' (Φ.mulVec (θhat ω)) f ≤ MSE' (Φ.mulVec θ) f + C * σ ^ 2 * ↑M * Real.log (1 / δ) / ↑n} ≥ ENNReal.ofReal (1 - δ)

Theorem 3.3 (Rigollet, High-Dimensional Statistics, 2023).

Under the general regression model Y = f + ε with ε ~ subGₙ(σ²), the least squares estimator θ̂ satisfies:

Basic inequality: For each realization, |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄).
Pythagorean step: |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).
Oracle inequality: With probability at least 1 − δ, MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + C σ² M log(1/δ) / n

where C > 0 is a universal constant. The bound uses M * log(1/δ) as in the book's statement (Theorem 3.3), which follows from rank(ΦᵀΦ) ≤ M and log(1/δ) ≥ 1 (ensured by δ ≤ e⁻¹).

Documentation

Atlas.HighDimensionalStatistics.code.Chapter3.Thm_3_3

Theorem 3.3: Oracle Inequality for the LS Estimator (Misspecified Model) #

Proof structure #

Helper lemmas for the oracle inequality #

Sub-Gaussian concentration bound (axiomatized) #

Theorem 3.3: Probabilistic oracle inequality #