Documentation

Atlas.HighDimensionalStatistics.code.Chapter3.Thm_3_3

Theorem 3.3: Oracle Inequality for the LS Estimator (Misspecified Model) #

Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the least squares estimator θ̂^LS satisfies, with probability at least 1 − δ:

MSE(φ_{θ̂^LS}) ≤ inf_θ MSE(φ_θ) + C σ² M / n · log(1/δ)

Proof structure #

  1. Basic inequality: |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄) for any θ̄.
  2. Pythagorean step: if Φθ̄ is the projection of f onto col(Φ), then |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).
  3. Sub-Gaussian concentration (equation 2.5): with probability ≥ 1 − δ, |Φθ̂ - Φθ̄|² ≲ σ²M log(1/δ)/n.
  4. Oracle inequality: conditioning on the concentration event and applying the algebraic pointwise bound yields the result.
theorem Rigollet.Chapter3.dotProduct_add_expand {n : } (a b : Fin n) :
(a + b) ⬝ᵥ (a + b) = a ⬝ᵥ a + 2 * a ⬝ᵥ b + b ⬝ᵥ b

Norm-squared expansion: |a + b|² = |a|² + 2⟨a,b⟩ + |b|².

theorem Rigollet.Chapter3.pythagoras_dotProduct {n : } (a b : Fin n) (h : a ⬝ᵥ b = 0) :
(a + b) ⬝ᵥ (a + b) = a ⬝ᵥ a + b ⬝ᵥ b

Pythagoras: if ⟨a, b⟩ = 0 then |a + b|² = |a|² + |b|².

theorem Rigollet.Chapter3.thm_3_3_basic_inequality {n M : } (Φ : Matrix (Fin n) (Fin M) ) (f ε : Fin n) (θhat θbar : Fin M) (hLS : ∀ (θ : Fin M), (f + ε - Φ.mulVec θhat) ⬝ᵥ (f + ε - Φ.mulVec θhat) (f + ε - Φ.mulVec θ) ⬝ᵥ (f + ε - Φ.mulVec θ)) :
(f - Φ.mulVec θhat) ⬝ᵥ (f - Φ.mulVec θhat) (f - Φ.mulVec θbar) ⬝ᵥ (f - Φ.mulVec θbar) + 2 * ε ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar)

Theorem 3.3 — Basic oracle inequality (algebraic core). If θ̂ minimizes |Y - Φθ|² and Y = f + ε, then for any θ̄: |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄).

theorem Rigollet.Chapter3.thm_3_3_pythagorean_step {n M : } (Φ : Matrix (Fin n) (Fin M) ) (f ε : Fin n) (θhat θbar : Fin M) (hLS : ∀ (θ : Fin M), (f + ε - Φ.mulVec θhat) ⬝ᵥ (f + ε - Φ.mulVec θhat) (f + ε - Φ.mulVec θ) ⬝ᵥ (f + ε - Φ.mulVec θ)) (hProj : ∀ (v : Fin M), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) :
(Φ.mulVec θhat - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar) 2 * ε ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar)

Theorem 3.3 — Pythagorean step. If additionally Φθ̄ is the orthogonal projection of f onto col(Φ), then |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).

Helper lemmas for the oracle inequality #

noncomputable def Rigollet.Chapter3.MSE' {n : } (fhat f : Fin n) :

Local MSE definition: MSE(fhat, f) = (1/n) Σᵢ (fhat_i - f_i)²

Instances For
    theorem Rigollet.Chapter3.MSE'_eq_dotProduct {n : } (v f : Fin n) :
    MSE' v f = 1 / n * (v - f) ⬝ᵥ (v - f)

    MSE = (1/n) * dotProduct(v-f, v-f)

    theorem Rigollet.Chapter3.dotProduct_sub_comm' {n : } (v f : Fin n) :
    (v - f) ⬝ᵥ (v - f) = (f - v) ⬝ᵥ (f - v)

    Squared norm is symmetric: |v-f|² = |f-v|²

    Squared norm is nonneg

    theorem Rigollet.Chapter3.pointwise_bound {n M : } (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (θhat θbar : Fin M) (hProj : ∀ (v : Fin M), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (θ : Fin M) :
    (f - Φ.mulVec θhat) ⬝ᵥ (f - Φ.mulVec θhat) (f - Φ.mulVec θ) ⬝ᵥ (f - Φ.mulVec θ) + (Φ.mulVec θhat - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar)

    Pointwise bound: for each θ, |f - Φθ̂|² ≤ |f - Φθ|² + |Φθ̂ - Φθ̄|² This is the KEY algebraic step combining projection optimality + Pythagoras

    Sub-Gaussian concentration bound (axiomatized) #

    The book's proof of Theorem 3.3 invokes "the same steps as the ones following equation (2.5)" from Chapter 2 to bound the estimation error |Φθ̂ − Φθ̄|² ≲ σ²M log(1/δ)/n with probability 1 − δ. Since this is a cross-chapter reference to a sub-Gaussian max inequality, we axiomatize the resulting concentration event as a hypothesis of the main theorem.

    Theorem 3.3: Probabilistic oracle inequality #

    The main theorem. Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the LS estimator satisfies with probability ≥ 1 − δ:

    MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + C σ² M log(1/δ) / n

    We formalize this as follows:

    theorem Rigollet.Chapter3.thm_3_3_oracle_inequality_pointwise {n M : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (θhat θbar : Fin M) (R : ) (hProj : ∀ (v : Fin M), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (hConc : (Φ.mulVec θhat - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar) R) (θ : Fin M) :
    MSE' (Φ.mulVec θhat) f MSE' (Φ.mulVec θ) f + 1 / n * R

    Deterministic oracle inequality (pointwise version): For any θ, MSE(Φθ̂, f) ≤ MSE(Φθ, f) + (1/n) · R where R bounds |Φθ̂ − Φθ̄|².

    theorem Rigollet.Chapter3.thm_3_3_oracle_inequality_det {n M : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (θhat θbar : Fin M) (R : ) (hProj : ∀ (v : Fin M), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (hConc : (Φ.mulVec θhat - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec θhat - Φ.mulVec θbar) R) :
    MSE' (Φ.mulVec θhat) f (⨅ (θ : Fin M), MSE' (Φ.mulVec θ) f) + 1 / n * R

    Oracle inequality with infimum (deterministic version): MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + (1/n) · R

    theorem Rigollet.Chapter3.thm_3_3_oracle_inequality {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε : ΩFin n) (θhat : ΩFin M) (θbar : Fin M) (hProj : ∀ (v : Fin M), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (_hLS : ∀ (ω : Ω) (θ : Fin M), (f + ε ω - Φ.mulVec (θhat ω)) ⬝ᵥ (f + ε ω - Φ.mulVec (θhat ω)) (f + ε ω - Φ.mulVec θ) ⬝ᵥ (f + ε ω - Φ.mulVec θ)) (C_const : ) (_hC : 0 < C_const) (σ : ) (_hσ : 0 < σ) (δ : ) (_hδ_pos : 0 < δ) (_hδ_le : δ 1) (hConcentration : μ {ω : Ω | (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θbar) C_const * σ ^ 2 * M * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)) (_hMeasConc : MeasurableSet {ω : Ω | (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θbar) C_const * σ ^ 2 * M * Real.log (1 / δ)}) :
    μ {ω : Ω | MSE' (Φ.mulVec (θhat ω)) f (⨅ (θ : Fin M), MSE' (Φ.mulVec θ) f) + C_const * σ ^ 2 * M * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)

    Theorem 3.3 (Rigollet): Probabilistic oracle inequality for the LS estimator.

    Under the general regression model Y = f + ε with ε ~ subG_n(σ²), the least squares estimator θ̂^LS satisfies, with probability at least 1 − δ:

    MSE(φ_{θ̂^LS}) ≤ inf_{θ ∈ ℝ^M} MSE(φ_θ) + C σ² M log(1/δ) / n

    The proof conditions on the sub-Gaussian concentration event (established via "the same steps as equation (2.5)" in the book — a cross-chapter reference), then applies the deterministic algebraic bound.

    Parameters:

    • (Ω, μ): probability space
    • Φ: design matrix (n × M)
    • f: true regression function
    • ε: random sub-Gaussian noise vector
    • θhat(ω): LS estimator (depends on noise realization)
    • θbar: projection parameter (Φθbar is projection of f onto col(Φ))
    • C: numerical constant from sub-Gaussian bound
    • σ: sub-Gaussian parameter
    • δ: confidence parameter
    theorem Rigollet.Chapter3.thm_3_3 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε : ΩFin n) (θhat : ΩFin M) (θbar : Fin M) (hProj : ∀ (v : Fin M), (f - Φ.mulVec θbar) ⬝ᵥ Φ.mulVec v = 0) (hLS : ∀ (ω : Ω) (θ : Fin M), (f + ε ω - Φ.mulVec (θhat ω)) ⬝ᵥ (f + ε ω - Φ.mulVec (θhat ω)) (f + ε ω - Φ.mulVec θ) ⬝ᵥ (f + ε ω - Φ.mulVec θ)) (σ : ) ( : 0 < σ) (hsubG : ∀ (v : Fin n), v ⬝ᵥ v 1∀ (s : ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * ε ω ⬝ᵥ v)) μ ENNReal.ofReal (Real.exp (s ^ 2 * σ ^ 2 / 2))) (hε_meas : ∀ (j : Fin n), Measurable fun (ω : Ω) => ε ω j) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hδ_small : δ Real.exp (-1)) :
    (∀ (ω : Ω) (θ' : Fin M), (f - Φ.mulVec (θhat ω)) ⬝ᵥ (f - Φ.mulVec (θhat ω)) (f - Φ.mulVec θ') ⬝ᵥ (f - Φ.mulVec θ') + 2 * ε ω ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θ')) (∀ (ω : Ω), (Φ.mulVec (θhat ω) - Φ.mulVec θbar) ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θbar) 2 * ε ω ⬝ᵥ (Φ.mulVec (θhat ω) - Φ.mulVec θbar)) ∃ (C : ), 0 < C ∀ (θ : Fin M), μ {ω : Ω | MSE' (Φ.mulVec (θhat ω)) f MSE' (Φ.mulVec θ) f + C * σ ^ 2 * M * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)

    Theorem 3.3 (Rigollet, High-Dimensional Statistics, 2023).

    Under the general regression model Y = f + ε with ε ~ subGₙ(σ²), the least squares estimator θ̂ satisfies:

    1. Basic inequality: For each realization, |f - Φθ̂|² ≤ |f - Φθ̄|² + 2εᵀ(Φθ̂ - Φθ̄).
    2. Pythagorean step: |Φθ̂ - Φθ̄|² ≤ 2εᵀ(Φθ̂ - Φθ̄).
    3. Oracle inequality: With probability at least 1 − δ, MSE(Φθ̂, f) ≤ inf_θ MSE(Φθ, f) + C σ² M log(1/δ) / n

    where C > 0 is a universal constant. The bound uses M * log(1/δ) as in the book's statement (Theorem 3.3), which follows from rank(ΦᵀΦ) ≤ M and log(1/δ) ≥ 1 (ensured by δ ≤ e⁻¹).