Theorem 2.2: Ridge/Least Squares Risk Bound #

Main results #

thm_2_2_ridge_risk_bound: The bundled theorem combining both parts: (1) E[MSE] ≤ 16σ²r/n, and (2) With probability ≥ 1−δ, MSE ≤ 64σ²(r+log(1/δ))/n.
thm_2_2_fundamental_inequality: The algebraic core (equation 2.5): |X(θ̂ − θ*)|² ≤ 2 · εᵀ X(θ̂ − θ*).
thm_2_2_ls_expected_mse: Part 1 — Expected MSE bound.
thm_2_2_ls_high_prob_mse: Part 2 — High-probability MSE bound.

Proof strategy #

The proof is purely algebraic at its core and uses two ingredients:

Minimality: Since θ̂ minimizes |Y − Xθ|², we have |Y − Xθ̂|² ≤ |Y − Xθ*|² = |ε|².
Expansion: Setting v = X(θ̂ − θ*), we have |Y − Xθ̂|² = |ε − v|² = |ε|² − 2⟨ε, v⟩ + |v|².

Combining: |ε|² − 2⟨ε, v⟩ + |v|² ≤ |ε|², so |v|² ≤ 2⟨ε, v⟩.

The full Theorem 2.2 combines this with sub-Gaussian tail bounds (Theorem 1.19) applied to ε̃ = Φᵀε (where Φ is an orthonormal basis for col(X)) to obtain: E[MSE] ≲ σ²r/n where r = rank(XᵀX).

References #

Rigollet, High-Dimensional Statistics, Theorem 2.2, equation (2.5)

source

theorem Rigollet.Chapter2.dotProduct_sub_expand {n : ℕ} (a b : Fin n → ℝ) :

(a - b) ⬝ᵥ (a - b) = a ⬝ᵥ a - 2 * a ⬝ᵥ b + b ⬝ᵥ b

The squared-norm expansion: |a - b|² = |a|² - 2⟨a,b⟩ + |b|² in terms of dotProduct.

source

theorem Rigollet.Chapter2.dotProduct_eq_euclidean_norm_sq {n : ℕ} (v : Fin n → ℝ) :

v ⬝ᵥ v = ‖WithLp.toLp 2 v‖ ^ 2

For real vectors, v ⬝ᵥ v equals the squared EuclideanSpace (L₂) norm of v.

source

theorem Rigollet.Chapter2.thm_2_2_fundamental_inequality {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Fin n → ℝ) (θhat : Fin d → ℝ) (hLS : ∀ (θ : Fin d → ℝ), (X.mulVec θstar + ε - X.mulVec θhat) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θhat) ≤ (X.mulVec θstar + ε - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θ)) :

X.mulVec (θhat - θstar) ⬝ᵥ X.mulVec (θhat - θstar) ≤ 2 * ε ⬝ᵥ X.mulVec (θhat - θstar)

Theorem 2.2 — Fundamental Inequality (algebraic core).

Under the linear model Y = Xθ* + ε, if θ̂ minimizes the squared residual |Y - Xθ|² = (Y - Xθ) ⬝ᵥ (Y - Xθ) over all θ, then

|X(θ̂ - θ*)|² ≤ 2 · εᵀ X(θ̂ - θ*)

where |v|² = v ⬝ᵥ v is the squared Euclidean (L₂) norm.

This is equation (2.5) in Rigollet, Chapter 2.

source

theorem Rigollet.Chapter2.thm_2_2_fundamental_inequality_norm {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Fin n → ℝ) (θhat : Fin d → ℝ) (hLS : ∀ (θ : Fin d → ℝ), (X.mulVec θstar + ε - X.mulVec θhat) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θhat) ≤ (X.mulVec θstar + ε - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε - X.mulVec θ)) :

‖WithLp.toLp 2 (X.mulVec (θhat - θstar))‖ ^ 2 ≤ 2 * ε ⬝ᵥ X.mulVec (θhat - θstar)

Theorem 2.2 — Fundamental Inequality (EuclideanSpace norm version).

The same result as thm_2_2_fundamental_inequality, but with the conclusion expressed using the L₂ norm: ‖X(θ̂ - θ*)‖²_L2 ≤ 2 · εᵀ X(θ̂ - θ*).

source

theorem Rigollet.Chapter2.subG_variance_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (Z : Ω → ℝ) (σ : ℝ) (hσ : 0 < σ) (hZ_int : MeasureTheory.Integrable Z μ) (hexp_int : ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * Z ω)) μ) (hsubG : ∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * Z ω) ∂μ ≤ Real.exp (s ^ 2 * σ ^ 2 / 2)) :

∫ (ω : Ω), Z ω ^ 2 ∂μ ≤ 4 * σ ^ 2

Lemma 1.4 consequence: Sub-Gaussian variance bound.

For Z ~ subG(σ²) (i.e., E[exp(sZ)] ≤ exp(s²σ²/2) for all s), the second moment satisfies E[Z²] ≤ 4σ².

This is the k=2 case of Lemma 1.4 (Rigollet, Chapter 1): E[|Z|^k] ≤ (2σ²)^{k/2} · k · Γ(k/2) which gives E[Z²] ≤ (2σ²)^1 · 2 · Γ(1) = 2σ² · 2 · 1 = 4σ².

The full proof via the layer cake formula is in Rigollet.Chapter1.Lemma_1_4.

Note on the formalization gap: The textbook implicitly assumes exp(sZ) is integrable (as in Definition 1.2 / IsSubGaussian). The hypothesis below uses the Bochner integral, which returns 0 for non-integrable functions, making the MGF bound vacuously true when exp(sZ) is not integrable. In the downstream usage (subG_expected_squared_norm_bound), the hypothesis always comes from a full sub-Gaussian condition where integrability holds.

source

theorem Rigollet.Chapter2.inner_toLp_toLp_eq_dotProduct {n : ℕ} (u v : Fin n → ℝ) :

inner ℝ (WithLp.toLp 2 u) (WithLp.toLp 2 v) = u ⬝ᵥ v

source

theorem Rigollet.Chapter2.exists_onb_with_parseval {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (r : ℕ) (hr : r = (X.transpose * X).rank) :

∃ (Φ : Fin r → Fin n → ℝ), (∀ (i : Fin r), Φ i ⬝ᵥ Φ i = 1) ∧ (∀ (i j : Fin r), i ≠ j → Φ i ⬝ᵥ Φ j = 0) ∧ (∀ (δ : Fin d → ℝ), X.mulVec δ ⬝ᵥ X.mulVec δ = ∑ i : Fin r, (X.mulVec δ ⬝ᵥ Φ i) ^ 2) ∧ ∀ (w : Fin n → ℝ) (δ : Fin d → ℝ), w ⬝ᵥ X.mulVec δ = ∑ i : Fin r, X.mulVec δ ⬝ᵥ Φ i * w ⬝ᵥ Φ i

ONB existence with Parseval identity.

For any n×d real matrix X with r = rank(XᵀX), there exist r unit vectors φ₁,...,φᵣ ∈ ℝⁿ forming an orthonormal basis for the column space of X, satisfying the Parseval identity and the linear expansion property.

The construction uses stdOrthonormalBasis from Mathlib applied to the column space of X transported to EuclideanSpace ℝ (Fin n).

source

theorem Rigollet.Chapter2.integrable_of_mgf_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (Z : Ω → ℝ) (σ : ℝ) (_hmgf : ∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * Z ω) ∂μ ≤ Real.exp (s ^ 2 * σ ^ 2 / 2)) (hint : ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * Z ω)) μ) :

MeasureTheory.Integrable Z μ ∧ (∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * Z ω)) μ) ∧ MeasureTheory.Integrable (fun (ω : Ω) => Z ω ^ 2) μ

Integrability from sub-Gaussian MGF bound.

If a random variable Z satisfies the sub-Gaussian MGF bound E[exp(s·Z)] ≤ exp(s²σ²/2) for all s, and exp(s·Z) is integrable for all s, then Z is integrable and Z² is integrable.

The integrability of exp(s·Z) is a standard part of the sub-Gaussian definition (see IsSubGaussian in Rigollet.Chapter1.Def_1_2). The textbook implicitly assumes this in the proof of Theorem 2.2 when it asserts that ε̃ = Φᵀε has integrable components.

source

theorem Rigollet.Chapter2.exists_onb_pointwise_bound_with_integrability {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (ε : Ω → Fin n → ℝ) (r : ℕ) (hr : r = (X.transpose * X).rank) (σ : ℝ) (_hσ : 0 < σ) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) ∂μ ≤ Real.exp (s ^ 2 * σ ^ 2 / 2)) (hsubG_int : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ) :

∃ (Φ : Fin r → Fin n → ℝ), (∀ (i : Fin r), Φ i ⬝ᵥ Φ i = 1) ∧ (∀ (δ : Fin d → ℝ) (v : Fin n → ℝ), X.mulVec δ ⬝ᵥ X.mulVec δ ≤ 2 * v ⬝ᵥ X.mulVec δ → X.mulVec δ ⬝ᵥ X.mulVec δ ≤ 4 * ∑ i : Fin r, (v ⬝ᵥ Φ i) ^ 2) ∧ (∀ (i : Fin r), MeasureTheory.Integrable (fun (ω : Ω) => ε ω ⬝ᵥ Φ i) μ) ∧ (∀ (i : Fin r) (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ Φ i)) μ) ∧ ∀ (i : Fin r), MeasureTheory.Integrable (fun (ω : Ω) => (ε ω ⬝ᵥ Φ i) ^ 2) μ

ONB decomposition with integrability.

For any n×d real matrix X with r = rank(XᵀX), there exist r unit vectors φ₁,...,φᵣ ∈ ℝⁿ forming an orthonormal basis for the column space of X, such that:

Pointwise Cauchy-Schwarz bound: For any δ ∈ ℝᵈ and noise v ∈ ℝⁿ, if |Xδ|² ≤ 2⟨v, Xδ⟩ (the fundamental inequality), then |Xδ|² ≤ 4 · Σᵢ (v ⬝ᵥ φᵢ)².
Proof (from Rigollet, Theorem 2.2): Write Xδ = Σ cᵢφᵢ (since Xδ ∈ col(X)). Then ⟨v,Xδ⟩ = Σ cᵢ⟨v,φᵢ⟩. By Cauchy-Schwarz, (Σ cᵢ⟨v,φᵢ⟩)² ≤ (Σ cᵢ²)(Σ⟨v,φᵢ⟩²) = |Xδ|²·(Σ⟨v,φᵢ⟩²). Combined with |Xδ|² ≤ 2⟨v,Xδ⟩, this gives |Xδ|² ≤ 4Σ⟨v,φᵢ⟩² by nlinarith.
Integrability: Each ε ⬝ᵥ φᵢ is integrable, has integrable exponential moments, and (ε ⬝ᵥ φᵢ)² is integrable.
Since dotProduct(φᵢ, φᵢ) = 1, the sub-Gaussian condition directly gives the MGF bound for ε ⬝ᵥ φᵢ, and integrable_of_mgf_bound yields all integrability conditions.

source

theorem Rigollet.Chapter2.sup_out_onb_expectation_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (σ : ℝ) (hσ : 0 < σ) (r : ℕ) (hr : r = (X.transpose * X).rank) (hfund : ∀ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ≤ 2 * ε ω ⬝ᵥ X.mulVec (θhat ω - θstar)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) ∂μ ≤ Real.exp (s ^ 2 * σ ^ 2 / 2)) (hsubG_int : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ) (hvar : ∀ (Z : Ω → ℝ), MeasureTheory.Integrable Z μ → (∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * Z ω)) μ) → (∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * Z ω) ∂μ ≤ Real.exp (s ^ 2 * σ ^ 2 / 2)) → ∫ (ω : Ω), Z ω ^ 2 ∂μ ≤ 4 * σ ^ 2) :

∫ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ∂μ ≤ 16 * σ ^ 2 * ↑r

Sup-out and ONB decomposition bound.

Combining the fundamental inequality with the "sup-out" technique and the orthonormal basis decomposition of col(X), the expected squared prediction error is bounded by the sum of projected variances:

E[|X(θ̂-θ*)|²] ≤ 4 · Σᵢ₌₁ʳ E[ε̃ᵢ²] ≤ 16σ²r

where ε̃ = Φᵀε and Φ is an ONB for col(X).

Proof (Rigollet, Theorem 2.2):

Fundamental inequality: ∀ω, |X(θ̂-θ*)|² ≤ 2⟨ε, X(θ̂-θ*)⟩
For v = X(θ̂-θ*) in col(X), write v = Φν, so ⟨ε,v⟩ = ε̃ᵀ(ν/|ν|)·|ν| ≤ |v|·sup_{‖u‖≤1} ε̃ᵀu
Therefore |v|² ≤ 2|v|·sup ⟹ |v| ≤ 2·sup ⟹ |v|² ≤ 4·(sup)²
sup_{‖u‖≤1} (ε̃ᵀu)² = |ε̃|² = Σᵢ ε̃ᵢ² (optimum at u = ε̃/|ε̃|)
E[Σᵢ ε̃ᵢ²] = Σᵢ E[ε̃ᵢ²] ≤ Σᵢ 4σ² = 4σ²r (by subG_variance_bound)
Combined: E[|v|²] ≤ 4·4σ²r = 16σ²r

The ONB construction and integrability conditions are provided by exists_onb_pointwise_bound_with_integrability. The rest of the proof (integration, sum-integral interchange, and application of hvar) is formally verified below.

source

theorem Rigollet.Chapter2.subG_zero_dot_product_ae_zero {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (ε : Ω → Fin n → ℝ) (v : Fin n → ℝ) (_hv : v ⬝ᵥ v ≤ 1) (hsubG : ∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) ∂μ ≤ Real.exp (s ^ 2 * 0 ^ 2 / 2)) (hsubG_int : ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ) :

∀ᵐ (ω : Ω) ∂μ, ε ω ⬝ᵥ v = 0

Sub-Gaussian MGF bound with σ=0 implies projection is zero a.e.

From ∫ exp(s · ε⬝ᵥv) ≤ 1 for all s ∈ ℝ (with v a unit vector), the random variable Z = ε⬝ᵥv satisfies Z = 0 a.s.

Proof (Markov's inequality): For t > 0 and s > 0, P(Z ≥ t) ≤ E[exp(sZ)] · exp(-st) ≤ exp(-st) → 0 as s → +∞. Similarly P(Z ≤ -t) → 0 as s → -∞. Hence P(Z ≠ 0) = 0.

This argument requires measurability of Z = ε⬝ᵥv, which is implicit in the probability theory setting of the textbook. The formalization uses the Bochner integral without explicit measurability hypotheses, creating a gap that is bridged by sorry below.

source

theorem Rigollet.Chapter2.subG_zero_sigma_integral_le_zero {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (hfund : ∀ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ≤ 2 * ε ω ⬝ᵥ X.mulVec (θhat ω - θstar)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) ∂μ ≤ Real.exp (s ^ 2 * 0 ^ 2 / 2)) (hsubG_int : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ) :

∫ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ∂μ ≤ 0

Zero sub-Gaussian parameter implies zero prediction error.

When σ = 0, the sub-Gaussian condition E[exp(s·εᵀv)] ≤ 1 for all s and unit vectors v implies that εᵀv = 0 a.s. for every direction v. Combined with the fundamental inequality |X(θ̂-θ*)|² ≤ 2⟨ε, X(θ̂-θ*)⟩, this forces |X(θ̂-θ*)|² = 0 a.s., and therefore ∫ |X(θ̂-θ*)|² dμ ≤ 0.

The proof that E[exp(sZ)] ≤ 1 for all s implies Z = 0 a.s. uses Markov's inequality: P(Z ≥ t) ≤ E[exp(sZ)]·exp(-st) ≤ exp(-st) → 0 as s → ∞. This step is captured by the axiom subG_zero_dot_product_ae_zero.

source

theorem Rigollet.Chapter2.subG_expected_squared_norm_bound {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} :

MeasureTheory.IsProbabilityMeasure μ → ∀ {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (σ : ℝ) (r : ℕ) (hr : r = (X.transpose * X).rank) (hLS : ∀ (ω : Ω) (θ : Fin d → ℝ), (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ≤ (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) ∂μ ≤ Real.exp (s ^ 2 * σ ^ 2 / 2)) (hsubG_int : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ), ∫ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ∂μ ≤ 16 * σ ^ 2 * ↑r

Expected squared norm bound for sub-Gaussian noise.

For sub-Gaussian noise ε ~ subG_n(σ²), the expected squared prediction error satisfies E[|X(θ̂ - θ*)|²] ≤ 16 σ² r where r = rank(XᵀX).

The proof decomposes into:

thm_2_2_fundamental_inequality: |X(θ̂-θ*)|² ≤ 2⟨ε, X(θ̂-θ*)⟩ (proved)
sup_out_onb_expectation_bound: sup-out + ONB decomposition + Lemma 1.4
subG_variance_bound: E[Z²] ≤ 4σ² for Z ~ subG(σ²) (axiom from Ch1)

source

theorem Rigollet.Chapter2.thm_2_2_ls_expected_mse {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hprob : MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (σ : ℝ) (r : ℕ) (hr : r = (X.transpose * X).rank) (hLS : ∀ (ω : Ω) (θ : Fin d → ℝ), (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ≤ (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) ∂μ ≤ Real.exp (s ^ 2 * σ ^ 2 / 2)) (hsubG_int : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ) :

∫ (ω : Ω), 1 / ↑n * X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ∂μ ≤ 16 * σ ^ 2 * ↑r / ↑n

Theorem 2.2, Part 1 — Expected MSE bound for Least Squares.

Under the linear model Y = Xθ* + ε where ε ~ subG_n(σ²), the least squares estimator θ̂ satisfies

E[MSE(Xθ̂)] = (1/n) E[|Xθ̂ - Xθ*|²] ≤ 16 σ² r / n

where r = rank(XᵀX). The proof uses the fundamental inequality (equation 2.5) combined with the "sup-out" technique and Lemma 1.4 to bound 4 E[sup_{u ∈ B₂} (ε̃ᵀu)²] = 4 Σᵢ E[ε̃ᵢ²] ≤ 16 σ² r.

Rigollet, Chapter 2, Theorem 2.2, first display equation.

source

theorem Rigollet.Chapter2.bochner_mgf_of_lintegral_mgf {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} (X : Ω → ℝ) (hX : Measurable X) (σsq : ℝ) (hbound : ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * X ω)) ∂μ ≤ ENNReal.ofReal (Real.exp (σsq * s ^ 2 / 2))) (s : ℝ) :

∫ (ω : Ω), Real.exp (s * X ω) ∂μ ≤ Real.exp (σsq * s ^ 2 / 2)

Derive Bochner MGF bound from Lebesgue integral MGF bound. For nonneg measurable functions, the Lebesgue integral bound ∫⁻ ofReal(exp(s*X)) ≤ ofReal(exp(σ²s²/2)) implies the Bochner integral bound ∫ exp(s*X) ≤ exp(σ²s²/2), via integral_eq_lintegral_of_nonneg_ae.

source

theorem Rigollet.Chapter2.exp_integrable_of_subG_mgf_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Ω → ℝ) (hX : Measurable X) (σsq : ℝ) (hbound : ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * X ω)) ∂μ ≤ ENNReal.ofReal (Real.exp (σsq * s ^ 2 / 2))) (s : ℝ) :

MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * X ω)) μ

Bridge theorem: Lebesgue MGF bound + measurability implies exp-integrability. On a probability space, if X is measurable and the Lebesgue integral satisfies ∫⁻ ofReal(exp(sX)) ≤ ofReal(exp(σ²s²/2)) for all s, then exp(s*X) is Bochner integrable for every s.

The key insight is that the Lebesgue integral ∫⁻ gives ⊤ (not 0) for non-integrable nonneg functions, so the finite upper bound forces finiteness. This is in contrast to the Bochner integral ∫ which gives 0 for non-integrable functions, making the bound vacuous. The textbook's E[exp(sX)] for the nonneg function exp(sX) corresponds to ∫⁻, not ∫.

source

theorem Rigollet.Chapter2.eq_zero_of_forall_mul_le_exp_sub_one (c σsq : ℝ) (h : ∀ (s : ℝ), s * c ≤ Real.exp (σsq * s ^ 2 / 2) - 1) :

c = 0

If s * c ≤ exp(σsq * s² / 2) - 1 for all s : ℝ, then c = 0.

This is a pure-analysis helper for mean_zero_of_subG_mgf_bound. The proof uses the fact that g(s) = exp(σsq * s² / 2) - 1 - s * c has a global minimum at s = 0 (where g(0) = 0), so its derivative there must vanish, giving -c = 0.

source

theorem Rigollet.Chapter2.mean_zero_of_subG_mgf_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Ω → ℝ) (hXm : Measurable X) (σsq : ℝ) (hbound_lint : ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * X ω)) ∂μ ≤ ENNReal.ofReal (Real.exp (σsq * s ^ 2 / 2))) :

∫ (ω : Ω), X ω ∂μ = 0

Bridge theorem: MGF bound implies mean zero. Standard measure-theoretic fact: on a probability space, if E[exp(sX)] ≤ exp(σ²s²/2) for all s, then E[X] = 0.

Proof: from exp(sX) ≥ 1 + sX (convexity) and the MGF bound, we get s * E[X] ≤ exp(σ²s²/2) - 1 for all s. The function g(s) = exp(σ²s²/2) - 1 - s * E[X] is non-negative with g(0) = 0, so g'(0) = -E[X] = 0 by IsLocalMin.hasDerivAt_eq_zero.

source

theorem Rigollet.Chapter2.mgf_implies_isSubGaussian_onb_proj {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n r : ℕ} (Φ : Fin r → Fin n → ℝ) (hΦ_unit : ∀ (i : Fin r), Φ i ⬝ᵥ Φ i = 1) (hΦ_ortho : ∀ (i j : Fin r), i ≠ j → Φ i ⬝ᵥ Φ j = 0) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * ε ω ⬝ᵥ v)) ∂μ ≤ ENNReal.ofReal (Real.exp (s ^ 2 * σ ^ 2 / 2))) (hε_meas : ∀ (j : Fin n), Measurable fun (ω : Ω) => ε ω j) (a : EuclideanSpace ℝ (Fin r)) (ha : ‖a‖ ≤ 1) :

IsSubGaussian (fun (ω : Ω) => inner ℝ a (WithLp.toLp 2 fun (i : Fin r) => ε ω ⬝ᵥ Φ i)) (σ ^ 2) μ

MGF bound + orthonormality implies IsSubGaussian for projected noise.

The textbook delegates the sub-Gaussianity of projected noise to "the last step in the proof of Theorem 1.19" without reproducing the argument. The proof reduces the inner product ⟨a, Φᵀε⟩ to a dot product ε ⬝ᵥ w where w = Φᵀa has ‖w‖ ≤ 1 (by orthonormality), then applies the MGF hypothesis. The integrability/mean-zero/exp-integrability conditions are supplied by bridge axioms (standard measure theory not in the book).

source

theorem Rigollet.Chapter2.onb_projected_noise_tail_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n r : ℕ} (hr : 0 < r) (Φ : Fin r → Fin n → ℝ) (hΦ_unit : ∀ (i : Fin r), Φ i ⬝ᵥ Φ i = 1) (hΦ_ortho : ∀ (i j : Fin r), i ≠ j → Φ i ⬝ᵥ Φ j = 0) (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * ε ω ⬝ᵥ v)) ∂μ ≤ ENNReal.ofReal (Real.exp (s ^ 2 * σ ^ 2 / 2))) (hε_meas : ∀ (j : Fin n), Measurable fun (ω : Ω) => ε ω j) (t : ℝ) (ht : 0 < t) :

μ {ω : Ω | ∑ i : Fin r, (ε ω ⬝ᵥ Φ i) ^ 2 > t ^ 2} ≤ ENNReal.ofReal (6 ^ r * Real.exp (-(t ^ 2 / (8 * σ ^ 2))))

ONB projected noise tail bound.

For an ONB Φ of the column space of X, the squared norm of the projected noise ε̃ = Φᵀε satisfies a covering-number tail bound. This follows from Theorem 1.19's ε-net argument applied to the r-dimensional sub-Gaussian vector ε̃ = (⟨ε,φ₁⟩,...,⟨ε,φᵣ⟩). The textbook delegates this to "the last step in the proof of Theorem 1.19" without reproducing the argument.

The bridge between dotProduct (Fin n → ℝ) and inner (EuclideanSpace ℝ (Fin r)) is a representational gap; mathematically this is a direct application of theorem_1_19_tail_bound to the projected noise vector.

source

theorem Rigollet.Chapter2.six_lt_exp_two :

6 < Real.exp 2

6 < exp(2), needed for the covering number arithmetic.

source

theorem Rigollet.Chapter2.six_mul_exp_neg_two_le_one :

6 * Real.exp (-2) ≤ 1

6·exp(-2) ≤ 1, the key for the covering number bound decay.

source

theorem Rigollet.Chapter2.exp_neg_two_log_inv (δ : ℝ) (hδ : 0 < δ) :

Real.exp (-2 * Real.log (1 / δ)) = δ ^ 2

exp(-2·log(1/δ)) = δ² for δ > 0.

source

theorem Rigollet.Chapter2.covering_number_bound_le_delta (r : ℕ) (δ : ℝ) (hδ : 0 < δ) (hδ1 : δ ≤ 1) :

6 ^ r * Real.exp (-(2 * ↑r + 2 * Real.log (1 / δ))) ≤ δ

The covering number bound 6^r · exp(-(2r + 2·log(1/δ))) ≤ δ.

source

theorem Rigollet.Chapter2.le_four_mul_of_le_two_sqrt_mul_sqrt (a b : ℝ) (ha : 0 ≤ a) (hb : 0 ≤ b) (h : a ≤ 2 * (√a * √b)) :

a ≤ 4 * b

If a ≤ 2·√a·√b with a,b ≥ 0, then a ≤ 4·b. Used in the fundamental inequality → chi-squared reduction.

source

theorem Rigollet.Chapter2.subG_chi_squared_tail_bound_from_thm_1_19 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (σ : ℝ) (hσ : 0 < σ) (r : ℕ) (hr : r = (X.transpose * X).rank) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hfund : ∀ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ≤ 2 * ε ω ⬝ᵥ X.mulVec (θhat ω - θstar)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * ε ω ⬝ᵥ v)) ∂μ ≤ ENNReal.ofReal (Real.exp (s ^ 2 * σ ^ 2 / 2))) (hε_meas : ∀ (j : Fin n), Measurable fun (ω : Ω) => ε ω j) :

μ {ω : Ω | X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) > 64 * σ ^ 2 * (↑r + Real.log (1 / δ))} ≤ ENNReal.ofReal δ

Chi-squared tail bound from ONB decomposition + Theorem 1.19.

This captures the core probabilistic step in the high-probability bound for Theorem 2.2 (Rigollet). The textbook says (line 1179):

"Moreover, with probability 1 - δ, it follows from the last step in the proof of Theorem 1.19 that sup_{u∈B₂}(ε̃ᵀu)² ≤ 8log(6)σ²r + 8σ²log(1/δ)."

The proof combines:

ONB construction: Let Φ be an ONB for col(X) (exists_onb_with_parseval).
Fundamental inequality + sup-out: |X(θ̂-θ*)|² ≤ 4·Σᵢ⟨ε,φᵢ⟩².
Covering number bound: P(Σᵢ⟨ε,φᵢ⟩² > t²) ≤ 6^r·exp(-t²/(8σ²)) (onb_projected_noise_tail_bound, from Thm 1.19).
Arithmetic: Choose t so that 6^r·exp(-t²/(8σ²)) ≤ δ.

The constant 64 arises from: 4 (fundamental ineq) × 8 (covering number) × 2 (absorbing the log(6) factor), matching the textbook's ≲ notation.

source

theorem Rigollet.Chapter2.subG_squared_norm_high_prob_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (σ : ℝ) (hσ : 0 < σ) (r : ℕ) (hr : r = (X.transpose * X).rank) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hfund : ∀ (ω : Ω), X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ≤ 2 * ε ω ⬝ᵥ X.mulVec (θhat ω - θstar)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * ε ω ⬝ᵥ v)) ∂μ ≤ ENNReal.ofReal (Real.exp (s ^ 2 * σ ^ 2 / 2))) (hε_meas : ∀ (j : Fin n), Measurable fun (ω : Ω) => ε ω j) :

μ {ω : Ω | X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ≤ 64 * σ ^ 2 * (↑r + Real.log (1 / δ))} ≥ ENNReal.ofReal (1 - δ)

High-probability squared norm bound for sub-Gaussian noise.

Combining the fundamental inequality with the "sup-out" technique, ONB decomposition, and sub-Gaussian concentration (Theorem 1.19): With probability ≥ 1 - δ, |X(θ̂-θ*)|² ≤ 64σ²(r + log(1/δ)).

Proof sketch (Rigollet, Theorem 2.2, Part 2):

|X(θ̂-θ*)|² ≤ 4 sup_{u∈B₂} (ε̃ᵀu)² (from fundamental ineq + sup-out)
With prob 1-δ: sup ≤ Cσ²(r + log(1/δ)) (from Thm 1.19 last step)
Combined: |X(θ̂-θ*)|² ≤ 64σ²(r + log(1/δ)) w.p. 1-δ

source

theorem Rigollet.Chapter2.thm_2_2_ls_high_prob_mse {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (σ : ℝ) (hσ : 0 < σ) (r : ℕ) (hr : r = (X.transpose * X).rank) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hLS : ∀ (ω : Ω) (θ : Fin d → ℝ), (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ≤ (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * ε ω ⬝ᵥ v)) ∂μ ≤ ENNReal.ofReal (Real.exp (s ^ 2 * σ ^ 2 / 2))) (hε_meas : ∀ (j : Fin n), Measurable fun (ω : Ω) => ε ω j) :

μ {ω : Ω | 1 / ↑n * X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ≤ 64 * σ ^ 2 * (↑r + Real.log (1 / δ)) / ↑n} ≥ ENNReal.ofReal (1 - δ)

Theorem 2.2, Part 2 — High-probability MSE bound.

Under the linear model Y = Xθ* + ε where ε ~ subG_n(σ²), for any δ > 0, with probability at least 1 - δ:

MSE(Xθ̂) = (1/n)|Xθ̂ - Xθ*|² ≤ 64 σ²(r + log(1/δ)) / n

where r = rank(XᵀX).

The proof uses the fundamental inequality (equation 2.5) combined with sub-Gaussian tail bounds (Theorem 1.19) applied to the rotated noise ε̃ = Φᵀε. The additional log(1/δ) term comes from the sub-Gaussian concentration: P(Z² ≥ t) ≤ 2exp(-t/(4σ²)) for Z ~ subG(σ²), applied with a union bound over the r coordinates of ε̃.

Rigollet, Chapter 2, Theorem 2.2, second display equation.

source

theorem Rigollet.Chapter2.thm_2_2_ridge_risk_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : ℕ} (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (ε : Ω → Fin n → ℝ) (θhat : Ω → Fin d → ℝ) (σ : ℝ) (hσ : 0 < σ) (r : ℕ) (hr : r = (X.transpose * X).rank) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hLS : ∀ (ω : Ω) (θ : Fin d → ℝ), (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec (θhat ω)) ≤ (X.mulVec θstar + ε ω - X.mulVec θ) ⬝ᵥ (X.mulVec θstar + ε ω - X.mulVec θ)) (hsubG : ∀ (v : Fin n → ℝ), v ⬝ᵥ v ≤ 1 → ∀ (s : ℝ), ∫⁻ (ω : Ω), ENNReal.ofReal (Real.exp (s * ε ω ⬝ᵥ v)) ∂μ ≤ ENNReal.ofReal (Real.exp (s ^ 2 * σ ^ 2 / 2))) (hε_meas : ∀ (j : Fin n), Measurable fun (ω : Ω) => ε ω j) :

∫ (ω : Ω), 1 / ↑n * X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ∂μ ≤ 16 * σ ^ 2 * ↑r / ↑n ∧ μ {ω : Ω | 1 / ↑n * X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) ≤ 64 * σ ^ 2 * (↑r + Real.log (1 / δ)) / ↑n} ≥ ENNReal.ofReal (1 - δ)

Theorem 2.2 — Ridge/LS risk bound (bundled).

Under the linear model Y = Xθ* + ε where ε ~ subG_n(σ²), the least squares estimator θ̂ satisfies both:

Expected MSE: E[(1/n)|Xθ̂ - Xθ*|²] ≤ 16σ²r/n
High-probability MSE: With probability ≥ 1-δ, (1/n)|Xθ̂ - Xθ*|² ≤ 64σ²(r + log(1/δ))/n

where r = rank(XᵀX).

Rigollet, Chapter 2, Theorem 2.2.

Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Thm_2_2

Theorem 2.2: Ridge/Least Squares Risk Bound #

Main results #

Proof strategy #

References #