Problem 2.1: Ridge Regression

From Rigollet Chapter 2, Problem 2.1.

Consider the linear regression model with fixed design with d ≤ n. The ridge regression estimator for parameter τ > 0 is defined as:

θ̂ᵣᵢᵈᵍᵉ = argmin_θ { (1/n)|Y - Xθ|₂² + τ|θ|₂² }

(a) Show that for any τ > 0, θ̂ᵣᵢᵈᵍᵉ is uniquely defined and give its closed form: θ̂ = (XᵀX + nτI)⁻¹ XᵀY

(b) Compute the bias of θ̂ᵣᵢᵈᵍᵉ and show that it is bounded in absolute value by |θ*|₂.

noncomputable def Rigollet.l2normSq {m : ℕ} (v : Fin m → ℝ) : ℝ

The squared ℓ₂ norm of a vector v : Fin m → ℝ, i.e., ‖v‖₂² = Σᵢ vᵢ².

noncomputable def Rigollet.ridgeEstimator {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (τ : ℝ) : Fin d → ℝ

The ridge regression estimator: θ̂ = (XᵀX + nτI)⁻¹ XᵀY. This is the closed-form solution of the ridge objective argmin_θ { (1/n)|Y - Xθ|₂² + τ|θ|₂² }.
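The bodies of the two definitions are not shown on this page. A minimal sketch of what they could look like, assuming the squared norm is a plain finite sum and the estimator is the closed form with the same matrix expression used in the bias theorem, is given below. The namespace `RidgeSketch` is hypothetical and chosen only to avoid clashing with the real `Rigollet` declarations; the actual source may differ.

```lean
import Mathlib

namespace RidgeSketch

/-- Squared ℓ₂ norm as a finite sum of squares (assumed body). -/
noncomputable def l2normSq {m : ℕ} (v : Fin m → ℝ) : ℝ :=
  Finset.univ.sum (fun i => v i ^ 2)

/-- Ridge estimator via the closed form (XᵀX + nτI)⁻¹ XᵀY (assumed body). -/
noncomputable def ridgeEstimator {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ)
    (Y : Fin n → ℝ) (τ : ℝ) : Fin d → ℝ :=
  -- A = XᵀX + nτI, written with scalar multiplication of the identity matrix
  let A : Matrix (Fin d) (Fin d) ℝ := X.transpose * X + ((n : ℝ) * τ) • 1
  Matrix.mulVec A⁻¹ (Matrix.mulVec X.transpose Y)

end RidgeSketch
```

Defining the estimator directly by its closed form, rather than as an `argmin`, keeps the definition total: the matrix inverse is defined for every input, and the hypotheses `0 < n` and `0 < τ` are only needed when proving optimality.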

theorem Rigollet.problem_2_1a_ridge_closed_form {n d : ℕ} (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (θ : Fin d → ℝ) (τ : ℝ) (hτ : 0 < τ) :
    1 / ↑n * l2normSq (Y - X.mulVec (ridgeEstimator X Y τ)) + τ * l2normSq (ridgeEstimator X Y τ) ≤
      1 / ↑n * l2normSq (Y - X.mulVec θ) + τ * l2normSq θ

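Problem 2.1(a): The ridge estimator minimizes the ridge objective. For any τ > 0 and n > 0, the matrix XᵀX + nτI is positive definite (hence invertible), so θ̂ = (XᵀX + nτI)⁻¹ XᵀY is well defined. The inequality above states minimality: the value of (1/n)|Y - Xθ|₂² + τ|θ|₂² at θ̂ is at most its value at any θ ∈ ℝᵈ. Uniqueness of the minimizer follows from strict convexity of the objective when τ > 0.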
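The positive-definiteness argument can be spelled out as a short derivation. This is a sketch of the standard first-order argument and not necessarily the route taken by the formal proof; the symbol F for the objective is introduced here for illustration.

```latex
\begin{align*}
F(\theta) &= \frac{1}{n}\,|Y - X\theta|_2^2 + \tau\,|\theta|_2^2, \\
\nabla F(\theta) &= -\frac{2}{n}\,X^\top (Y - X\theta) + 2\tau\,\theta, \\
\nabla^2 F(\theta) &= \frac{2}{n}\,\bigl(X^\top X + n\tau I\bigr) \succ 0
  \quad \text{since } v^\top (X^\top X + n\tau I)\, v = |Xv|_2^2 + n\tau\,|v|_2^2 > 0 \text{ for } v \neq 0, \\
\nabla F(\hat\theta) = 0 &\iff \bigl(X^\top X + n\tau I\bigr)\hat\theta = X^\top Y
  \iff \hat\theta = \bigl(X^\top X + n\tau I\bigr)^{-1} X^\top Y.
\end{align*}
```

Because the Hessian is positive definite everywhere, F is strictly convex, so its single stationary point is the unique global minimizer.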

theorem Rigollet.problem_2_1b_ridge_bias_bound {n d : ℕ} (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (τ : ℝ) (hτ : 0 < τ) :
    have A := X.transpose * X + (↑n * τ) • 1;
    have bias := A⁻¹.mulVec (X.transpose.mulVec (X.mulVec θstar)) - θstar;
    l2normSq bias ≤ l2normSq θstar

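Problem 2.1(b): The bias of the ridge estimator is bounded in ℓ₂ norm by |θ*|₂. The formal statement compares squared norms, which is equivalent because both sides are nonnegative.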

In the fixed design model Y = Xθ* + ε with E[ε] = 0:

E[θ̂] = (XᵀX + nτI)⁻¹ XᵀX θ*

so

bias = E[θ̂] - θ* = ((XᵀX + nτI)⁻¹ XᵀX - I) θ*

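The bias satisfies ‖E[θ̂] - θ*‖₂ ≤ ‖θ*‖₂. This follows because the symmetric matrix (XᵀX + nτI)⁻¹ XᵀX has eigenvalues λ/(λ + nτ) ∈ [0,1), where λ ≥ 0 ranges over the eigenvalues of XᵀX. Hence I - (XᵀX + nτI)⁻¹ XᵀX is symmetric with eigenvalues nτ/(λ + nτ) ∈ (0,1], so its operator norm is at most 1 and it does not increase the ℓ₂ norm of any vector.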
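The eigenvalue argument can be made explicit with a spectral decomposition. The following is a sketch; the symbols U, λ_j, and B are introduced here for illustration and do not appear in the formal statement.

```latex
\begin{align*}
X^\top X &= U\,\mathrm{diag}(\lambda_1,\dots,\lambda_d)\,U^\top,
  \qquad U^\top U = I,\quad \lambda_j \ge 0, \\
B &:= I - (X^\top X + n\tau I)^{-1} X^\top X
   = n\tau\,(X^\top X + n\tau I)^{-1}
   = U\,\mathrm{diag}\!\Bigl(\tfrac{n\tau}{\lambda_1 + n\tau},\dots,\tfrac{n\tau}{\lambda_d + n\tau}\Bigr)\,U^\top, \\
|\mathbb{E}[\hat\theta] - \theta^*|_2^2 &= |B\theta^*|_2^2
   = \sum_{j=1}^{d} \Bigl(\tfrac{n\tau}{\lambda_j + n\tau}\Bigr)^{2} (U^\top\theta^*)_j^2
   \le \sum_{j=1}^{d} (U^\top\theta^*)_j^2
   = |\theta^*|_2^2.
\end{align*}
```

The second line uses (XᵀX + nτI)⁻¹((XᵀX + nτI) - XᵀX) = nτ(XᵀX + nτI)⁻¹, and the last line uses that U is orthogonal, so it preserves the ℓ₂ norm.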
