
Atlas.HighDimensionalStatistics.code.Chapter2.Problem_2_1

Problem 2.1: Ridge Regression

From Rigollet Chapter 2, Problem 2.1.

Consider the linear regression model with fixed design with d ≤ n. The ridge regression estimator for parameter τ > 0 is defined as:

θ̂ᵣᵢᵈᵍᵉ = argmin_θ { (1/n)|Y - Xθ|₂² + τ|θ|₂² }

(a) Show that for any τ > 0, θ̂ᵣᵢᵈᵍᵉ is uniquely defined and give its closed form: θ̂ = (XᵀX + nτI)⁻¹ XᵀY

(b) Compute the bias of θ̂ᵣᵢᵈᵍᵉ and show that it is bounded in absolute value by |θ*|₂.

noncomputable def Rigollet.l2normSq {m : ℕ} (v : Fin m → ℝ) :
ℝ

The squared ℓ₂ norm of a vector v : Fin m → ℝ, i.e., ‖v‖₂² = Σᵢ vᵢ².
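The description above suggests a definition along the following lines. This is only a sketch: the page shows the signature of `l2normSq` but not its body, so the sum-of-squares body, the `Sketch` namespace, and the helper lemma `l2normSq_nonneg` are assumptions made for illustration.

```lean
import Mathlib

open BigOperators

namespace Sketch

/-- Squared ℓ₂ norm as a finite sum of squares (assumed body, not the module's source). -/
noncomputable def l2normSq {m : ℕ} (v : Fin m → ℝ) : ℝ :=
  ∑ i, v i ^ 2

/-- Hypothetical helper: the squared norm is nonnegative, since every summand is a square. -/
theorem l2normSq_nonneg {m : ℕ} (v : Fin m → ℝ) : 0 ≤ l2normSq v := by
  unfold l2normSq
  exact Finset.sum_nonneg fun i _ => sq_nonneg (v i)

end Sketch
```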

    noncomputable def Rigollet.ridgeEstimator {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (τ : ℝ) :
    Fin d → ℝ

    The ridge regression estimator: θ̂ = (XᵀX + nτI)⁻¹ XᵀY. This is the closed-form solution of the ridge objective argmin_θ { (1/n)|Y - Xθ|₂² + τ|θ|₂² }.
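One way the closed form could be written against Mathlib's matrix API is sketched below. The body is an assumption modelled on the expression `A⁻¹.mulVec (X.transpose.mulVec ...)` that appears in the statement of part (b); the actual definition in the module may differ, and the `Sketch` namespace is hypothetical.

```lean
import Mathlib

namespace Sketch

/-- Closed-form ridge estimator `(XᵀX + nτI)⁻¹ XᵀY` (assumed body, for illustration). -/
noncomputable def ridgeEstimator {n d : ℕ}
    (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (τ : ℝ) : Fin d → ℝ :=
  -- Regularized Gram matrix `XᵀX + nτI`; the cast `(n : ℝ)` makes the scalar real.
  let A : Matrix (Fin d) (Fin d) ℝ :=
    X.transpose * X + ((n : ℝ) * τ) • (1 : Matrix (Fin d) (Fin d) ℝ)
  -- Apply the matrix inverse to `XᵀY`.
  A⁻¹.mulVec (X.transpose.mulVec Y)

end Sketch
```

Note that Mathlib's matrix inverse is total: it returns a junk value when the determinant is not a unit, so the definition type-checks for every τ, and the hypothesis `0 < τ` is only needed in the theorems.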

      theorem Rigollet.problem_2_1a_ridge_closed_form {n d : ℕ} (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (θ : Fin d → ℝ) (τ : ℝ) (hτ : 0 < τ) :
      1 / n * l2normSq (Y - X.mulVec (ridgeEstimator X Y τ)) + τ * l2normSq (ridgeEstimator X Y τ) ≤ 1 / n * l2normSq (Y - X.mulVec θ) + τ * l2normSq θ

      Problem 2.1(a): The ridge estimator is the unique minimizer of the ridge objective. For any τ > 0, the matrix XᵀX + nτI is positive definite (hence invertible), so θ̂ = (XᵀX + nτI)⁻¹ XᵀY is uniquely defined and minimizes (1/n)|Y - Xθ|₂² + τ|θ|₂² over all θ ∈ ℝᵈ.
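The argument can be spelled out as a short first-order computation. The following is a sketch of the standard derivation, not a transcription of the Lean proof; the symbol F for the ridge objective is introduced here for convenience.

```latex
\begin{align*}
F(\theta) &= \tfrac{1}{n}\,|Y - X\theta|_2^2 + \tau\,|\theta|_2^2, \\
\nabla F(\theta) &= -\tfrac{2}{n}\,X^\top (Y - X\theta) + 2\tau\,\theta, \\
\nabla F(\hat\theta) = 0 &\iff (X^\top X + n\tau I)\,\hat\theta = X^\top Y, \\
v^\top (X^\top X + n\tau I)\,v &= |Xv|_2^2 + n\tau\,|v|_2^2 > 0 \quad (v \neq 0), \\
\nabla^2 F(\theta) &= \tfrac{2}{n}\,(X^\top X + n\tau I) \succ 0 .
\end{align*}
```

The fourth line gives invertibility of XᵀX + nτI, and the last line shows that F is strictly convex, so the stationary point is the unique global minimizer.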

      theorem Rigollet.problem_2_1b_ridge_bias_bound {n d : ℕ} (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ℝ) (θstar : Fin d → ℝ) (τ : ℝ) (hτ : 0 < τ) :
      have A := X.transpose * X + (n * τ) • 1; have bias := A⁻¹.mulVec (X.transpose.mulVec (X.mulVec θstar)) - θstar; l2normSq bias ≤ l2normSq θstar

      Problem 2.1(b): The bias of the ridge estimator is bounded by |θ*|₂.

      In the fixed design model Y = Xθ* + ε with E[ε] = 0, the estimator has expectation E[θ̂] = (XᵀX + nτI)⁻¹ XᵀX θ*, so the bias is E[θ̂] - θ* = ((XᵀX + nτI)⁻¹ XᵀX - I) θ*.

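      The bias satisfies ‖E[θ̂] - θ*‖₂ ≤ ‖θ*‖₂; the theorem states this in the equivalent squared form. This follows because I - (XᵀX + nτI)⁻¹ XᵀX = nτ(XᵀX + nτI)⁻¹ is symmetric, and if λ ≥ 0 is an eigenvalue of XᵀX then the corresponding eigenvalue of this matrix is nτ/(λ + nτ) ∈ (0,1]. A symmetric matrix with all eigenvalues in [0,1] has operator norm at most 1, so it is a contraction.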
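The bound can be made explicit in an eigenbasis of XᵀX. The following is a sketch of that computation, where the orthonormal eigenvectors uᵢ and eigenvalues λᵢ are notation introduced here and do not appear in the Lean statement.

```latex
\begin{align*}
X^\top X &= \sum_{i=1}^{d} \lambda_i\, u_i u_i^\top, \qquad \lambda_i \ge 0, \\
\mathbb{E}[\hat\theta] - \theta^* &= \big((X^\top X + n\tau I)^{-1} X^\top X - I\big)\theta^*
  = -\,n\tau\,(X^\top X + n\tau I)^{-1}\theta^*, \\
\|\mathbb{E}[\hat\theta] - \theta^*\|_2^2 &= \sum_{i=1}^{d} \Big(\frac{n\tau}{\lambda_i + n\tau}\Big)^2 (u_i^\top \theta^*)^2
  \le \sum_{i=1}^{d} (u_i^\top \theta^*)^2 = \|\theta^*\|_2^2 .
\end{align*}
```

Each factor nτ/(λᵢ + nτ) lies in (0,1] because λᵢ ≥ 0 and nτ > 0, which is where the hypotheses `hn` and `hτ` enter.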