Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Prop_2_1

Proposition 2.1: Normal Equations for Least Squares #

Main results #

Part 1: Normal equations #

leastSquares_normalEquations: If θ minimizes ‖Y - Xθ‖² (the squared residual, expressed as dotProduct (Y - X *ᵥ θ) (Y - X *ᵥ θ)), then Xᵀ *ᵥ (X *ᵥ θ) = Xᵀ *ᵥ Y, i.e., the normal equations XᵀXθ = XᵀY hold.
leastSquares_normalEquations_residual: Equivalently, Xᵀ *ᵥ (Y - X *ᵥ θ) = 0, i.e., the residual Y - Xθ is orthogonal to the column space of X.
leastSquares_normalEquations_matrix: The matrix form (Xᵀ * X) *ᵥ θ = Xᵀ *ᵥ Y.

Part 2: Pseudoinverse formula #

leastSquares_pseudoinverse: θ = (X⊤X)†X⊤Y is a minimizer of ‖Y - Xθ‖², where (X⊤X)† is the Moore-Penrose pseudoinverse of X⊤X.

Combined #

leastSquares_normalEquations_and_pseudoinverse: Bundled conjunction of both parts.

Proof strategy #

Part 1: The proof uses the first-order optimality condition for unconstrained minimization. If θ minimizes f(θ') = ‖Y - Xθ'‖², then for any direction h and scalar t: f(θ + th) ≥ f(θ) Setting r = Y - Xθ, this gives: ‖r - t(Xh)‖² ≥ ‖r‖² Expanding: -2t⟨r, Xh⟩ + t²‖Xh‖² ≥ 0 for all t ∈ ℝ Since this quadratic in t is nonneg for all t, the linear coefficient must vanish: ⟨r, Xh⟩ = 0 for all h Using the adjoint relation ⟨r, Xh⟩ = ⟨Xᵀr, h⟩, we conclude Xᵀr = 0.

Part 2: The second statement follows from the definition of the Moore-Penrose pseudoinverse: θ = (X⊤X)†X⊤Y satisfies the normal equations because (X⊤X)(X⊤X)†(X⊤Y) = X⊤Y (using the first Moore-Penrose condition A·A†·A = A and the fact that X⊤Y ∈ range(X⊤X)).

The existence of the Moore-Penrose pseudoinverse is proved (not axiomatized) via the spectral theorem: diagonalize the Gram matrix AᴴA, invert nonzero eigenvalues, and verify the four Moore-Penrose conditions.

References #

Rigollet, High-Dimensional Statistics, Proposition 2.1

theorem LeastSquares.transpose_eq_star_real {n : Type u_1} [Fintype n] [DecidableEq n] (X : Matrix n n ℝ) :

X.transpose = star X

For real matrices, transpose equals star (conjugate transpose).

theorem LeastSquares.dotProduct_sub_smul_expand {n : ℕ} (a b : Fin n → ℝ) (t : ℝ) :

(a - t • b) ⬝ᵥ (a - t • b) = a ⬝ᵥ a - 2 * t * a ⬝ᵥ b + t ^ 2 * b ⬝ᵥ b

Dot product expansion: ‖a - tb‖² = ‖a‖² - 2t⟨a,b⟩ + t²‖b‖² in terms of dotProduct.

theorem LeastSquares.eq_zero_of_forall_quad_nonneg (a b : ℝ) (hb : 0 ≤ b) (h : ∀ (t : ℝ), -2 * t * a + t ^ 2 * b ≥ 0) :

a = 0

If -2ta + t²b ≥ 0 for all t ∈ ℝ with b ≥ 0, then a = 0. This is a standard result about nonnegative quadratic forms: the discriminant condition forces the linear coefficient to vanish.

theorem LeastSquares.dotProduct_self_nonneg {n : ℕ} (v : Fin n → ℝ) :

0 ≤ v ⬝ᵥ v

Dot product of a vector with itself is nonneg (for ℝ-valued vectors).

theorem LeastSquares.leastSquares_normalEquations_residual {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (θ : Fin d → ℝ) (hmin : ∀ (θ' : Fin d → ℝ), (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) ≤ (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')) :

X.transpose.mulVec (Y - X.mulVec θ) = 0

Proposition 2.1 (Normal Equations — Residual Form). If θ minimizes the squared residual ‖Y - Xθ‖² = (Y - Xθ) ⬝ᵥ (Y - Xθ), then the residual Y - Xθ is orthogonal to the column space of X: Xᵀ(Y - Xθ) = 0.

theorem LeastSquares.leastSquares_normalEquations {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (θ : Fin d → ℝ) (hmin : ∀ (θ' : Fin d → ℝ), (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) ≤ (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')) :

X.transpose.mulVec (X.mulVec θ) = X.transpose.mulVec Y

Proposition 2.1 (Normal Equations). If θ minimizes the squared residual ‖Y - Xθ‖² = (Y - Xθ) ⬝ᵥ (Y - Xθ), then Xᵀ(Xθ) = XᵀY, i.e., θ satisfies the normal equations.

theorem LeastSquares.leastSquares_normalEquations_matrix {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (θ : Fin d → ℝ) (hmin : ∀ (θ' : Fin d → ℝ), (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) ≤ (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')) :

(X.transpose * X).mulVec θ = X.transpose.mulVec Y

Proposition 2.1 (Normal Equations — Matrix Form). The normal equations in matrix form: (XᵀX)θ = XᵀY.

Part 2: Moore-Penrose Pseudoinverse Formula #

structure LeastSquares.IsMoorePenrosePseudoinverse {m : Type u_1} {n : Type u_2} [Fintype m] [Fintype n] [DecidableEq m] [DecidableEq n] (A : Matrix m n ℝ) (B : Matrix n m ℝ) :

A matrix B is a Moore-Penrose pseudoinverse of A if it satisfies the four Moore-Penrose conditions. For real matrices, the conjugate transpose equals the transpose.

mul_pinv_mul : A * B * A = A
Condition 1: ABA = A
pinv_mul_pinv : B * A * B = B
Condition 2: BAB = B
mul_pinv_symmetric : (A * B).transpose = A * B
Condition 3: (AB)ᵀ = AB (AB is symmetric)
pinv_mul_symmetric : (B * A).transpose = B * A
Condition 4: (BA)ᵀ = BA (BA is symmetric)

Instances For

theorem LeastSquares.exists_moorePenrosePseudoinverse {m : Type u_1} {n : Type u_2} [Fintype m] [Fintype n] [DecidableEq m] [DecidableEq n] (A : Matrix m n ℝ) :

∃ (B : Matrix n m ℝ), IsMoorePenrosePseudoinverse A B

The Moore-Penrose pseudoinverse exists for any real matrix.

Construction: For A : Matrix m n ℝ, form the Gram matrix H = AᴴA (Hermitian, n×n). Use the spectral theorem to diagonalize H = φ(D) where φ is conjugation by the eigenvector unitary and D = diagonal(eigenvalues). Define H† = φ(D⁻¹) where D⁻¹ = diagonal(eigenvalues⁻¹) (with 0⁻¹ = 0). Then B = H† * Aᴴ is the Moore-Penrose pseudoinverse of A.

noncomputable def LeastSquares.moorePenrosePseudoinverse {d : ℕ} (A : Matrix (Fin d) (Fin d) ℝ) :

Matrix (Fin d) (Fin d) ℝ

The Moore-Penrose pseudoinverse of a square real matrix, denoted A†.

Instances For

theorem LeastSquares.moorePenrosePseudoinverse_spec {d : ℕ} (A : Matrix (Fin d) (Fin d) ℝ) :

IsMoorePenrosePseudoinverse A A†

The Moore-Penrose pseudoinverse satisfies the four Moore-Penrose conditions.

def LeastSquares.«term_†» :

Lean.TrailingParserDescr

Notation: A† for the Moore-Penrose pseudoinverse.

Instances For

theorem LeastSquares.range_transpose_subset_range_gramian {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) :

∃ (w : Fin d → ℝ), (X.transpose * X).mulVec w = X.transpose.mulVec Y

For any real matrix X, the vector X⊤Y lies in the range of X⊤X. This is because range(X⊤) = range(X⊤X) for real matrices, which follows from the fact that ker(X) = ker(X⊤X) (since ‖Xv‖² = v⊤X⊤Xv, so Xv = 0 iff X⊤Xv = 0).

theorem LeastSquares.normalEquations_imply_minimizer {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) (θ : Fin d → ℝ) (hresid : X.transpose.mulVec (Y - X.mulVec θ) = 0) (θ' : Fin d → ℝ) :

(Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) ≤ (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')

Converse of the normal equations (sufficient condition for optimality). If the residual Y - Xθ is orthogonal to the column space of X (i.e., Xᵀ(Y - Xθ) = 0), then θ minimizes the squared residual ‖Y - Xθ'‖² over all θ'. This is the converse of leastSquares_normalEquations_residual.

theorem LeastSquares.pseudoinverse_satisfies_normal_eq {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) :

(X.transpose * X).mulVec ((X.transpose * X)†.mulVec (X.transpose.mulVec Y)) = X.transpose.mulVec Y

θ = (X⊤X)†X⊤Y satisfies the normal equations (X⊤X)θ = X⊤Y. This uses the first Moore-Penrose condition (A·A†·A = A) together with the fact that X⊤Y lies in the range of X⊤X.

theorem LeastSquares.leastSquares_pseudoinverse {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) :

have θ_LS := (X.transpose * X)†.mulVec (X.transpose.mulVec Y); ∀ (θ' : Fin d → ℝ), (Y - X.mulVec θ_LS) ⬝ᵥ (Y - X.mulVec θ_LS) ≤ (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')

Proposition 2.1, Part 2 (Pseudoinverse Formula for Least Squares). Moreover, θ̂^LS can be chosen to be θ̂^LS = (X⊤X)†X⊤Y, where (X⊤X)† denotes the Moore-Penrose pseudoinverse of X⊤X. Specifically, the vector θ = (X⊤X)†X⊤Y minimizes the squared residual ‖Y - Xθ‖² over all θ.

theorem LeastSquares.leastSquares_normalEquations_and_pseudoinverse {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Y : Fin n → ℝ) :

(∀ (θ : Fin d → ℝ), (∀ (θ' : Fin d → ℝ), (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) ≤ (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')) → X.transpose.mulVec (X.mulVec θ) = X.transpose.mulVec Y) ∧ have θ_LS := (X.transpose * X)†.mulVec (X.transpose.mulVec Y); ∀ (θ' : Fin d → ℝ), (Y - X.mulVec θ_LS) ⬝ᵥ (Y - X.mulVec θ_LS) ≤ (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')

Proposition 2.1 (Combined): Normal Equations and Pseudoinverse Formula. Part 1: If θ minimizes ‖Y - Xθ‖², then Xᵀ(Xθ̂) = XᵀY (normal equations). Part 2: θ̂ = (X⊤X)†X⊤Y is a minimizer of ‖Y - Xθ‖² (pseudoinverse formula).