Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Prop_2_1

Proposition 2.1: Normal Equations for Least Squares #

Main results #

Part 1: Normal equations #

Part 2: Pseudoinverse formula #

Combined #

Proof strategy #

Part 1: The proof uses the first-order optimality condition for unconstrained minimization. If θ minimizes f(θ') = ‖Y - Xθ'‖², then for any direction h and scalar t: f(θ + th) ≥ f(θ) Setting r = Y - Xθ, this gives: ‖r - t(Xh)‖² ≥ ‖r‖² Expanding: -2t⟨r, Xh⟩ + t²‖Xh‖² ≥ 0 for all t ∈ ℝ Since this quadratic in t is nonneg for all t, the linear coefficient must vanish: ⟨r, Xh⟩ = 0 for all h Using the adjoint relation ⟨r, Xh⟩ = ⟨Xᵀr, h⟩, we conclude Xᵀr = 0.

Part 2: The second statement follows from the definition of the Moore-Penrose pseudoinverse: θ = (X⊤X)†X⊤Y satisfies the normal equations because (X⊤X)(X⊤X)†(X⊤Y) = X⊤Y (using the first Moore-Penrose condition A·A†·A = A and the fact that X⊤Y ∈ range(X⊤X)).

The existence of the Moore-Penrose pseudoinverse is proved (not axiomatized) via the spectral theorem: diagonalize the Gram matrix AᴴA, invert nonzero eigenvalues, and verify the four Moore-Penrose conditions.

References #

For real matrices, transpose equals star (conjugate transpose).

theorem LeastSquares.dotProduct_sub_smul_expand {n : } (a b : Fin n) (t : ) :
(a - t b) ⬝ᵥ (a - t b) = a ⬝ᵥ a - 2 * t * a ⬝ᵥ b + t ^ 2 * b ⬝ᵥ b

Dot product expansion: ‖a - tb‖² = ‖a‖² - 2t⟨a,b⟩ + t²‖b‖² in terms of dotProduct.

theorem LeastSquares.eq_zero_of_forall_quad_nonneg (a b : ) (hb : 0 b) (h : ∀ (t : ), -2 * t * a + t ^ 2 * b 0) :
a = 0

If -2ta + t²b ≥ 0 for all t ∈ ℝ with b ≥ 0, then a = 0. This is a standard result about nonnegative quadratic forms: the discriminant condition forces the linear coefficient to vanish.

theorem LeastSquares.dotProduct_self_nonneg {n : } (v : Fin n) :
0 v ⬝ᵥ v

Dot product of a vector with itself is nonneg (for ℝ-valued vectors).

theorem LeastSquares.leastSquares_normalEquations_residual {n d : } (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θ : Fin d) (hmin : ∀ (θ' : Fin d), (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')) :
X.transpose.mulVec (Y - X.mulVec θ) = 0

Proposition 2.1 (Normal Equations — Residual Form). If θ minimizes the squared residual ‖Y - Xθ‖² = (Y - Xθ) ⬝ᵥ (Y - Xθ), then the residual Y - Xθ is orthogonal to the column space of X: Xᵀ(Y - Xθ) = 0.

theorem LeastSquares.leastSquares_normalEquations {n d : } (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θ : Fin d) (hmin : ∀ (θ' : Fin d), (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')) :

Proposition 2.1 (Normal Equations). If θ minimizes the squared residual ‖Y - Xθ‖² = (Y - Xθ) ⬝ᵥ (Y - Xθ), then Xᵀ(Xθ) = XᵀY, i.e., θ satisfies the normal equations.

theorem LeastSquares.leastSquares_normalEquations_matrix {n d : } (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θ : Fin d) (hmin : ∀ (θ' : Fin d), (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')) :

Proposition 2.1 (Normal Equations — Matrix Form). The normal equations in matrix form: (XᵀX)θ = XᵀY.

Part 2: Moore-Penrose Pseudoinverse Formula #

structure LeastSquares.IsMoorePenrosePseudoinverse {m : Type u_1} {n : Type u_2} [Fintype m] [Fintype n] [DecidableEq m] [DecidableEq n] (A : Matrix m n ) (B : Matrix n m ) :

A matrix B is a Moore-Penrose pseudoinverse of A if it satisfies the four Moore-Penrose conditions. For real matrices, the conjugate transpose equals the transpose.

  • mul_pinv_mul : A * B * A = A

    Condition 1: ABA = A

  • pinv_mul_pinv : B * A * B = B

    Condition 2: BAB = B

  • mul_pinv_symmetric : (A * B).transpose = A * B

    Condition 3: (AB)ᵀ = AB (AB is symmetric)

  • pinv_mul_symmetric : (B * A).transpose = B * A

    Condition 4: (BA)ᵀ = BA (BA is symmetric)

Instances For

    The Moore-Penrose pseudoinverse exists for any real matrix.

    Construction: For A : Matrix m n ℝ, form the Gram matrix H = AᴴA (Hermitian, n×n). Use the spectral theorem to diagonalize H = φ(D) where φ is conjugation by the eigenvector unitary and D = diagonal(eigenvalues). Define H† = φ(D⁻¹) where D⁻¹ = diagonal(eigenvalues⁻¹) (with 0⁻¹ = 0). Then B = H† * Aᴴ is the Moore-Penrose pseudoinverse of A.

    noncomputable def LeastSquares.moorePenrosePseudoinverse {d : } (A : Matrix (Fin d) (Fin d) ) :
    Matrix (Fin d) (Fin d)

    The Moore-Penrose pseudoinverse of a square real matrix, denoted A†.

    Instances For

      The Moore-Penrose pseudoinverse satisfies the four Moore-Penrose conditions.

      Notation: A† for the Moore-Penrose pseudoinverse.

      Instances For
        theorem LeastSquares.range_transpose_subset_range_gramian {n d : } (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) :
        ∃ (w : Fin d), (X.transpose * X).mulVec w = X.transpose.mulVec Y

        For any real matrix X, the vector X⊤Y lies in the range of X⊤X. This is because range(X⊤) = range(X⊤X) for real matrices, which follows from the fact that ker(X) = ker(X⊤X) (since ‖Xv‖² = v⊤X⊤Xv, so Xv = 0 iff X⊤Xv = 0).

        theorem LeastSquares.normalEquations_imply_minimizer {n d : } (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θ : Fin d) (hresid : X.transpose.mulVec (Y - X.mulVec θ) = 0) (θ' : Fin d) :
        (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')

        Converse of the normal equations (sufficient condition for optimality). If the residual Y - Xθ is orthogonal to the column space of X (i.e., Xᵀ(Y - Xθ) = 0), then θ minimizes the squared residual ‖Y - Xθ'‖² over all θ'. This is the converse of leastSquares_normalEquations_residual.

        θ = (X⊤X)†X⊤Y satisfies the normal equations (X⊤X)θ = X⊤Y. This uses the first Moore-Penrose condition (A·A†·A = A) together with the fact that X⊤Y lies in the range of X⊤X.

        theorem LeastSquares.leastSquares_pseudoinverse {n d : } (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) :
        have θ_LS := (X.transpose * X).mulVec (X.transpose.mulVec Y); ∀ (θ' : Fin d), (Y - X.mulVec θ_LS) ⬝ᵥ (Y - X.mulVec θ_LS) (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')

        Proposition 2.1, Part 2 (Pseudoinverse Formula for Least Squares). Moreover, θ̂^LS can be chosen to be θ̂^LS = (X⊤X)†X⊤Y, where (X⊤X)† denotes the Moore-Penrose pseudoinverse of X⊤X. Specifically, the vector θ = (X⊤X)†X⊤Y minimizes the squared residual ‖Y - Xθ‖² over all θ.

        theorem LeastSquares.leastSquares_normalEquations_and_pseudoinverse {n d : } (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) :
        (∀ (θ : Fin d), (∀ (θ' : Fin d), (Y - X.mulVec θ) ⬝ᵥ (Y - X.mulVec θ) (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ'))X.transpose.mulVec (X.mulVec θ) = X.transpose.mulVec Y) have θ_LS := (X.transpose * X).mulVec (X.transpose.mulVec Y); ∀ (θ' : Fin d), (Y - X.mulVec θ_LS) ⬝ᵥ (Y - X.mulVec θ_LS) (Y - X.mulVec θ') ⬝ᵥ (Y - X.mulVec θ')

        Proposition 2.1 (Combined): Normal Equations and Pseudoinverse Formula. Part 1: If θ minimizes ‖Y - Xθ‖², then Xᵀ(Xθ̂) = XᵀY (normal equations). Part 2: θ̂ = (X⊤X)†X⊤Y is a minimizer of ‖Y - Xθ‖² (pseudoinverse formula).