Proposition 2.1: Normal Equations for Least Squares #
Main results #
Part 1: Normal equations #
leastSquares_normalEquations: If θ minimizes ‖Y - Xθ‖² (the squared residual, expressed asdotProduct (Y - X *ᵥ θ) (Y - X *ᵥ θ)), thenXᵀ *ᵥ (X *ᵥ θ) = Xᵀ *ᵥ Y, i.e., the normal equationsXᵀXθ = XᵀYhold.leastSquares_normalEquations_residual: Equivalently,Xᵀ *ᵥ (Y - X *ᵥ θ) = 0, i.e., the residualY - Xθis orthogonal to the column space of X.leastSquares_normalEquations_matrix: The matrix form(Xᵀ * X) *ᵥ θ = Xᵀ *ᵥ Y.
Part 2: Pseudoinverse formula #
leastSquares_pseudoinverse: θ = (X⊤X)†X⊤Y is a minimizer of ‖Y - Xθ‖², where (X⊤X)† is the Moore-Penrose pseudoinverse of X⊤X.
Combined #
leastSquares_normalEquations_and_pseudoinverse: Bundled conjunction of both parts.
Proof strategy #
Part 1: The proof uses the first-order optimality condition for unconstrained minimization. If θ minimizes f(θ') = ‖Y - Xθ'‖², then for any direction h and scalar t: f(θ + th) ≥ f(θ) Setting r = Y - Xθ, this gives: ‖r - t(Xh)‖² ≥ ‖r‖² Expanding: -2t⟨r, Xh⟩ + t²‖Xh‖² ≥ 0 for all t ∈ ℝ Since this quadratic in t is nonneg for all t, the linear coefficient must vanish: ⟨r, Xh⟩ = 0 for all h Using the adjoint relation ⟨r, Xh⟩ = ⟨Xᵀr, h⟩, we conclude Xᵀr = 0.
Part 2: The second statement follows from the definition of the Moore-Penrose pseudoinverse: θ = (X⊤X)†X⊤Y satisfies the normal equations because (X⊤X)(X⊤X)†(X⊤Y) = X⊤Y (using the first Moore-Penrose condition A·A†·A = A and the fact that X⊤Y ∈ range(X⊤X)).
The existence of the Moore-Penrose pseudoinverse is proved (not axiomatized) via the spectral theorem: diagonalize the Gram matrix AᴴA, invert nonzero eigenvalues, and verify the four Moore-Penrose conditions.
References #
- Rigollet, High-Dimensional Statistics, Proposition 2.1
For real matrices, transpose equals star (conjugate transpose).
If -2ta + t²b ≥ 0 for all t ∈ ℝ with b ≥ 0, then a = 0. This is a standard result about nonnegative quadratic forms: the discriminant condition forces the linear coefficient to vanish.
Proposition 2.1 (Normal Equations — Residual Form). If θ minimizes the squared residual ‖Y - Xθ‖² = (Y - Xθ) ⬝ᵥ (Y - Xθ), then the residual Y - Xθ is orthogonal to the column space of X: Xᵀ(Y - Xθ) = 0.
Proposition 2.1 (Normal Equations). If θ minimizes the squared residual ‖Y - Xθ‖² = (Y - Xθ) ⬝ᵥ (Y - Xθ), then Xᵀ(Xθ) = XᵀY, i.e., θ satisfies the normal equations.
Proposition 2.1 (Normal Equations — Matrix Form). The normal equations in matrix form: (XᵀX)θ = XᵀY.
Part 2: Moore-Penrose Pseudoinverse Formula #
A matrix B is a Moore-Penrose pseudoinverse of A if it satisfies the four Moore-Penrose conditions. For real matrices, the conjugate transpose equals the transpose.
Condition 1: ABA = A
Condition 2: BAB = B
Condition 3: (AB)ᵀ = AB (AB is symmetric)
Condition 4: (BA)ᵀ = BA (BA is symmetric)
Instances For
The Moore-Penrose pseudoinverse exists for any real matrix.
Construction: For A : Matrix m n ℝ, form the Gram matrix H = AᴴA (Hermitian, n×n). Use the spectral theorem to diagonalize H = φ(D) where φ is conjugation by the eigenvector unitary and D = diagonal(eigenvalues). Define H† = φ(D⁻¹) where D⁻¹ = diagonal(eigenvalues⁻¹) (with 0⁻¹ = 0). Then B = H† * Aᴴ is the Moore-Penrose pseudoinverse of A.
Notation: A† for the Moore-Penrose pseudoinverse.
Instances For
For any real matrix X, the vector X⊤Y lies in the range of X⊤X. This is because range(X⊤) = range(X⊤X) for real matrices, which follows from the fact that ker(X) = ker(X⊤X) (since ‖Xv‖² = v⊤X⊤Xv, so Xv = 0 iff X⊤Xv = 0).
Converse of the normal equations (sufficient condition for optimality).
If the residual Y - Xθ is orthogonal to the column space of X (i.e., Xᵀ(Y - Xθ) = 0),
then θ minimizes the squared residual ‖Y - Xθ'‖² over all θ'.
This is the converse of leastSquares_normalEquations_residual.
Proposition 2.1, Part 2 (Pseudoinverse Formula for Least Squares). Moreover, θ̂^LS can be chosen to be θ̂^LS = (X⊤X)†X⊤Y, where (X⊤X)† denotes the Moore-Penrose pseudoinverse of X⊤X. Specifically, the vector θ = (X⊤X)†X⊤Y minimizes the squared residual ‖Y - Xθ‖² over all θ.
Proposition 2.1 (Combined): Normal Equations and Pseudoinverse Formula. Part 1: If θ minimizes ‖Y - Xθ‖², then Xᵀ(Xθ̂) = XᵀY (normal equations). Part 2: θ̂ = (X⊤X)†X⊤Y is a minimizer of ‖Y - Xθ‖² (pseudoinverse formula).