Atlas.HighDimensionalStatistics.code.Chapter4.Remark_4

Squared Frobenius norm

noncomputable def Chapter4.frobeniusNormSq {m p : ℕ} (A : Matrix (Fin m) (Fin p) ℝ) : ℝ

Squared Frobenius norm of a matrix: ‖A‖_F² = Σᵢ Σⱼ (Aᵢⱼ)²
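
For concreteness, here is a minimal Lean 4 sketch of this double sum, assuming a recent Mathlib, together with proofs of the nonnegativity and zero lemmas listed below. The names `frobeniusNormSqSketch`, `frobeniusNormSqSketch_nonneg` and `frobeniusNormSqSketch_zero` are hypothetical; the actual body of `Chapter4.frobeniusNormSq` is not shown on this page and may be written differently.

```lean
import Mathlib

open BigOperators

/-- Hypothetical stand-in for `Chapter4.frobeniusNormSq`: Σᵢ Σⱼ (Aᵢⱼ)². -/
noncomputable def frobeniusNormSqSketch {m p : ℕ} (A : Matrix (Fin m) (Fin p) ℝ) : ℝ :=
  ∑ i, ∑ j, A i j ^ 2

/-- Nonnegativity: a sum of squares is nonnegative. -/
theorem frobeniusNormSqSketch_nonneg {m p : ℕ} (A : Matrix (Fin m) (Fin p) ℝ) :
    0 ≤ frobeniusNormSqSketch A := by
  unfold frobeniusNormSqSketch
  exact Finset.sum_nonneg fun i _ => Finset.sum_nonneg fun j _ => sq_nonneg (A i j)

/-- Every entry of the zero matrix is 0, so the double sum vanishes. -/
theorem frobeniusNormSqSketch_zero {m p : ℕ} :
    frobeniusNormSqSketch (0 : Matrix (Fin m) (Fin p) ℝ) = 0 := by
  simp [frobeniusNormSqSketch]
```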

theorem Chapter4.frobeniusNormSq_nonneg {m p : ℕ} (A : Matrix (Fin m) (Fin p) ℝ) : 0 ≤ frobeniusNormSq A

theorem Chapter4.frobeniusNormSq_zero {m p : ℕ} : frobeniusNormSq 0 = 0

Operator norm helper for rectangular matrices

noncomputable def Chapter4.rectOpNorm {m p : ℕ} (A : Matrix (Fin m) (Fin p) ℝ) : ℝ

Operator norm of a rectangular real matrix, computed as the norm of its associated continuous linear map.
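
One plausible way to realize "via its continuous linear map" is sketched below, under the assumption that both spaces carry Euclidean (ℓ²) norms, which is what makes this norm coincide with the largest singular value. `rectOpNormSketch` is a hypothetical name; `Matrix.toEuclideanLin` and `LinearMap.toContinuousLinearMap` are Mathlib conversions, but this page does not say whether `Chapter4.rectOpNorm` uses them.

```lean
import Mathlib

/-- Hypothetical sketch: view `A` as a linear map
    `EuclideanSpace ℝ (Fin p) → EuclideanSpace ℝ (Fin m)`, promote it to a
    continuous linear map (automatic in finite dimension), and take its norm. -/
noncomputable def rectOpNormSketch {m p : ℕ} (A : Matrix (Fin m) (Fin p) ℝ) : ℝ :=
  ‖LinearMap.toContinuousLinearMap (Matrix.toEuclideanLin A)‖

/-- Both conversions send 0 to 0, and the norm of the zero map is 0. -/
theorem rectOpNormSketch_zero (m p : ℕ) :
    rectOpNormSketch (0 : Matrix (Fin m) (Fin p) ℝ) = 0 := by
  simp [rectOpNormSketch]
```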

theorem Chapter4.rectOpNorm_zero (m p : ℕ) : rectOpNorm 0 = 0

Nuclear norm (via duality)

The nuclear norm (Schatten 1-norm, trace norm) of a matrix: ‖Θ‖_* = Σⱼ σⱼ(Θ) where σⱼ are the singular values of Θ.

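We define it via the duality characterization ‖A‖_* = sup { ⟨B, A⟩_F : ‖B‖_op ≤ 1 }, where ⟨B, A⟩_F = Σᵢ Σⱼ Bᵢⱼ Aᵢⱼ = tr(BᵀA) is the Frobenius inner product and ‖B‖_op is the operator norm.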
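
A hedged Lean sketch of this duality definition, assuming the ℓ²→ℓ² operator norm and writing the Frobenius inner product as an explicit double sum. All three definitions (`opNormSketch`, `frobInnerSketch`, `nuclearNormSketch`) are hypothetical stand-ins, not the library's code.

```lean
import Mathlib

open BigOperators

/-- Hypothetical: ℓ² → ℓ² operator norm, via the associated continuous linear map. -/
noncomputable def opNormSketch {d T : ℕ} (B : Matrix (Fin d) (Fin T) ℝ) : ℝ :=
  ‖LinearMap.toContinuousLinearMap (Matrix.toEuclideanLin B)‖

/-- Hypothetical: Frobenius inner product ⟨B, A⟩_F = Σᵢ Σⱼ Bᵢⱼ Aᵢⱼ. -/
noncomputable def frobInnerSketch {d T : ℕ} (B A : Matrix (Fin d) (Fin T) ℝ) : ℝ :=
  ∑ i, ∑ j, B i j * A i j

/-- Hypothetical sketch of ‖A‖_* = sup { ⟨B, A⟩_F : ‖B‖_op ≤ 1 }. -/
noncomputable def nuclearNormSketch {d T : ℕ} (A : Matrix (Fin d) (Fin T) ℝ) : ℝ :=
  sSup {x | ∃ B : Matrix (Fin d) (Fin T) ℝ, opNormSketch B ≤ 1 ∧ x = frobInnerSketch B A}

-- B = 0 is feasible, so 0 belongs to the set whose supremum is taken.
example {d T : ℕ} (A : Matrix (Fin d) (Fin T) ℝ) :
    (0 : ℝ) ∈ {x | ∃ B : Matrix (Fin d) (Fin T) ℝ,
      opNormSketch B ≤ 1 ∧ x = frobInnerSketch B A} :=
  ⟨0, by simp [opNormSketch], by simp [frobInnerSketch]⟩
```

One design point worth noting: `sSup` on ℝ is total and returns 0 for sets that are empty or unbounded above. Here the set contains 0 (as the `example` checks) and is bounded above, so the value is the genuine supremum.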

noncomputable def Chapter4.nuclearNorm {d T : ℕ} (A : Matrix (Fin d) (Fin T) ℝ) : ℝ

Nuclear norm (trace norm, Schatten 1-norm) of a matrix, defined via duality: ‖A‖_* = sup { ⟨B, A⟩_F : ‖B‖_op ≤ 1 }

theorem Chapter4.nuclearNorm_nonneg {d T : ℕ} (Θ : Matrix (Fin d) (Fin T) ℝ) : 0 ≤ nuclearNorm Θ

Singular value predicates

The SVD of X ∈ ℝ^{n×d} has singular values λ₁ ≥ λ₂ ≥ ... ≥ λᵣ > 0, where r = rank(X). We identify the largest singular value λ₁ with the operator norm of X, and characterize the smallest positive singular value λᵣ by its positivity, its upper bound λᵣ ≤ λ₁, and a lower bound on ‖Xv‖/‖v‖.

def Chapter4.IsLargestSingularValue {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (s : ℝ) : Prop

IsLargestSingularValue X s means s is the largest singular value of X, i.e. s = λ₁(X) = ‖X‖_op, the operator norm of X.
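
Because this predicate is just an equation with the operator norm, a sketch is short. It assumes the operator norm is the ℓ²→ℓ² norm of the associated continuous linear map; `opNormSketch` and `IsLargestSingularValueSketch` are hypothetical names, and the uniqueness lemma only records that the predicate pins s down.

```lean
import Mathlib

/-- Hypothetical ℓ² → ℓ² operator norm of an n × d matrix. -/
noncomputable def opNormSketch {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) : ℝ :=
  ‖LinearMap.toContinuousLinearMap (Matrix.toEuclideanLin X)‖

/-- Hypothetical sketch: `s` is the largest singular value iff it equals ‖X‖_op. -/
def IsLargestSingularValueSketch {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (s : ℝ) : Prop :=
  s = opNormSketch X

-- Since the predicate is an equation, it determines `s` uniquely.
theorem IsLargestSingularValueSketch.unique {n d : ℕ} {X : Matrix (Fin n) (Fin d) ℝ}
    {s t : ℝ} (hs : IsLargestSingularValueSketch X s)
    (ht : IsLargestSingularValueSketch X t) : s = t := by
  unfold IsLargestSingularValueSketch at hs ht
  rw [hs, ht]
```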

def Chapter4.IsSmallestPosSingularValue {n d : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (s : ℝ) : Prop

IsSmallestPosSingularValue X s means s is the smallest positive singular value of X, i.e. s = λᵣ(X) where r = rank(X). It is characterized by three conditions: s > 0, s ≤ ‖X‖_op, and s ≤ ‖Xv‖/‖v‖ for every vector v outside the kernel of X.

theorem Chapter4.IsLargestSingularValue.pos {n d : ℕ} {X : Matrix (Fin n) (Fin d) ℝ} {s : ℝ} (h : IsLargestSingularValue X s) (hX : X ≠ 0) : 0 < s

The largest singular value is positive (for nonzero matrices).

theorem Chapter4.IsSmallestPosSingularValue.pos {n d : ℕ} {X : Matrix (Fin n) (Fin d) ℝ} {s : ℝ} (h : IsSmallestPosSingularValue X s) (_hX : X ≠ 0) : 0 < s

The smallest positive singular value is positive.

theorem Chapter4.singularValue_le {n d : ℕ} {X : Matrix (Fin n) (Fin d) ℝ} {s₁ sᵣ : ℝ} (h₁ : IsLargestSingularValue X s₁) (hᵣ : IsSmallestPosSingularValue X sᵣ) : sᵣ ≤ s₁

The smallest positive singular value is at most the largest.
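
The reasoning behind this lemma is a single rewrite. It is shown here in an abstracted Lean form: `opNorm` is a plain real standing in for ‖X‖_op, and the two hypotheses are assumed to be the relevant clauses of the two predicates (s₁ = ‖X‖_op from IsLargestSingularValue, and sᵣ ≤ ‖X‖_op from IsSmallestPosSingularValue).

```lean
import Mathlib

-- Abstracted argument: `h₁` plays the role of IsLargestSingularValue X s₁
-- and `hᵣ` the "bounded above by the operator norm" clause of
-- IsSmallestPosSingularValue X sᵣ.
example (opNorm s₁ sᵣ : ℝ) (h₁ : s₁ = opNorm) (hᵣ : sᵣ ≤ opNorm) : sᵣ ≤ s₁ := by
  rw [h₁]
  exact hᵣ
```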

Nuclear Norm Penalization Objective and Estimator

noncomputable def Chapter4.nuclearNormObjective {n d T : ℕ} (Y : Matrix (Fin n) (Fin T) ℝ) (X : Matrix (Fin n) (Fin d) ℝ) (τ : ℝ) (Θ : Matrix (Fin d) (Fin T) ℝ) : ℝ

The nuclear norm penalization objective: L(Θ) = (1/n)‖Y - XΘ‖_F² + τ · ‖Θ‖_*
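
A hedged Lean sketch of this objective. To avoid committing to a particular nuclear-norm implementation, the hypothetical `nuclearNormObjectiveSketch` takes the norm as a function argument `nucNorm`; the squared Frobenius norm is written out as a double sum.

```lean
import Mathlib

open BigOperators

/-- Hypothetical sketch of L(Θ) = (1/n)‖Y - XΘ‖_F² + τ · ‖Θ‖_*,
    with the nuclear norm supplied as the parameter `nucNorm`. -/
noncomputable def nuclearNormObjectiveSketch {n d T : ℕ}
    (nucNorm : Matrix (Fin d) (Fin T) ℝ → ℝ)
    (Y : Matrix (Fin n) (Fin T) ℝ) (X : Matrix (Fin n) (Fin d) ℝ) (τ : ℝ)
    (Θ : Matrix (Fin d) (Fin T) ℝ) : ℝ :=
  (1 / (n : ℝ)) * (∑ i, ∑ j, (Y - X * Θ) i j ^ 2) + τ * nucNorm Θ

-- With τ = 0 the penalty drops out and only the least-squares term remains.
example {n d T : ℕ} (nucNorm : Matrix (Fin d) (Fin T) ℝ → ℝ)
    (Y : Matrix (Fin n) (Fin T) ℝ) (X : Matrix (Fin n) (Fin d) ℝ)
    (Θ : Matrix (Fin d) (Fin T) ℝ) :
    nuclearNormObjectiveSketch nucNorm Y X 0 Θ =
      (1 / (n : ℝ)) * (∑ i, ∑ j, (Y - X * Θ) i j ^ 2) := by
  simp only [nuclearNormObjectiveSketch, zero_mul, add_zero]
```

Passing `nucNorm` as a parameter keeps the sketch independent of how ‖·‖_* is defined; plugging in a concrete nuclear norm gives an objective of the form above.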

def Chapter4.IsNuclearNormEstimator {n d T : ℕ} (Y : Matrix (Fin n) (Fin T) ℝ) (X : Matrix (Fin n) (Fin d) ℝ) (τ : ℝ) (Θhat : Matrix (Fin d) (Fin T) ℝ) : Prop

Θ̂ is a nuclear norm penalization estimator if it minimizes the objective: Θ̂ ∈ argmin_Θ { (1/n)‖Y - XΘ‖_F² + τ · ‖Θ‖_* }
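
Membership in the argmin can be phrased as a universally quantified inequality. A minimal sketch with a hypothetical predicate `IsArgminSketch`; the library's `IsNuclearNormEstimator` may be stated differently, for instance directly in terms of `nuclearNormObjective`.

```lean
import Mathlib

/-- Hypothetical: `Θhat` is a global minimizer of the objective `L` over all
    d × T real matrices, i.e. Θhat ∈ argmin L. -/
def IsArgminSketch {d T : ℕ} (L : Matrix (Fin d) (Fin T) ℝ → ℝ)
    (Θhat : Matrix (Fin d) (Fin T) ℝ) : Prop :=
  ∀ Θ, L Θhat ≤ L Θ

-- Sanity check: every matrix minimizes a constant objective.
example (Θ₀ : Matrix (Fin 2) (Fin 3) ℝ) : IsArgminSketch (fun _ => (0 : ℝ)) Θ₀ := by
  intro _
  exact le_rfl
```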

theorem Chapter4.nuclearNormObjective_nonneg {n d T : ℕ} (Y : Matrix (Fin n) (Fin T) ℝ) (X : Matrix (Fin n) (Fin d) ℝ) (τ : ℝ) (hτ : 0 ≤ τ) (Θ : Matrix (Fin d) (Fin T) ℝ) : 0 ≤ nuclearNormObjective Y X τ Θ

The nuclear norm objective is nonnegative when τ ≥ 0.
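
The reason is that both summands are nonnegative: 1/n ≥ 0, the squared Frobenius norm is a sum of squares, the nuclear norm is nonnegative (`nuclearNorm_nonneg`), and τ ≥ 0. Below is an abstracted Lean version of that step, where `F` and `N` are plain reals standing in for the two norms and their nonnegativity is taken as hypotheses.

```lean
import Mathlib

-- `F` stands for ‖Y - XΘ‖_F² and `N` for ‖Θ‖_*; both are assumed nonnegative.
example (n : ℕ) (F N τ : ℝ) (hF : 0 ≤ F) (hN : 0 ≤ N) (hτ : 0 ≤ τ) :
    0 ≤ (1 / (n : ℝ)) * F + τ * N := by
  -- 1/n ≥ 0 for every natural n (in Lean, 1/0 = 0)
  have hn : 0 ≤ 1 / (n : ℝ) := by positivity
  exact add_nonneg (mul_nonneg hn hF) (mul_nonneg hτ hN)
```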

MSE Bound (Remark 4.5)

noncomputable def Chapter4.predictionMSE {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) : ℝ

Prediction MSE: (1/n)‖XΘ̂ - XΘ*‖_F²
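
A hedged sketch of the prediction MSE in Lean, under the hypothetical name `predictionMSESketch`, together with the analogue of `predictionMSE_self`: when Θ̂ = Θ*, the residual XΘ - XΘ vanishes entrywise.

```lean
import Mathlib

open BigOperators

/-- Hypothetical sketch of (1/n)‖XΘ̂ - XΘ*‖_F². -/
noncomputable def predictionMSESketch {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ)
    (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) : ℝ :=
  (1 / (n : ℝ)) * ∑ i, ∑ j, (X * Θhat - X * Θstar) i j ^ 2

-- If Θ̂ = Θ*, the residual matrix is zero, hence so is the MSE.
theorem predictionMSESketch_self {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ)
    (Θ : Matrix (Fin d) (Fin T) ℝ) : predictionMSESketch X Θ Θ = 0 := by
  simp [predictionMSESketch]
```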

theorem Chapter4.predictionMSE_nonneg {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) : 0 ≤ predictionMSE X Θhat Θstar

theorem Chapter4.predictionMSE_self {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Θ : Matrix (Fin d) (Fin T) ℝ) : predictionMSE X Θ Θ = 0

theorem Chapter4.remark_4_5_nuclear_norm_MSE_bound {n d T : ℕ} (Y : Matrix (Fin n) (Fin T) ℝ) (X : Matrix (Fin n) (Fin d) ℝ) (Θstar : Matrix (Fin d) (Fin T) ℝ) (σ : ℝ) (hσ : 0 < σ) (hn : 0 < ↑n) (hX : X ≠ 0) (sig₁ sigᵣ : ℝ) (hsig₁ : IsLargestSingularValue X sig₁) (hsigᵣ : IsSmallestPosSingularValue X sigᵣ) :
    ∃ (τ : ℝ), 0 < τ ∧ ∃ (C : ℝ), 0 < C ∧ ∀ (Θhat : Matrix (Fin d) (Fin T) ℝ), IsNuclearNormEstimator Y X τ Θhat → predictionMSE X Θhat Θstar ≤ C * (sig₁ / sigᵣ) * σ ^ 2 * ↑Θstar.rank * max ↑d ↑T / ↑n

Remark 4.5 (MSE bound for the nuclear norm penalization estimator, [KLT11]).

For an appropriate choice of the regularization parameter τ, any nuclear norm penalization estimator Θ̂ satisfies, with probability at least 0.99:

(1/n)‖XΘ̂ - XΘ*‖_F² ≤ C · (λ₁/λᵣ) · σ² · rank(Θ*) · max(d,T) / n

where:

X has SVD with singular values λ₁ ≥ λ₂ ≥ ... ≥ λᵣ > 0, where r = rank(X),
λ₁/λᵣ is the condition number of X,
σ² is the noise variance,
rank(Θ*) is the rank of the true parameter matrix,
max(d, T), also written d ∨ T, is the larger of the two dimensions of Θ*,
C is a universal constant (quantified before ∀ Θhat).
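
As a quick arithmetic check of how the right-hand side scales, here is a hypothetical helper `remark45RHS` (not part of the library) with one worked instance. The numbers are illustrative only, and C is set to 1 purely to isolate the arithmetic, since the remark gives no value for C. Because `λ` is a Lean keyword, the singular values are named `lam₁` and `lamᵣ`.

```lean
import Mathlib

/-- Hypothetical helper: right-hand side of the Remark 4.5 bound,
    C · (λ₁/λᵣ) · σ² · r · max(d, T) / n, with r = rank(Θ*). -/
noncomputable def remark45RHS (C lam₁ lamᵣ σ : ℝ) (r d T n : ℕ) : ℝ :=
  C * (lam₁ / lamᵣ) * σ ^ 2 * (r : ℝ) * max (d : ℝ) (T : ℝ) / (n : ℝ)

-- Worked instance with C = 1: λ₁/λᵣ = 1, σ² = 1, rank 2, d = 50, T = 20,
-- n = 1000, so the bound is 2 · 50 / 1000 = 1/10.
example : remark45RHS 1 1 1 1 2 50 20 1000 = 1 / 10 := by
  have hmax : max ((50 : ℕ) : ℝ) ((20 : ℕ) : ℝ) = ((50 : ℕ) : ℝ) :=
    max_eq_left (by norm_num)
  unfold remark45RHS
  rw [hmax]
  norm_num
```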

Note that τ is existentially quantified, matching the book's phrase "for some appropriate choice of τ." Unlike the bound for the rank penalization estimator, this bound involves the condition number λ₁/λᵣ of the design matrix X.

The textbook defers the proof to [KLT11], so the statement is axiomatized here.
