Squared Frobenius norm #
Operator norm helper for rectangular matrices #
Nuclear norm (via duality) #
The nuclear norm (Schatten 1-norm, trace norm) of a matrix: ‖Θ‖_* = Σⱼ σⱼ(Θ) where σⱼ are the singular values of Θ.
We define it via the duality characterization: ‖A‖_* = sup { ⟨B, A⟩_F : ‖B‖_op ≤ 1 }
Singular value predicates #
The SVD of X ∈ ℝ^{n×d} has singular values λ₁ ≥ λ₂ ≥ ... ≥ λᵣ > 0 where r = rank(X). We define the largest singular value as the operator norm, and characterize the smallest positive singular value by its positivity and bound relative to the largest.
IsLargestSingularValue X s means s is the largest singular value of X,
i.e. s = σ₁(X) = ‖X‖_op (the operator norm).
Instances For
IsSmallestPosSingularValue X s means s is the smallest positive
singular value of X, i.e. s = σᵣ(X) where r = rank(X).
Characterized by: s is positive, bounded above by the operator norm,
and is a lower bound on ‖Xv‖/‖v‖ for vectors outside the kernel.
Instances For
The smallest positive singular value is at most the largest.
Nuclear Norm Penalization Objective and Estimator #
MSE Bound (Remark 4.5) #
Remark 4.5 (MSE bound for the nuclear norm penalization estimator, [KLT11]).
For an appropriate choice of the regularization parameter τ, any nuclear norm penalization estimator Θ̂ satisfies, with probability at least 0.99:
(1/n)‖XΘ̂ - XΘ*‖_F² ≤ C · (λ₁/λᵣ) · σ² · rank(Θ*) · max(d,T) / n
where:
- X has SVD with singular values λ₁ ≥ λ₂ ≥ ... ≥ λᵣ > 0,
- λ₁/λᵣ is the condition number of X,
- σ² is the noise variance,
- rank(Θ*) is the rank of the true parameter matrix,
- d ∨ T = max(d, T),
- C is a universal constant (quantified before ∀ Θhat).
Note that τ is existentially quantified: the book says "for some appropriate choice of τ." Unlike the rank penalization estimator, this bound involves the condition number λ₁/λᵣ of the design matrix X.
The proof is deferred to [KLT11] in the textbook, so this is axiomatized.