Frobenius norm squared #
Nuclear norm (sum of singular values) #
The nuclear norm (trace norm / Schatten-1 norm) of a matrix given its SVD,
defined as the sum of singular values: ‖A‖_* = Σⱼ σⱼ.
Instances For
Soft-thresholding operator #
Soft-thresholding: softThreshold x τ = max(x - τ, 0).
Applied to singular values, this gives the proximal operator for the nuclear norm.
Instances For
Nuclear norm penalized objective and minimizer #
IsNuclearNormMinimizer Y M S τ n states that the matrix M (with SVD S)
minimizes the nuclear norm penalized objective (1/n)‖Y − M‖_F² + τ‖M‖_*
over all matrices.
Instances For
Helper: |x| ≤ y → x² ≤ y² #
Part 1: Oracle comparison for nuclear norm penalization #
Oracle Frobenius error bound (restricted to support).
The sum of squared oracle errors over the support {j : σⱼ* ≠ 0}
is bounded by rank(Θ*) · τ²/4.
Problem 4.3, Part 1 (Nuclear norm penalization MSE bound). Book statement (Rigollet, Problem 4.3.1): "Consider the multivariate regression model (4.1) and define Θ̂ as any solution to min_Θ { (1/n) ‖Y − XΘ‖F² + τ ‖XΘ‖* }. Show that there exists a choice of τ such that (1/n) ‖XΘ̂ − XΘ*‖_F² ≲ σ² rank(Θ*) (d ∨ T) / n with probability .99."
Hint from book: Consider the oracle matrix M̄ = Σⱼ ((λ̂ⱼ + λⱼ*)/2) ûⱼv̂ⱼᵀ, where λⱼ* and λ̂ⱼ are the singular values of XΘ* and Y respectively.
Proof sketch:
- By Lemma 4.2, ‖F(ω)‖_op ≤ τ w.h.p. for τ ~ σ√((d∨T)/n).
- Weyl's perturbation bound gives |λ̂ⱼ − λⱼ*| ≤ τ.
- Compare the nuclear norm minimizer against the oracle M̄.
- The oracle's Frobenius error is bounded by rank(Θ*) · τ²/4.
- The nuclear norm penalty difference is controlled by Weyl's bound.
- Combining: (1/n)‖XΘ̂−XΘ*‖_F² ≤ C · rank(Θ*) · τ² ≲ C · σ² · rank(Θ*) · (d∨T)/n.
The core algebraic bounds (oracle_sv_error_sq, support_oracle_frobenius_bound)
are fully proved above. The full oracle comparison requires Lemma 4.2's operator
norm tail bound which depends on measure-theoretic tools beyond current Mathlib.
Exercise status (sorry): This is an exercise left to the reader (Problem 4.3.1).
Per formalization guidelines, exercises may retain sorry when the proof is too
hard. Completing this proof would require:
(a) An oracle comparison argument: comparing the nuclear norm minimizer against
the oracle matrix M̄ = Σⱼ ((λ̂ⱼ + λⱼ*)/2) ûⱼv̂ⱼᵀ.
(b) Wrapping the deterministic bound in Lemma 4.2's probabilistic tail bound
(‖E‖_op ≤ τ with probability ≥ 0.99), which requires sub-Gaussian
matrix concentration inequalities not yet in Mathlib.
(c) Weyl's perturbation theorem for singular values (|λ̂ⱼ − λⱼ*| ≤ ‖E‖_op),
also not formalized in Mathlib.
Part 2: Closed form — soft-thresholding of singular values #
Part 2 (scalar identity). The closed form for the nuclear-norm
penalized estimator's singular values is given by soft-thresholding.
When the threshold t = (σ̂ - σ*) / 2 (half the difference between observed
and true singular values), we get max(σ̂ - t, 0) = (σ̂ + σ*) / 2.
This is the key algebraic identity showing that the nuclear norm minimizer XΘ̂ = Σⱼ max(λ̂ⱼ - τ/2, 0) ûⱼv̂ⱼᵀ can be rewritten as XΘ̂ = Σⱼ ((λ̂ⱼ + λⱼ*)/2) ûⱼv̂ⱼᵀ, the average of the observed and true SVD reconstructions.
Part 2 (vector form). Component-wise application of the scalar identity to all singular values simultaneously.
Problem 4.3, Part 2 (Closed form for XΘ̂). Book statement (Rigollet, Problem 4.3.2): "Find a closed form for XΘ̂."
The answer is: XΘ̂ = Σⱼ max(λ̂ⱼ − τ/2, 0) ûⱼv̂ⱼᵀ, where Y = Σⱼ λ̂ⱼ ûⱼv̂ⱼᵀ is the SVD of Y. That is, the nuclear-norm penalized estimator applies soft-thresholding to the singular values of Y at level τ/2.
This follows from the KKT optimality conditions: the proximal operator of the nuclear norm (Schatten-1 norm) is precisely soft-thresholding of singular values. The proof requires convex subdifferential calculus not available in Mathlib.
Exercise status (sorry): This is an exercise left to the reader (Problem 4.3.2).
Per formalization guidelines, exercises may retain sorry when the proof is too
hard. Completing this proof would require:
(a) Convex subdifferential calculus for the nuclear norm ‖·‖* (Schatten-1 norm),
specifically characterizing ∂‖M‖* in terms of the SVD of M.
(b) KKT (Karush-Kuhn-Tucker) optimality conditions for the nuclear norm
penalized objective, yielding the proximal operator.
(c) The proximal operator identity prox_{τ‖·‖_*}(Y) = soft-threshold of
singular values of Y at level τ. None of these tools are in Mathlib.