Atlas.HighDimensionalStatistics.code.Chapter4.Background_Holder

Schatten p-norm

The Schatten p-norm of a matrix is the ℓ_p norm of its singular value vector. Special cases:

p = 1: nuclear norm ‖A‖_* = Σⱼ σⱼ
p = 2: Frobenius norm ‖A‖_F = (Σⱼ σⱼ²)^{1/2}
p = ∞: operator norm ‖A‖_op = max_j σⱼ = σ₁

noncomputable def schattenNorm {d T : ℕ} (S : SVD d T) (q : ℝ) : ℝ

The Schatten q-norm of a matrix given its SVD: ‖A‖_q = (Σⱼ σⱼ^q)^{1/q} for q ∈ [1, ∞).
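As a rough illustration of the formula, the sketch below restates it on a bare vector of singular values and checks that q = 1 recovers the plain sum. The name `schattenOfVals` is a hypothetical stand-in introduced here; it is not the project's `schattenNorm`, whose definition in terms of `SVD d T` is not shown on this page.

```lean
import Mathlib

/-- Hypothetical stand-in for `schattenNorm`: takes the singular values directly
instead of an `SVD` structure. -/
noncomputable def schattenOfVals {r : ℕ} (σ : Fin r → ℝ) (q : ℝ) : ℝ :=
  (∑ j, σ j ^ q) ^ (1 / q)

/-- At `q = 1` the formula collapses to the sum of the singular values,
which is the nuclear norm. -/
example {r : ℕ} (σ : Fin r → ℝ) : schattenOfVals σ 1 = ∑ j, σ j := by
  simp [schattenOfVals]
```

The case p = ∞ does not fit this real-exponent formula, which is presumably why the definition restricts to q ∈ [1, ∞).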

def nuclearNormSVD {d T : ℕ} (S : SVD d T) : ℝ

The nuclear norm (Schatten 1-norm): ‖A‖_* = Σⱼ σⱼ

def frobeniusInnerProduct {d T : ℕ} (A B : Matrix (Fin d) (Fin T) ℝ) : ℝ

The Frobenius inner product: ⟨A, B⟩ = Tr(AᵀB) = Σᵢⱼ Aᵢⱼ Bᵢⱼ
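The identity Tr(AᵀB) = Σᵢⱼ Aᵢⱼ Bᵢⱼ can be checked directly in Mathlib. The following is a minimal sketch that does not use the project's `frobeniusInnerProduct` (its defining expression is not shown here); it only unfolds the trace of AᵀB. The double sum appears with the column index j on the outside, which is the order the trace produces.

```lean
import Mathlib

open Matrix

-- Unfold the trace of AᵀB entry by entry: the (j, j) diagonal entry is Σᵢ Aᵢⱼ Bᵢⱼ.
example {d T : ℕ} (A B : Matrix (Fin d) (Fin T) ℝ) :
    (Aᵀ * B).trace = ∑ j, ∑ i, A i j * B i j := by
  simp [Matrix.trace, Matrix.mul_apply]
```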

Helper lemmas for the Frobenius inner product

theorem matrix_sum_apply {d T : ℕ} {ι : Type u_1} [Fintype ι] (f : ι → Matrix (Fin d) (Fin T) ℝ) (i : Fin d) (j : Fin T) :

(∑ k : ι, f k) i j = ∑ k : ι, f k i j

Pointwise application of a sum of matrices.

theorem frobeniusInnerProduct_sum {d T : ℕ} {ι : Type u_1} [Fintype ι] (f : ι → Matrix (Fin d) (Fin T) ℝ) (B : Matrix (Fin d) (Fin T) ℝ) :

frobeniusInnerProduct (∑ k : ι, f k) B = ∑ k : ι, frobeniusInnerProduct (f k) B

The Frobenius inner product distributes over sums in the first argument.

theorem frobeniusInnerProduct_smul {d T : ℕ} (c : ℝ) (A B : Matrix (Fin d) (Fin T) ℝ) :

frobeniusInnerProduct (c • A) B = c * frobeniusInnerProduct A B

The Frobenius inner product scales in the first argument.

theorem frobeniusInnerProduct_vecMulVec {d T : ℕ} (u : Fin d → ℝ) (v : Fin T → ℝ) (B : Matrix (Fin d) (Fin T) ℝ) :

frobeniusInnerProduct (Matrix.vecMulVec u v) B = u ⬝ᵥ B.mulVec v

The Frobenius inner product of a rank-1 matrix with B equals dotProduct u (B.mulVec v).

Operator norm bound for bilinear forms

The key ingredient: for unit vectors u, v, dotProduct u (B.mulVec v) ≤ ‖B‖_op by Cauchy-Schwarz and the operator norm bound.

theorem dotProduct_mulVec_le_opNorm {d T : ℕ} (u : Fin d → ℝ) (v : Fin T → ℝ) (B : Matrix (Fin d) (Fin T) ℝ) (hu : ‖WithLp.toLp 2 u‖ = 1) (hv : ‖WithLp.toLp 2 v‖ = 1) :

u ⬝ᵥ B.mulVec v ≤ matrixOpNorm B

For unit vectors u, v: dotProduct u (B.mulVec v) ≤ ‖B‖_op.
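The two-step argument (Cauchy-Schwarz, then the operator norm bound) can be sketched in an abstract setting. Here a continuous linear map `L` stands in for `v ↦ B.mulVec v` and `‖L‖` for `‖B‖_op`; whether the project's `matrixOpNorm` agrees with this norm is an assumption, since its definition is not shown. The inner product notation and its scoping have changed across Mathlib versions, so the `open scoped` line may need adjusting.

```lean
import Mathlib

open scoped InnerProductSpace RealInnerProductSpace

-- Abstract analogue of the bound: `L` plays the role of `B`, `‖L‖` that of `‖B‖_op`.
example {E F : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E]
    [NormedAddCommGroup F] [InnerProductSpace ℝ F]
    (L : F →L[ℝ] E) (u : E) (v : F) (hu : ‖u‖ = 1) (hv : ‖v‖ = 1) :
    ⟪u, L v⟫_ℝ ≤ ‖L‖ :=
  calc ⟪u, L v⟫_ℝ ≤ ‖u‖ * ‖L v‖ := real_inner_le_norm u (L v)  -- Cauchy-Schwarz
    _ ≤ ‖u‖ * (‖L‖ * ‖v‖) :=                                   -- operator norm bound
        mul_le_mul_of_nonneg_left (L.le_opNorm v) (norm_nonneg u)
    _ = ‖L‖ := by simp [hu, hv]                                 -- unit vectors
```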

Hölder's inequality for Schatten norms

Hölder-Schatten inequalities

The SVD structure in our formalization does not encode orthonormality of the singular vectors. The nuclear/operator result below (holder_schatten_nuclear_op) therefore takes explicit hypotheses hu and hv stating that the left and right singular vectors are L²-unit vectors. In a standard SVD, these hold automatically.

The underlying proof strategy is:

Substitute A = UΣVᵀ (the SVD), i.e. A = Σⱼ σⱼ uⱼ vⱼᵀ
Distribute the Frobenius inner product over the SVD sum
Bound each term σⱼ · ⟨uⱼ, B vⱼ⟩ using Cauchy-Schwarz + operator norm
Sum to get ⟨A, B⟩ ≤ (Σⱼ σⱼ) · ‖B‖_op = ‖A‖_* · ‖B‖_op
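The final summation step is a purely scalar fact: if every weight σⱼ is nonnegative and every term tⱼ is bounded by a constant c, the weighted sum is bounded by (Σⱼ σⱼ) · c. A minimal sketch, with `t j` standing in for ⟨uⱼ, B vⱼ⟩ and `c` for ‖B‖_op (both names are placeholders for this illustration):

```lean
import Mathlib

-- Weighted sum bound: nonnegative weights, each term bounded by the same constant.
example {r : ℕ} (σ t : Fin r → ℝ) (c : ℝ)
    (hσ : ∀ j, 0 ≤ σ j) (ht : ∀ j, t j ≤ c) :
    ∑ j, σ j * t j ≤ (∑ j, σ j) * c := by
  -- Push the constant inside the sum, then compare term by term.
  rw [Finset.sum_mul]
  exact Finset.sum_le_sum fun j _ => mul_le_mul_of_nonneg_left (ht j) (hσ j)
```

Nonnegativity of the σⱼ matters here: without it, multiplying tⱼ ≤ c by σⱼ could reverse the inequality.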

References:

Horn & Johnson, Matrix Analysis, 2nd ed., Theorem 7.4.1.1 (von Neumann)
Bhatia, Matrix Analysis, Chapter IV, Theorem IV.2.5
Rigollet, High Dimensional Statistics, Section 4.1 (stated without proof)

Helper infrastructure for the general Hölder-Schatten inequality

def extendedσval {d T : ℕ} (S : SVD d T) (j : Fin (min d T)) : ℝ

Extended singular values: pads S.σval : Fin S.r → ℝ with zeros to obtain a function on Fin (min d T).

theorem extendedσval_nonneg {d T : ℕ} (S : SVD d T) (j : Fin (min d T)) :

0 ≤ extendedσval S j
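The zero-padding construction and its nonnegativity can be sketched on a plain function. The name `padZero` is hypothetical and introduced only for this illustration; the actual body of `extendedσval` is not shown on this page and may differ.

```lean
import Mathlib

/-- Hypothetical stand-in for `extendedσval`: keep `f` on indices below `r`,
return zero elsewhere. -/
def padZero {r n : ℕ} (f : Fin r → ℝ) (j : Fin n) : ℝ :=
  if h : j.val < r then f ⟨j.val, h⟩ else 0

/-- Padding with zeros preserves nonnegativity. -/
example {r n : ℕ} (f : Fin r → ℝ) (hf : ∀ i, 0 ≤ f i) (j : Fin n) :
    0 ≤ padZero f j := by
  unfold padZero
  split_ifs with h
  · exact hf ⟨j.val, h⟩  -- index inside the original range
  · exact le_rfl          -- padded entry is zero
```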

theorem sum_fin_of_zero_tail {r n : ℕ} (hr : r ≤ n) (f : Fin n → ℝ) (hf : ∀ (j : Fin n), r ≤ ↑j → f j = 0) :

∑ j : Fin n, f j = ∑ j : Fin r, f ⟨↑j, ⋯⟩

A sum over Fin n of a function that is zero for indices ≥ r equals the sum over Fin r.

theorem sum_extendedσval_rpow {d T : ℕ} (S : SVD d T) (p : ℝ) (hp : 0 < p) :

∑ j : Fin (min d T), extendedσval S j ^ p = ∑ j : Fin S.r, S.σval j ^ p

The sum of (extendedσval S j) ^ p over Fin (min d T) equals the sum of (S.σval j) ^ p over Fin S.r.
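The hypothesis 0 < p is what makes the padded entries vanish under the real power. The two Mathlib facts below illustrate why it cannot simply be dropped; this is a sketch of the underlying arithmetic, not the project's proof.

```lean
import Mathlib

-- For p ≠ 0 (in particular p > 0), a padded zero contributes nothing to Σⱼ σⱼ ^ p.
example (p : ℝ) (hp : 0 < p) : (0 : ℝ) ^ p = 0 := Real.zero_rpow hp.ne'

-- At p = 0 each padded entry would contribute 1, so the two sums could differ.
example : (0 : ℝ) ^ (0 : ℝ) = 1 := Real.rpow_zero 0
```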

theorem holderConjugate_of_div_add {p q : ℝ} (hp : 1 ≤ p) (hq : 1 ≤ q) (hpq : 1 / p + 1 / q = 1) :

p.HolderConjugate q

Derive Real.HolderConjugate p q from the hypotheses 1 ≤ p, 1 ≤ q, and 1/p + 1/q = 1.

theorem von_neumann_trace_ineq {d T : ℕ} (A B : Matrix (Fin d) (Fin T) ℝ) (S_A : SVD d T) (hA : S_A.IsDecompOf A) (S_B : SVD d T) (hB : S_B.IsDecompOf B) :

|frobeniusInnerProduct A B| ≤ ∑ j : Fin (min d T), extendedσval S_A j * extendedσval S_B j

Von Neumann's trace inequality. For matrices A, B with SVDs S_A, S_B: |⟨A, B⟩| ≤ Σⱼ σⱼ(A) · σⱼ(B) where the singular values are extended to the common index Fin (min d T).

This fundamental result in matrix analysis is stated without proof in Rigollet's text and proved in:

Horn & Johnson, Matrix Analysis, 2nd ed., Theorem 7.4.1.1
Bhatia, Matrix Analysis, Chapter IV, Theorem IV.2.5

It is introduced here as a theorem because the textbook does not give its proof.

theorem holder_schatten_nuclear_op {d T : ℕ} (A B : Matrix (Fin d) (Fin T) ℝ) (S_A : SVD d T) (hA : S_A.IsDecompOf A) (hu : ∀ (j : Fin S_A.r), ‖WithLp.toLp 2 (S_A.u j)‖ = 1) (hv : ∀ (j : Fin S_A.r), ‖WithLp.toLp 2 (S_A.v j)‖ = 1) :

frobeniusInnerProduct A B ≤ nuclearNormSVD S_A * matrixOpNorm B

Hölder's inequality for Schatten norms (special case p=1, q=∞).

⟨A, B⟩ ≤ ‖A‖_* · ‖B‖_op

where ‖A‖_* is the nuclear (trace/Schatten-1) norm and ‖B‖_op is the operator (spectral/Schatten-∞) norm.

The hypotheses hu and hv assert that the SVD singular vectors are L²-unit vectors, which is always the case for a proper SVD.

Proof: Let A = UΣVᵀ be the SVD. Then ⟨A, B⟩ = Σⱼ σⱼ · uⱼᵀBvⱼ ≤ Σⱼ σⱼ · ‖B‖_op = ‖A‖_* · ‖B‖_op, using σⱼ ≥ 0 and uⱼᵀBvⱼ ≤ ‖B‖_op (by Cauchy-Schwarz and the operator norm bound).

theorem holder_schatten_general {d T : ℕ} (A B : Matrix (Fin d) (Fin T) ℝ) (S_A : SVD d T) (hA : S_A.IsDecompOf A) (S_B : SVD d T) (hB : S_B.IsDecompOf B) (p q : ℝ) (hp : 1 ≤ p) (hq : 1 ≤ q) (hpq : 1 / p + 1 / q = 1) :

|frobeniusInnerProduct A B| ≤ schattenNorm S_A p * schattenNorm S_B q

Hölder's inequality for Schatten norms (general form).

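For real p, q ≥ 1 with 1/p + 1/q = 1 (which forces p, q ∈ (1, ∞); the endpoint pair p = 1, q = ∞ is handled separately by holder_schatten_nuclear_op): |⟨A, B⟩| ≤ ‖A‖_p · ‖B‖_q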

where ‖·‖_p denotes the Schatten p-norm.

Proof sketch (not formalized): By von Neumann's trace inequality, |⟨A, B⟩| ≤ Σⱼ σⱼ(A) · σⱼ(B). Then apply the classical Hölder inequality for finite sequences to the singular value vectors: Σⱼ σⱼ(A) · σⱼ(B) ≤ (Σⱼ σⱼ(A)^p)^{1/p} · (Σⱼ σⱼ(B)^q)^{1/q} = ‖A‖_p · ‖B‖_q.

This general form uses von Neumann's trace inequality (introduced as a theorem since its proof is not given in the textbook) together with Mathlib's Hölder inequality for finite nonnegative sequences.
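For reference, the finite-sum Hölder step can be sketched with Mathlib's `Real.inner_le_Lp_mul_Lq`. The example assumes a recent Mathlib in which the conjugacy predicate is named `Real.HolderConjugate` (matching the helper lemma above); older versions used a different name. The vectors `a` and `b` stand in for the extended singular value vectors of A and B.

```lean
import Mathlib

-- Classical Hölder inequality for finite real sequences, as provided by Mathlib.
example {n : ℕ} (a b : Fin n → ℝ) {p q : ℝ} (hpq : p.HolderConjugate q) :
    ∑ j, a j * b j ≤ (∑ j, |a j| ^ p) ^ (1 / p) * (∑ j, |b j| ^ q) ^ (1 / q) :=
  Real.inner_le_Lp_mul_Lq Finset.univ a b hpq
```

Since singular values are nonnegative, the absolute values are redundant in this application; Mathlib also has a nonnegative variant, `Real.inner_le_Lp_mul_Lq_of_nonneg`, that omits them.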
