Documentation

Atlas.HighDimensionalStatistics.code.Chapter4.Background_Holder_Schatten

Schatten p-norm #

The Schatten p-norm of a matrix is the ℓ_p norm of its singular value vector. Special cases:

noncomputable def schattenNorm {d T : } (S : SVD d T) (q : ) :

The Schatten q-norm of a matrix given its SVD: ‖A‖_q = (Σⱼ σⱼ^q)^{1/q} for q ∈ [1, ∞).

Instances For
    def nuclearNormSVD {d T : } (S : SVD d T) :

    The nuclear norm (Schatten 1-norm): ‖A‖_* = Σⱼ σⱼ

    Instances For
      def frobeniusInnerProduct {d T : } (A B : Matrix (Fin d) (Fin T) ) :

      The Frobenius inner product: ⟨A, B⟩ = Tr(AᵀB) = Σᵢⱼ Aᵢⱼ Bᵢⱼ

      Instances For

        Helper lemmas for the Frobenius inner product #

        theorem matrix_sum_apply {d T : } {ι : Type u_1} [Fintype ι] (f : ιMatrix (Fin d) (Fin T) ) (i : Fin d) (j : Fin T) :
        (∑ k : ι, f k) i j = k : ι, f k i j

        Pointwise application of a sum of matrices.

        theorem frobeniusInnerProduct_sum {d T : } {ι : Type u_1} [Fintype ι] (f : ιMatrix (Fin d) (Fin T) ) (B : Matrix (Fin d) (Fin T) ) :
        frobeniusInnerProduct (∑ k : ι, f k) B = k : ι, frobeniusInnerProduct (f k) B

        The Frobenius inner product distributes over sums in the first argument.

        theorem frobeniusInnerProduct_smul {d T : } (c : ) (A B : Matrix (Fin d) (Fin T) ) :

        The Frobenius inner product scales in the first argument.

        theorem frobeniusInnerProduct_vecMulVec {d T : } (u : Fin d) (v : Fin T) (B : Matrix (Fin d) (Fin T) ) :

        The Frobenius inner product of a rank-1 matrix with B equals dotProduct u (B.mulVec v).

        Operator norm bound for bilinear forms #

        The key ingredient: for unit vectors u, v, dotProduct u (B.mulVec v) ≤ ‖B‖_op by Cauchy-Schwarz and the operator norm bound.

        theorem dotProduct_mulVec_le_opNorm {d T : } (u : Fin d) (v : Fin T) (B : Matrix (Fin d) (Fin T) ) (hu : WithLp.toLp 2 u = 1) (hv : WithLp.toLp 2 v = 1) :

        For unit vectors u, v: dotProduct u (B.mulVec v) ≤ ‖B‖_op.

        Hölder's inequality for Schatten norms #

        Hölder-Schatten inequalities #

        The SVD structure in our formalization does not encode orthonormality of the singular vectors. The proofs below therefore include explicit hypotheses that the left/right singular vectors are L²-unit vectors. In a standard SVD, this is automatically satisfied.

        The underlying proof strategy is:

        1. Substitute A = UΣVᵀ (the SVD decomposition)
        2. Distribute the Frobenius inner product over the SVD sum
        3. Bound each term σⱼ · ⟨uⱼ, B vⱼ⟩ using Cauchy-Schwarz + operator norm
        4. Sum to get ≤ (Σ σⱼ) · ‖B‖op = ‖A‖* · ‖B‖_op

        References:

        Helper infrastructure for the general Hölder-Schatten inequality #

        def extendedσval {d T : } (S : SVD d T) (j : Fin (min d T)) :

        Extended singular values: pads S.σval : Fin S.r → ℝ with zeros to obtain a function on Fin (min d T).

        Instances For
          theorem extendedσval_nonneg {d T : } (S : SVD d T) (j : Fin (min d T)) :
          theorem sum_fin_of_zero_tail {r n : } (hr : r n) (f : Fin n) (hf : ∀ (j : Fin n), r jf j = 0) :
          j : Fin n, f j = j : Fin r, f j,

          A sum over Fin n of a function that is zero for indices ≥ r equals the sum over Fin r.

          theorem sum_extendedσval_rpow {d T : } (S : SVD d T) (p : ) (hp : 0 < p) :
          j : Fin (min d T), extendedσval S j ^ p = j : Fin S.r, S.σval j ^ p

          The sum of (extendedσval S j) ^ p over Fin (min d T) equals the sum of (S.σval j) ^ p over Fin S.r.

          theorem holderConjugate_of_div_add {p q : } (hp : 1 p) (hq : 1 q) (hpq : 1 / p + 1 / q = 1) :

          Derive Real.HolderConjugate p q from the hypotheses 1 ≤ p, 1 ≤ q, and 1/p + 1/q = 1.

          theorem von_neumann_trace_ineq {d T : } (A B : Matrix (Fin d) (Fin T) ) (S_A : SVD d T) (hA : S_A.IsDecompOf A) (S_B : SVD d T) (hB : S_B.IsDecompOf B) :

          Von Neumann's trace inequality. For matrices A, B with SVDs S_A, S_B: |⟨A, B⟩| ≤ Σⱼ σⱼ(A) · σⱼ(B) where the singular values are extended to the common index Fin (min d T).

          This fundamental result in matrix analysis is stated without proof in Rigollet's text and proved in:

          • Horn & Johnson, Matrix Analysis, 2nd ed., Theorem 7.4.1.1
          • Bhatia, Matrix Analysis, Chapter IV, Theorem IV.2.5

          We introduce it as an axiom since the proof is not given in the textbook.

          theorem holder_schatten_nuclear_op {d T : } (A B : Matrix (Fin d) (Fin T) ) (S_A : SVD d T) (hA : S_A.IsDecompOf A) (hu : ∀ (j : Fin S_A.r), WithLp.toLp 2 (S_A.u j) = 1) (hv : ∀ (j : Fin S_A.r), WithLp.toLp 2 (S_A.v j) = 1) :

          Hölder's inequality for Schatten norms (special case p=1, q=∞).

          ⟨A, B⟩ ≤ ‖A‖_* · ‖B‖_op

          where ‖A‖_* is the nuclear (trace/Schatten-1) norm and ‖B‖_op is the operator (spectral/Schatten-∞) norm.

          The hypotheses hu and hv assert that the SVD singular vectors are L²-unit vectors, which is always the case for a proper SVD.

          Proof: Let A = UΣVᵀ be the SVD. Then ⟨A, B⟩ = Σⱼ σⱼ · uⱼᵀBvⱼ ≤ Σⱼ σⱼ · ‖B‖op = ‖A‖* · ‖B‖_op, using uⱼᵀBvⱼ ≤ ‖B‖_op (by Cauchy-Schwarz and the operator norm bound).

          theorem holder_schatten_general {d T : } (A B : Matrix (Fin d) (Fin T) ) (S_A : SVD d T) (hA : S_A.IsDecompOf A) (S_B : SVD d T) (hB : S_B.IsDecompOf B) (p q : ) (hp : 1 p) (hq : 1 q) (hpq : 1 / p + 1 / q = 1) :

          Hölder's inequality for Schatten norms (general form).

          For p, q ∈ [1, ∞] with 1/p + 1/q = 1: |⟨A, B⟩| ≤ ‖A‖_p · ‖B‖_q

          where ‖·‖_p denotes the Schatten p-norm.

          Proof sketch (not formalized): By von Neumann's trace inequality, |⟨A, B⟩| ≤ Σⱼ σⱼ(A) · σⱼ(B). Then apply the classical Hölder inequality for finite sequences to the singular value vectors: Σⱼ σⱼ(A) · σⱼ(B) ≤ (Σⱼ σⱼ(A)^p)^{1/p} · (Σⱼ σⱼ(B)^q)^{1/q} = ‖A‖_p · ‖B‖_q.

          This general form uses von Neumann's trace inequality (introduced as a theorem since its proof is not given in the textbook) together with Mathlib's Hölder inequality for finite nonnegative sequences.