Documentation

Atlas.HighDimensionalStatistics.code.Chapter4.Remark_4_5

Squared Frobenius norm #

noncomputable def Chapter4.frobeniusNormSq {m p : } (A : Matrix (Fin m) (Fin p) ) :

Squared Frobenius norm of a matrix: ‖A‖_F² = Σᵢ Σⱼ (Aᵢⱼ)²

Instances For

    Operator norm helper for rectangular matrices #

    noncomputable def Chapter4.rectOpNorm {m p : } (A : Matrix (Fin m) (Fin p) ) :

    Operator norm of a rectangular real matrix, via its continuous linear map.

    Instances For

      Nuclear norm (via duality) #

      The nuclear norm (Schatten 1-norm, trace norm) of a matrix: ‖Θ‖_* = Σⱼ σⱼ(Θ) where σⱼ are the singular values of Θ.

      We define it via the duality characterization: ‖A‖_* = sup { ⟨B, A⟩_F : ‖B‖_op ≤ 1 }

      noncomputable def Chapter4.nuclearNorm {d T : } (A : Matrix (Fin d) (Fin T) ) :

      Nuclear norm (trace norm, Schatten 1-norm) of a matrix, defined via duality: ‖A‖_* = sup { ⟨B, A⟩_F : ‖B‖_op ≤ 1 }

      Instances For
        theorem Chapter4.nuclearNorm_nonneg {d T : } (Θ : Matrix (Fin d) (Fin T) ) :

        Singular value predicates #

        The SVD of X ∈ ℝ^{n×d} has singular values λ₁ ≥ λ₂ ≥ ... ≥ λᵣ > 0 where r = rank(X). We define the largest singular value as the operator norm, and characterize the smallest positive singular value by its positivity and bound relative to the largest.

        def Chapter4.IsLargestSingularValue {n d : } (X : Matrix (Fin n) (Fin d) ) (s : ) :

        IsLargestSingularValue X s means s is the largest singular value of X, i.e. s = σ₁(X) = ‖X‖_op (the operator norm).

        Instances For
          def Chapter4.IsSmallestPosSingularValue {n d : } (X : Matrix (Fin n) (Fin d) ) (s : ) :

          IsSmallestPosSingularValue X s means s is the smallest positive singular value of X, i.e. s = σᵣ(X) where r = rank(X). Characterized by: s is positive, bounded above by the operator norm, and is a lower bound on ‖Xv‖/‖v‖ for vectors outside the kernel.

          Instances For
            theorem Chapter4.IsLargestSingularValue.pos {n d : } {X : Matrix (Fin n) (Fin d) } {s : } (h : IsLargestSingularValue X s) (hX : X 0) :
            0 < s

            The largest singular value is positive (for nonzero matrices).

            theorem Chapter4.IsSmallestPosSingularValue.pos {n d : } {X : Matrix (Fin n) (Fin d) } {s : } (h : IsSmallestPosSingularValue X s) (_hX : X 0) :
            0 < s

            The smallest positive singular value is positive.

            theorem Chapter4.singularValue_le {n d : } {X : Matrix (Fin n) (Fin d) } {s₁ sᵣ : } (h₁ : IsLargestSingularValue X s₁) (hᵣ : IsSmallestPosSingularValue X sᵣ) :
            sᵣ s₁

            The smallest positive singular value is at most the largest.

            Nuclear Norm Penalization Objective and Estimator #

            noncomputable def Chapter4.nuclearNormObjective {n d T : } (Y : Matrix (Fin n) (Fin T) ) (X : Matrix (Fin n) (Fin d) ) (τ : ) (Θ : Matrix (Fin d) (Fin T) ) :

            The nuclear norm penalization objective: L(Θ) = (1/n)‖Y - XΘ‖F² + τ · ‖Θ‖*

            Instances For
              def Chapter4.IsNuclearNormEstimator {n d T : } (Y : Matrix (Fin n) (Fin T) ) (X : Matrix (Fin n) (Fin d) ) (τ : ) (Θhat : Matrix (Fin d) (Fin T) ) :

              Θ̂ is a nuclear norm penalization estimator if it minimizes the objective: Θ̂ ∈ argmin_Θ { (1/n)‖Y - XΘ‖F² + τ · ‖Θ‖* }

              Instances For
                theorem Chapter4.nuclearNormObjective_nonneg {n d T : } (Y : Matrix (Fin n) (Fin T) ) (X : Matrix (Fin n) (Fin d) ) (τ : ) ( : 0 τ) (Θ : Matrix (Fin d) (Fin T) ) :

                The nuclear norm objective is nonneg when τ ≥ 0.

                MSE Bound (Remark 4.5) #

                noncomputable def Chapter4.predictionMSE {n d T : } (X : Matrix (Fin n) (Fin d) ) (Θhat Θstar : Matrix (Fin d) (Fin T) ) :

                Prediction MSE: (1/n)‖XΘ̂ - XΘ*‖_F²

                Instances For
                  theorem Chapter4.predictionMSE_nonneg {n d T : } (X : Matrix (Fin n) (Fin d) ) (Θhat Θstar : Matrix (Fin d) (Fin T) ) :
                  0 predictionMSE X Θhat Θstar
                  theorem Chapter4.predictionMSE_self {n d T : } (X : Matrix (Fin n) (Fin d) ) (Θ : Matrix (Fin d) (Fin T) ) :
                  predictionMSE X Θ Θ = 0
                  theorem Chapter4.remark_4_5_nuclear_norm_MSE_bound {n d T : } (Y : Matrix (Fin n) (Fin T) ) (X : Matrix (Fin n) (Fin d) ) (Θstar : Matrix (Fin d) (Fin T) ) (σ : ) ( : 0 < σ) (hn : 0 < n) (hX : X 0) (sig₁ sigᵣ : ) (hsig₁ : IsLargestSingularValue X sig₁) (hsigᵣ : IsSmallestPosSingularValue X sigᵣ) :
                  ∃ (τ : ), 0 < τ ∃ (C : ), 0 < C ∀ (Θhat : Matrix (Fin d) (Fin T) ), IsNuclearNormEstimator Y X τ ΘhatpredictionMSE X Θhat Θstar C * (sig₁ / sigᵣ) * σ ^ 2 * Θstar.rank * max d T / n

                  Remark 4.5 (MSE bound for the nuclear norm penalization estimator, [KLT11]).

                  For an appropriate choice of the regularization parameter τ, any nuclear norm penalization estimator Θ̂ satisfies, with probability at least 0.99:

                  (1/n)‖XΘ̂ - XΘ*‖_F² ≤ C · (λ₁/λᵣ) · σ² · rank(Θ*) · max(d,T) / n

                  where:

                  • X has SVD with singular values λ₁ ≥ λ₂ ≥ ... ≥ λᵣ > 0,
                  • λ₁/λᵣ is the condition number of X,
                  • σ² is the noise variance,
                  • rank(Θ*) is the rank of the true parameter matrix,
                  • d ∨ T = max(d, T),
                  • C is a universal constant (quantified before ∀ Θhat).

                  Note that τ is existentially quantified: the book says "for some appropriate choice of τ." Unlike the rank penalization estimator, this bound involves the condition number λ₁/λᵣ of the design matrix X.

                  The proof is deferred to [KLT11] in the textbook, so this is axiomatized.