Documentation

Atlas.HighDimensionalStatistics.code.Chapter4.Thm_4_4

Squared Frobenius norm #

Squared Frobenius norm of a matrix: ‖A‖_F² = Σᵢ Σⱼ (Aᵢⱼ)²

Instances For

    Frobenius inner product #

    Frobenius inner product: ⟨A, B⟩_F = Σᵢ Σⱼ Aᵢⱼ·Bᵢⱼ

    Instances For

      Rank Penalization Objective and Estimator #

      noncomputable def Rigollet.Chapter4.Thm_4_4.rankPenObjective {n d T : } (Y : Matrix (Fin n) (Fin T) ) (X : Matrix (Fin n) (Fin d) ) (τ : ) (Θ : Matrix (Fin d) (Fin T) ) :

      The rank penalization objective function: L(Θ) = (1/n)‖Y - XΘ‖_F² + 2τ²·rank(Θ)

      This is the spectral-domain analogue of the BIC estimator: instead of penalizing ℓ₀ sparsity of the parameter vector, we penalize the rank of the parameter matrix.

      Instances For
        def Rigollet.Chapter4.Thm_4_4.IsRankPenalizationEstimator {n d T : } (Y : Matrix (Fin n) (Fin T) ) (X : Matrix (Fin n) (Fin d) ) (τ : ) (Θhat : Matrix (Fin d) (Fin T) ) :

        IsRankPenalizationEstimator Y X τ Θhat asserts that Θhat is a minimizer of the rank penalization objective: Θ̂^RK ∈ argmin_Θ { (1/n)‖Y - XΘ‖_F² + 2τ²·rank(Θ) }

        This is the multivariate analogue of the BIC estimator (Definition 2.3) in the spectral domain.

        Instances For
          theorem Rigollet.Chapter4.Thm_4_4.rankPenObjective_nonneg {n d T : } (Y : Matrix (Fin n) (Fin T) ) (X : Matrix (Fin n) (Fin d) ) (τ : ) (Θ : Matrix (Fin d) (Fin T) ) :
          0 rankPenObjective Y X τ Θ

          The rank penalization objective is nonneg when n > 0 and τ ≥ 0.

          Prediction MSE #

          noncomputable def Rigollet.Chapter4.Thm_4_4.predictionMSE {n d T : } (X : Matrix (Fin n) (Fin d) ) (Θhat Θstar : Matrix (Fin d) (Fin T) ) :

          Prediction mean squared error: (1/n)‖XΘ̂ - XΘ*‖_F²

          Instances For
            theorem Rigollet.Chapter4.Thm_4_4.predictionMSE_nonneg {n d T : } (X : Matrix (Fin n) (Fin d) ) (Θhat Θstar : Matrix (Fin d) (Fin T) ) :
            0 predictionMSE X Θhat Θstar
            theorem Rigollet.Chapter4.Thm_4_4.predictionMSE_self {n d T : } (X : Matrix (Fin n) (Fin d) ) (Θ : Matrix (Fin d) (Fin T) ) :
            predictionMSE X Θ Θ = 0

            Theorem 4.4: Rank Penalization Estimator Bound #

            Proof strategy #

            The proof uses the minimizer property and a direct bound. From the rank penalization minimality at Θ = Θ*:

            (1/n)‖Y - XΘ̂‖_F² + 2τ²rank(Θ̂) ≤ (1/n)‖Y - XΘ*‖_F² + 2τ²rank(Θ*)

            Substituting Y = XΘ* + E and expanding: (1/n)‖E - XΔ‖_F² + 2τ²rank(Θ̂) ≤ (1/n)‖E‖_F² + 2τ²rank(Θ*)

            where Δ = Θ̂ - Θ*. Expanding ‖E - XΔ‖² = ‖E‖² + ‖XΔ‖² - 2⟨E, XΔ⟩: (1/n)(‖E‖² + ‖XΔ‖² - 2⟨E,XΔ⟩) + 2τ²rank(Θ̂) ≤ (1/n)‖E‖² + 2τ²rank(Θ*)

            Cancel ‖E‖²: (1/n)(‖XΔ‖² - 2⟨E,XΔ⟩) + 2τ²rank(Θ̂) ≤ 2τ²rank(Θ*)

            Rearranging: (1/n)‖XΔ‖² ≤ (2/n)⟨E,XΔ⟩ + 2τ²rank(Θ*) - 2τ²rank(Θ̂)

            Then bound the cross-term using Hölder's inequality and SVD: (2/n)⟨E,XΔ⟩ ≤ (1/(2n))‖XΔ‖² + 2τ²(rank(Θ̂) + rank(Θ*))

            This combines Young's inequality (2ab ≤ a² + b²) with the spectral bound from the SVD and the operator norm hypothesis.

            Substituting: (1/n)‖XΔ‖² ≤ (1/(2n))‖XΔ‖² + 2τ²(rank(Θ̂)+rank(Θ*)) + 2τ²rank(Θ*) - 2τ²rank(Θ̂) (1/(2n))‖XΔ‖² ≤ 4τ²rank(Θ*) (rank(Θ̂) cancels) (1/n)‖XΔ‖² ≤ 8τ²rank(Θ*)

            theorem Rigollet.Chapter4.Thm_4_4.frobeniusNormSq_sub_eq {m p : } (A B : Matrix (Fin m) (Fin p) ) :
            frobeniusNormSq (A - B) = frobeniusNormSq A + frobeniusNormSq B - 2 * i : Fin m, j : Fin p, A i j * B i j

            The Frobenius norm expansion identity: ‖A - B‖_F² = ‖A‖_F² + ‖B‖_F² - 2·Σᵢⱼ Aᵢⱼ·Bᵢⱼ

            Cross-term bound (Steps 3-5 of Rigollet's proof) #

            The key analytic step that requires SVD and Schatten norm machinery:

            Given Φ with orthonormal columns spanning col(X) and the operator norm bound ‖ΦᵀE‖_op² ≤ nτ² (from Lemma 4.2 applied to ΦᵀE), the cross-term satisfies:

            (2/n)⟨E, XΔ⟩ ≤ (1/(2n))‖XΔ‖_F² + 2τ²(rank(Θ̂) + rank(Θ*))

            This uses:

            1. Write XΔ = ΦN (since col(X) ⊆ col(Φ)), with rank(XΔ) ≤ rank(Θ̂) + rank(Θ*)
            2. ⟨E, XΔ/‖XΔ‖_F⟩ = ⟨ΦᵀE, N/‖N‖_F⟩ ≤ ‖ΦᵀE‖_op · ‖N/‖N‖F‖* (Hölder)
            3. Nuclear-Frobenius: ‖N/‖N‖F‖* ≤ √rank(N) (Cauchy-Schwarz on singular values)
            4. Apply Young's inequality: 2ab ≤ a²/ε + εb² with appropriate ε
            5. Use rank(XΔ) ≤ rank(Θ̂) + rank(Θ*) (rank subadditivity for Δ = Θ̂ - Θ*)

            This requires SVD decomposition and Schatten/nuclear norm inequalities that are not available in Mathlib. The SVD-dependent combined bound svd_inner_product_rank_bound is proved using holder_trace_rank_bound (axiom) as the single remaining unproved ingredient.

            theorem Rigollet.Chapter4.Thm_4_4.frobeniusInner_transpose_mul {n d T : } (X : Matrix (Fin n) (Fin d) ) (E : Matrix (Fin n) (Fin T) ) (A : Matrix (Fin d) (Fin T) ) :

            Frobenius inner product transpose property. ⟨E, X·A⟩_F = ⟨Xᵀ·E, A⟩_F

            Proof: Expand both sides as triple sums and swap summation order.

            theorem Rigollet.Chapter4.Thm_4_4.rank_sub_le {m p : } (A B : Matrix (Fin m) (Fin p) ) :
            (A - B).rank A.rank + B.rank

            Rank subadditivity. rank(A - B) ≤ rank(A) + rank(B). Proof: The range of (A-B).mulVecLin is contained in A.mulVecLin.range ⊔ (-B).mulVecLin.range.

            theorem Rigollet.Chapter4.Thm_4_4.rank_mul_sub_le_add {n d T : } (X : Matrix (Fin n) (Fin d) ) (Θhat Θstar : Matrix (Fin d) (Fin T) ) :
            (X * (Θhat - Θstar)).rank Θhat.rank + Θstar.rank

            Rank bound for multiplication with difference. rank(X · (Θ̂ - Θ*)) ≤ rank(Θ̂) + rank(Θ*)

            Proof: rank(AB) ≤ rank(B) and rank(A-B) ≤ rank(A) + rank(B).

            Frobenius inner product helper lemmas #

            Scaling the second argument of the Frobenius inner product.

            Frobenius inner product with itself equals squared Frobenius norm.

            theorem Rigollet.Chapter4.Thm_4_4.entries_zero_of_frobeniusNormSq_zero {m p : } (B : Matrix (Fin m) (Fin p) ) (hB : frobeniusNormSq B = 0) (i : Fin m) (j : Fin p) :
            B i j = 0

            When frobeniusNormSq B = 0, all entries of B are zero.

            Frobenius inner product with a zero-norm matrix is zero.

            SVD-based unit inner product bound #

            The following is the SVD-dependent core of the proof of Theorem 4.4. Given the unit direction U = XΔ/‖XΔ‖_F, the textbook shows:

            ⟨E, U⟩² ≤ n · τ² · (rank(Θ̂) + rank(Θ*))

            The proof (Rigollet, lines 2842-2860) uses three sub-results that are stated without proof in Section 4.1 or are standard linear algebra facts:

            1. SVD decomposition: X = ΦΛΨᵀ yields XΔ = ΦN with Φ orthonormal
            2. Von Neumann trace / Hölder inequality for Schatten norms (Section 4.1): |⟨A, B⟩| ≤ ‖A‖op · ‖B‖* (nuclear norm × operator norm)
            3. Nuclear-Frobenius inequality (Cauchy-Schwarz on singular values): ‖M‖_* / ‖M‖_F ≤ √rank(M)
            4. Operator norm bound via Lemma 4.2: ‖ΦᵀE‖_op ≤ √(nτ²)

            The proof factors through three helper lemmas:

            theorem Rigollet.Chapter4.Thm_4_4.frobeniusNormSq_orth_left {n r T : } (Φ : Matrix (Fin n) (Fin r) ) ( : Φ.transpose * Φ = 1) (C : Matrix (Fin r) (Fin T) ) :

            Frobenius norm is preserved by left-multiplication with orthonormal columns. ‖ΦC‖²_F = ‖C‖²_F when ΦᵀΦ = I.

            Proof: ‖ΦC‖²_F = ⟨ΦC, ΦC⟩_F = ⟨ΦᵀC), C⟩_F = ⟨(ΦᵀΦ)*C, C⟩_F = ⟨C, C⟩_F = ‖C‖²_F

            Von Neumann trace inequality combined with Cauchy-Schwarz on singular values: |⟨A,B⟩_F|² ≤ ‖A‖²_op · rank(B) · ‖B‖²_F. The textbook states this without proof in Section 4.1.1 (Schatten norms and Hölder's inequality for traces). References: Horn & Johnson, Matrix Analysis, Thm 7.4.1.1; Bhatia, Matrix Analysis, IV.2.5.

            theorem Rigollet.Chapter4.Thm_4_4.rank_le_of_left_inverse_mul {n r T : } (Φ : Matrix (Fin n) (Fin r) ) ( : Φ.transpose * Φ = 1) (C : Matrix (Fin r) (Fin T) ) :
            C.rank (Φ * C).rank

            Rank is monotone under left-multiplication by a matrix with left inverse. rank(C) ≤ rank(ΦC) when ΦᵀΦ = I.

            Proof: C = Φᵀ*(ΦC), so rank(C) = rank(ΦᵀC)) ≤ rank(ΦC).

            theorem Rigollet.Chapter4.Thm_4_4.svd_inner_product_rank_bound {n d T r : } (Φ : Matrix (Fin n) (Fin r) ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ), X = Φ * M) (E : Matrix (Fin n) (Fin T) ) (Δ : Matrix (Fin d) (Fin T) ) (τ : ) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 n * τ ^ 2) (hXΔ_pos : 0 < frobeniusNormSq (X * Δ)) :
            frobeniusInner E ((1 / (frobeniusNormSq (X * Δ))) (X * Δ)) ^ 2 n * τ ^ 2 * (X * Δ).rank

            SVD-dependent inner product bound.

            For U = XΔ/‖XΔ‖_F the unit-Frobenius-norm direction of M = XΔ: ⟨E, U⟩² ≤ n · τ² · rank(M)

            Textbook proof (Rigollet, lines 2842-2860): Write M = ΦN (Φ has orthonormal columns spanning col(X)). Then ⟨E, U⟩ = ⟨ΦᵀE, N/‖N‖_F⟩. By Hölder (Section 4.1, stated without proof): |⟨ΦᵀE, N/‖N‖_F⟩| ≤ ‖ΦᵀE‖_op · ‖N/‖N‖F‖. Nuclear-Frobenius (Cauchy-Schwarz on singular values, not proved in book): ‖N/‖N‖F‖ ≤ √rank(N). And ‖ΦᵀE‖_op² ≤ nτ² (Lemma 4.2 + SVD of X).

            Proof structure: Factor XΔ = ΦC through the column space basis Φ, use orthogonal invariance of the Frobenius norm, the Hölder trace-rank bound (holder_trace_rank_bound, axiom), and rank monotonicity.

            theorem Rigollet.Chapter4.Thm_4_4.unit_inner_product_svd_bound {n d T r : } (Φ : Matrix (Fin n) (Fin r) ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ), X = Φ * M) (E : Matrix (Fin n) (Fin T) ) (Θhat Θstar : Matrix (Fin d) (Fin T) ) (τ : ) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 n * τ ^ 2) (hXΔ_pos : 0 < frobeniusNormSq (X * Θhat - X * Θstar)) :
            frobeniusInner E ((1 / (frobeniusNormSq (X * Θhat - X * Θstar))) (X * Θhat - X * Θstar)) ^ 2 n * τ ^ 2 * (Θhat.rank + Θstar.rank)

            SVD-dependent unit inner product bound.

            When U is the unit-Frobenius-norm direction of XΔ = X(Θ̂-Θ*), the squared Frobenius inner product ⟨E, U⟩² is bounded by n·τ²·(rank(Θ̂) + rank(Θ*)).

            Proof: Combines the SVD inner product bound (svd_inner_product_rank_bound, which uses holder_trace_rank_bound as the only axiom) giving ⟨E, U⟩² ≤ n·τ²·rank(XΔ), with the rank subadditivity result rank(X(Θ̂-Θ*)) ≤ rank(Θ̂) + rank(Θ*).

            theorem Rigollet.Chapter4.Thm_4_4.crossterm_svd_bound {n d T r : } (Φ : Matrix (Fin n) (Fin r) ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ), X = Φ * M) (E : Matrix (Fin n) (Fin T) ) (Θhat Θstar : Matrix (Fin d) (Fin T) ) (τ : ) (hn : 0 < n) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 n * τ ^ 2) :
            2 / n * frobeniusInner E (X * Θhat - X * Θstar) 1 / (2 * n) * frobeniusNormSq (X * Θhat - X * Θstar) + 2 * τ ^ 2 * (Θhat.rank + Θstar.rank)

            Cross-term bound via Hölder and SVD (core spectral inequality).

            This is the key step in the proof of Theorem 4.4 that combines:

            The proof splits into two cases:

            • If ‖XΔ‖_F = 0, the cross-term is 0 and the bound holds trivially.
            • If ‖XΔ‖_F > 0, we decompose ⟨E, XΔ⟩ = ⟨E, U⟩·‖XΔ‖_F where U = XΔ/‖XΔ‖_F, apply Young's inequality, and use the SVD bound on ⟨E, U⟩².
            theorem Rigollet.Chapter4.Thm_4_4.crossTermBound_from_opNorm {n d T r : } (Φ : Matrix (Fin n) (Fin r) ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ), X = Φ * M) (E : Matrix (Fin n) (Fin T) ) (Θhat Θstar : Matrix (Fin d) (Fin T) ) (τ : ) (_hτ_pos : 0 < τ) (hn : 0 < n) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 n * τ ^ 2) :
            2 / n * frobeniusInner E (X * Θhat - X * Θstar) 1 / (2 * n) * frobeniusNormSq (X * Θhat - X * Θstar) + 2 * τ ^ 2 * (Θhat.rank + Θstar.rank)

            Cross-term bound (Steps 3-5 of Rigollet's proof).

            Given Φ with orthonormal columns spanning col(X) and ‖ΦᵀE‖_op² ≤ nτ², the cross-term in the Frobenius expansion satisfies:

            (2/n)⟨E, XΔ⟩_F ≤ (1/(2n))‖XΔ‖_F² + 2τ²(rank(Θ̂) + rank(Θ*))

            where Δ = Θ̂ - Θ*.

            Proof: This is a direct consequence of crossterm_svd_bound, which captures the SVD-based spectral argument from the textbook proof.

            Step 1: Minimizer inequality after substitution #

            theorem Rigollet.Chapter4.Thm_4_4.minimizer_substituted {n d T : } (X : Matrix (Fin n) (Fin d) ) (Θstar : Matrix (Fin d) (Fin T) ) (E Y : Matrix (Fin n) (Fin T) ) (hY : Y = X * Θstar + E) (τ : ) (Θhat : Matrix (Fin d) (Fin T) ) (hRK : IsRankPenalizationEstimator Y X τ Θhat) :
            1 / n * frobeniusNormSq (E - (X * Θhat - X * Θstar)) + 2 * τ ^ 2 * Θhat.rank 1 / n * frobeniusNormSq E + 2 * τ ^ 2 * Θstar.rank

            The minimizer inequality expressed in terms of E and Δ: After substituting Y = XΘ* + E: (1/n)‖E - X(Θ̂-Θ*)‖² + 2τ²rank(Θ̂) ≤ (1/n)‖E‖² + 2τ²rank(Θ*)

            theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_deterministic {n d T r : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin r) ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ), X = Φ * M) (Θstar : Matrix (Fin d) (Fin T) ) (E Y : Matrix (Fin n) (Fin T) ) (hY : Y = X * Θstar + E) (τ : ) ( : 0 < τ) (Θhat : Matrix (Fin d) (Fin T) ) (hRK : IsRankPenalizationEstimator Y X τ Θhat) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 n * τ ^ 2) :
            predictionMSE X Θhat Θstar 8 * Θstar.rank * τ ^ 2

            Theorem 4.4 (deterministic core inequality).

            Consider the multivariate regression model Y = XΘ* + E. If Θ̂^RK minimizes (1/n)‖Y - XΘ‖_F² + 2τ²·rank(Θ) and Φ has orthonormal columns spanning col(X) with ‖ΦᵀE‖_op² ≤ nτ², then:

            (1/n)‖XΘ̂^RK - XΘ*‖_F² ≤ 8 · rank(Θ*) · τ²

            The proof uses:

            1. The minimizer property (comparing Θ̂ to Θ*)
            2. Frobenius norm expansion ‖E - XΔ‖² = ‖E‖² + ‖XΔ‖² - 2⟨E,XΔ⟩
            3. The cross-term bound (proved via Hölder-nuclear-Frobenius helper): (2/n)⟨E,XΔ⟩ ≤ (1/(2n))‖XΔ‖² + 2τ²(rank(Θ̂) + rank(Θ*))
            4. Algebraic cancellation of rank(Θ̂) terms
            theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_probabilistic {n d T r : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin r) ) (hΦ_orth : Φ.transpose * Φ = 1) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ), X = Φ * M) (Θstar : Matrix (Fin d) (Fin T) ) (E Y : ΩMatrix (Fin n) (Fin T) ) (hY : ∀ (ω : Ω), Y ω = X * Θstar + E ω) {σsq : } (_hσ : 0 < σsq) (_hE : IsSubGaussianMatrix E σsq μ) (τ : ) ( : 0 < τ) (Θhat : ΩMatrix (Fin d) (Fin T) ) (hRK : ∀ (ω : Ω), IsRankPenalizationEstimator (Y ω) X τ (Θhat ω)) (ω : Ω) :
            matrixOpNorm (Φ.transpose * E ω) ^ 2 n * τ ^ 2predictionMSE X (Θhat ω) Θstar 8 * Θstar.rank * τ ^ 2

            Theorem 4.4 (probabilistic version).

            Under the multivariate regression model Y(ω) = XΘ* + E(ω) where E ~ subG_{n×T}(σ²), the rank penalization estimator satisfies:

            (1/n)‖XΘ̂^RK - XΘ*‖_F² ≤ 8 · rank(Θ*) · τ²

            on the event where ‖ΦᵀE‖_op² ≤ nτ² (which by Lemma 4.2 + SVD of X has probability at least 1 - δ when τ is chosen appropriately).

            This gives the rate ≲ σ²·rank(Θ*)·(d ∨ T + log(1/δ))/n with probability 1-δ.

            Rate bound: connecting 8·rank(Θ*)·τ² to σ²·rank(Θ*)·(d∨T+log(1/δ))/n #

            The textbook states that the bound 8·rank(Θ*)·τ² with τ chosen from Lemma 4.2 yields the rate ≲ σ²·rank(Θ*)·(d∨T + log(1/δ))/n.

            When the "score matrix" (1/n)XᵀE is sub-Gaussian with variance proxy σsq/n (which follows from the regression model when X has orthonormal rows), Lemma 4.2 gives: τ₀ = 4√(σsq/n)·√(log(12)·(d∨T)) + 2√(σsq/n)·√(2·log(1/δ))

            Then τ₀² ≤ 2·(16·(σsq/n)·log(12)·(d∨T)) + 2·(4·(σsq/n)·2·log(1/δ)) = 32·log(12)·(σsq/n)·(d∨T) + 16·(σsq/n)·log(1/δ) ≤ 32·log(12)·(σsq/n)·(d∨T + log(1/δ))

            So 8·rank(Θ*)·τ₀² ≤ 256·log(12)·σsq·rank(Θ*)·(d∨T + log(1/δ))/n.

            The constant 256·log(12) ≈ 636 can be improved; the "≲" notation hides it.

            theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_rate_bound (σ₀ : ) (hσ₀ : 0 σ₀) (D : ) (hD : 0 D) (L : ) (hL : 0 L) (r : ) (hr : 0 r) :
            have τ₀ := 4 * σ₀ * (Real.log 12 * D) + 2 * σ₀ * (2 * L); 8 * r * τ₀ ^ 2 256 * Real.log 12 * σ₀ * r * (D + L)

            Rate bound for Theorem 4.4.

            When τ = 4√σ₀·√(log(12)·D) + 2√σ₀·√(2·L) where σ₀ = σsq/n is the per-sample variance proxy and D = d∨T, L = log(1/δ), then:

            8·rank(Θ*)·τ² ≤ 256·log(12) · σsq/n · rank(Θ*) · (D + L)

            This uses (a+b)² ≤ 2(a²+b²) and elementary arithmetic. The bound captures the "≲ σ²·rank(Θ*)·(d∨T + log(1/δ))/n" rate.

            theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_full_rate {n d T r : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin r) ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ), X = Φ * M) (Θstar : Matrix (Fin d) (Fin T) ) (E Y : Matrix (Fin n) (Fin T) ) (hY : Y = X * Θstar + E) (σ₀ : ) (hσ₀ : 0 σ₀) (D : ) (hD : 0 D) (L : ) (hL : 0 L) (τ : ) ( : 0 < τ) (hτ_def : τ = 4 * σ₀ * (Real.log 12 * D) + 2 * σ₀ * (2 * L)) (Θhat : Matrix (Fin d) (Fin T) ) (hRK : IsRankPenalizationEstimator Y X τ Θhat) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 n * τ ^ 2) :
            predictionMSE X Θhat Θstar 256 * Real.log 12 * σ₀ * Θstar.rank * (D + L)

            Theorem 4.4 (full probabilistic version with rate).

            Under the multivariate regression model Y(ω) = XΘ* + E(ω) where E ~ subG_{n×T}(σ²), the rank penalization estimator with τ = 4√(σsq/n)·√(log(12)·(d∨T)) + 2√(σsq/n)·√(2·log(1/δ)) satisfies:

            (1/n)‖XΘ̂^RK - XΘ*‖_F² ≤ 256·log(12) · (σsq/n) · rank(Θ*) · (d∨T + log(1/δ))

            on the event where ‖ΦᵀE‖_op² ≤ nτ², which has probability at least 1 - δ by Lemma 4.2 + SVD of X.

            This formalizes the textbook's "≲ σ²·rank(Θ*)·(d∨T + log(1/δ))/n" with the explicit constant 256·log(12).

            theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_with_probability {n d T r : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin r) ) (hΦ_orth : Φ.transpose * Φ = 1) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hprob : MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ), X = Φ * M) (Θstar : Matrix (Fin d) (Fin T) ) (E Y : ΩMatrix (Fin n) (Fin T) ) (hY : ∀ (ω : Ω), Y ω = X * Θstar + E ω) (σsq : ) ( : 0 < σsq) (hE : IsSubGaussianMatrix (fun (ω : Ω) => Φ.transpose * E ω) σsq μ) (hr : 0 < r) (hT : 0 < T) {δ : } (hδ_pos : 0 < δ) (hδ_lt : δ < 1) (τ : ) (hτ_def : τ = 4 * σsq * (Real.log 12 * (max r T)) + 2 * σsq * (2 * Real.log (1 / δ))) ( : 0 < τ) (Θhat : ΩMatrix (Fin d) (Fin T) ) (hRK : ∀ (ω : Ω), IsRankPenalizationEstimator (Y ω) X τ (Θhat ω)) :
            μ {ω : Ω | predictionMSE X (Θhat ω) Θstar > 8 * Θstar.rank * τ ^ 2} ENNReal.ofReal δ

            Theorem 4.4 (full probabilistic statement with measure bound).

            Under the multivariate regression model Y(ω) = XΘ* + E(ω) where ΦᵀE ~ subG_{r×T}(σ²) (the projected noise matrix is sub-Gaussian), the rank penalization estimator with τ = 4√σ²·√(log(12)·(r∨T)) + 2√σ²·√(2·log(1/δ)) satisfies:

            Pμ { ω | predictionMSE X (Θ̂(ω)) Θ* > 8 · rank(Θ*) · τ² } ≤ δ

            This is the measure-theoretic formalization of the textbook's "with probability at least 1 - δ" statement of Theorem 4.4.

            Proof structure:

            1. The deterministic theorem gives: on {ω | ‖ΦᵀE(ω)‖_op² ≤ nτ²}, the MSE bound holds.
            2. The bad MSE event is contained in {ω | ‖ΦᵀE(ω)‖_op > τ} (since ‖ΦᵀE‖_op ≤ τ ⇒ ‖ΦᵀE‖_op² ≤ τ² ≤ nτ² for n≥1).
            3. By lemma_4_2_operator_norm_high_prob, this event has probability ≤ δ.