Atlas.HighDimensionalStatistics.code.Chapter4.Thm_4

This is the spectral-domain analogue of the BIC estimator: instead of penalizing ℓ₀ sparsity of the parameter vector, we penalize the rank of the parameter matrix.

Instances For

source

def Rigollet.Chapter4.Thm_4_4.IsRankPenalizationEstimator {n d T : ℕ} (Y : Matrix (Fin n) (Fin T) ℝ) (X : Matrix (Fin n) (Fin d) ℝ) (τ : ℝ) (Θhat : Matrix (Fin d) (Fin T) ℝ) :

Prop

IsRankPenalizationEstimator Y X τ Θhat asserts that Θhat is a minimizer of the rank penalization objective: Θ̂^RK ∈ argmin_Θ { (1/n)‖Y - XΘ‖_F² + 2τ²·rank(Θ) }

This is the multivariate analogue of the BIC estimator (Definition 2.3) in the spectral domain.

Instances For

source

theorem Rigollet.Chapter4.Thm_4_4.rankPenObjective_nonneg {n d T : ℕ} (Y : Matrix (Fin n) (Fin T) ℝ) (X : Matrix (Fin n) (Fin d) ℝ) (τ : ℝ) (Θ : Matrix (Fin d) (Fin T) ℝ) :

0 ≤ rankPenObjective Y X τ Θ

The rank penalization objective is nonneg when n > 0 and τ ≥ 0.

Prediction MSE #

source

noncomputable def Rigollet.Chapter4.Thm_4_4.predictionMSE {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) :

ℝ

Prediction mean squared error: (1/n)‖XΘ̂ - XΘ*‖_F²

Instances For

source

theorem Rigollet.Chapter4.Thm_4_4.predictionMSE_nonneg {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) :

0 ≤ predictionMSE X Θhat Θstar

source

theorem Rigollet.Chapter4.Thm_4_4.predictionMSE_self {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Θ : Matrix (Fin d) (Fin T) ℝ) :

predictionMSE X Θ Θ = 0

Theorem 4.4: Rank Penalization Estimator Bound #

Proof strategy #

The proof uses the minimizer property and a direct bound. From the rank penalization minimality at Θ = Θ*:

(1/n)‖Y - XΘ̂‖_F² + 2τ²rank(Θ̂) ≤ (1/n)‖Y - XΘ*‖_F² + 2τ²rank(Θ*)

Substituting Y = XΘ* + E and expanding: (1/n)‖E - XΔ‖_F² + 2τ²rank(Θ̂) ≤ (1/n)‖E‖_F² + 2τ²rank(Θ*)

where Δ = Θ̂ - Θ*. Expanding ‖E - XΔ‖² = ‖E‖² + ‖XΔ‖² - 2⟨E, XΔ⟩: (1/n)(‖E‖² + ‖XΔ‖² - 2⟨E,XΔ⟩) + 2τ²rank(Θ̂) ≤ (1/n)‖E‖² + 2τ²rank(Θ*)

Cancel ‖E‖²: (1/n)(‖XΔ‖² - 2⟨E,XΔ⟩) + 2τ²rank(Θ̂) ≤ 2τ²rank(Θ*)

Rearranging: (1/n)‖XΔ‖² ≤ (2/n)⟨E,XΔ⟩ + 2τ²rank(Θ*) - 2τ²rank(Θ̂)

Then bound the cross-term using Hölder's inequality and SVD: (2/n)⟨E,XΔ⟩ ≤ (1/(2n))‖XΔ‖² + 2τ²(rank(Θ̂) + rank(Θ*))

This combines Young's inequality (2ab ≤ a² + b²) with the spectral bound from the SVD and the operator norm hypothesis.

Substituting: (1/n)‖XΔ‖² ≤ (1/(2n))‖XΔ‖² + 2τ²(rank(Θ̂)+rank(Θ*)) + 2τ²rank(Θ*) - 2τ²rank(Θ̂) (1/(2n))‖XΔ‖² ≤ 4τ²rank(Θ*) (rank(Θ̂) cancels) (1/n)‖XΔ‖² ≤ 8τ²rank(Θ*)

source

theorem Rigollet.Chapter4.Thm_4_4.frobeniusNormSq_sub_eq {m p : ℕ} (A B : Matrix (Fin m) (Fin p) ℝ) :

frobeniusNormSq (A - B) = frobeniusNormSq A + frobeniusNormSq B - 2 * ∑ i : Fin m, ∑ j : Fin p, A i j * B i j

The Frobenius norm expansion identity: ‖A - B‖_F² = ‖A‖_F² + ‖B‖_F² - 2·Σᵢⱼ Aᵢⱼ·Bᵢⱼ

Cross-term bound (Steps 3-5 of Rigollet's proof) #

The key analytic step that requires SVD and Schatten norm machinery:

Given Φ with orthonormal columns spanning col(X) and the operator norm bound ‖ΦᵀE‖_op² ≤ nτ² (from Lemma 4.2 applied to ΦᵀE), the cross-term satisfies:

(2/n)⟨E, XΔ⟩ ≤ (1/(2n))‖XΔ‖_F² + 2τ²(rank(Θ̂) + rank(Θ*))

This uses:

Write XΔ = ΦN (since col(X) ⊆ col(Φ)), with rank(XΔ) ≤ rank(Θ̂) + rank(Θ*)
⟨E, XΔ/‖XΔ‖_F⟩ = ⟨ΦᵀE, N/‖N‖_F⟩ ≤ ‖ΦᵀE‖_op · ‖N/‖N‖F‖* (Hölder)
Nuclear-Frobenius: ‖N/‖N‖F‖* ≤ √rank(N) (Cauchy-Schwarz on singular values)
Apply Young's inequality: 2ab ≤ a²/ε + εb² with appropriate ε
Use rank(XΔ) ≤ rank(Θ̂) + rank(Θ*) (rank subadditivity for Δ = Θ̂ - Θ*)

This requires SVD decomposition and Schatten/nuclear norm inequalities that are not available in Mathlib. The SVD-dependent combined bound svd_inner_product_rank_bound is proved using holder_trace_rank_bound (axiom) as the single remaining unproved ingredient.

source

theorem Rigollet.Chapter4.Thm_4_4.frobeniusInner_transpose_mul {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (E : Matrix (Fin n) (Fin T) ℝ) (A : Matrix (Fin d) (Fin T) ℝ) :

frobeniusInner E (X * A) = frobeniusInner (X.transpose * E) A

Frobenius inner product transpose property. ⟨E, X·A⟩_F = ⟨Xᵀ·E, A⟩_F

Proof: Expand both sides as triple sums and swap summation order.

source

theorem Rigollet.Chapter4.Thm_4_4.rank_sub_le {m p : ℕ} (A B : Matrix (Fin m) (Fin p) ℝ) :

(A - B).rank ≤ A.rank + B.rank

Rank subadditivity. rank(A - B) ≤ rank(A) + rank(B). Proof: The range of (A-B).mulVecLin is contained in A.mulVecLin.range ⊔ (-B).mulVecLin.range.

source

theorem Rigollet.Chapter4.Thm_4_4.rank_mul_sub_le_add {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) :

(X * (Θhat - Θstar)).rank ≤ Θhat.rank + Θstar.rank

Rank bound for multiplication with difference. rank(X · (Θ̂ - Θ*)) ≤ rank(Θ̂) + rank(Θ*)

Proof: rank(AB) ≤ rank(B) and rank(A-B) ≤ rank(A) + rank(B).

Frobenius inner product helper lemmas #

source

theorem Rigollet.Chapter4.Thm_4_4.frobeniusInner_smul_right {m p : ℕ} (A B : Matrix (Fin m) (Fin p) ℝ) (c : ℝ) :

frobeniusInner A (c • B) = c * frobeniusInner A B

Scaling the second argument of the Frobenius inner product.

source

theorem Rigollet.Chapter4.Thm_4_4.frobeniusInner_self {m p : ℕ} (A : Matrix (Fin m) (Fin p) ℝ) :

frobeniusInner A A = frobeniusNormSq A

Frobenius inner product with itself equals squared Frobenius norm.

source

theorem Rigollet.Chapter4.Thm_4_4.entries_zero_of_frobeniusNormSq_zero {m p : ℕ} (B : Matrix (Fin m) (Fin p) ℝ) (hB : frobeniusNormSq B = 0) (i : Fin m) (j : Fin p) :

B i j = 0

When frobeniusNormSq B = 0, all entries of B are zero.

source

theorem Rigollet.Chapter4.Thm_4_4.frobeniusInner_zero_of_frobeniusNormSq_zero {m p : ℕ} (A B : Matrix (Fin m) (Fin p) ℝ) (hB : frobeniusNormSq B = 0) :

frobeniusInner A B = 0

Frobenius inner product with a zero-norm matrix is zero.

SVD-based unit inner product bound #

The following is the SVD-dependent core of the proof of Theorem 4.4. Given the unit direction U = XΔ/‖XΔ‖_F, the textbook shows:

⟨E, U⟩² ≤ n · τ² · (rank(Θ̂) + rank(Θ*))

The proof (Rigollet, lines 2842-2860) uses three sub-results that are stated without proof in Section 4.1 or are standard linear algebra facts:

SVD decomposition: X = ΦΛΨᵀ yields XΔ = ΦN with Φ orthonormal
Von Neumann trace / Hölder inequality for Schatten norms (Section 4.1): |⟨A, B⟩| ≤ ‖A‖op · ‖B‖* (nuclear norm × operator norm)
Nuclear-Frobenius inequality (Cauchy-Schwarz on singular values): ‖M‖_* / ‖M‖_F ≤ √rank(M)
Operator norm bound via Lemma 4.2: ‖ΦᵀE‖_op ≤ √(nτ²)

The proof factors through three helper lemmas:

holder_trace_rank_bound: the Hölder-Schatten + nuclear-Frobenius combined bound (axiom)
frobeniusNormSq_orth_left: ‖ΦC‖²_F = ‖C‖²_F when ΦᵀΦ = I
rank_le_of_left_inverse_mul: rank(C) ≤ rank(Φ*C) when Φ has a left inverse

source

theorem Rigollet.Chapter4.Thm_4_4.frobeniusNormSq_orth_left {n r T : ℕ} (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ : Φ.transpose * Φ = 1) (C : Matrix (Fin r) (Fin T) ℝ) :

frobeniusNormSq (Φ * C) = frobeniusNormSq C

Frobenius norm is preserved by left-multiplication with orthonormal columns. ‖ΦC‖²_F = ‖C‖²_F when ΦᵀΦ = I.

Proof: ‖ΦC‖²_F = ⟨ΦC, ΦC⟩_F = ⟨Φᵀ(ΦC), C⟩_F = ⟨(ΦᵀΦ)*C, C⟩_F = ⟨C, C⟩_F = ‖C‖²_F

source

theorem Rigollet.Chapter4.Thm_4_4.holder_trace_rank_bound {m p : ℕ} (A B : Matrix (Fin m) (Fin p) ℝ) :

frobeniusInner A B ^ 2 ≤ matrixOpNorm A ^ 2 * ↑B.rank * frobeniusNormSq B

Von Neumann trace inequality combined with Cauchy-Schwarz on singular values: |⟨A,B⟩_F|² ≤ ‖A‖²_op · rank(B) · ‖B‖²_F. The textbook states this without proof in Section 4.1.1 (Schatten norms and Hölder's inequality for traces). References: Horn & Johnson, Matrix Analysis, Thm 7.4.1.1; Bhatia, Matrix Analysis, IV.2.5.

source

theorem Rigollet.Chapter4.Thm_4_4.rank_le_of_left_inverse_mul {n r T : ℕ} (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ : Φ.transpose * Φ = 1) (C : Matrix (Fin r) (Fin T) ℝ) :

C.rank ≤ (Φ * C).rank

Rank is monotone under left-multiplication by a matrix with left inverse. rank(C) ≤ rank(ΦC) when ΦᵀΦ = I.

Proof: C = Φᵀ*(ΦC), so rank(C) = rank(Φᵀ(ΦC)) ≤ rank(ΦC).

source

theorem Rigollet.Chapter4.Thm_4_4.svd_inner_product_rank_bound {n d T r : ℕ} (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ℝ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ℝ), X = Φ * M) (E : Matrix (Fin n) (Fin T) ℝ) (Δ : Matrix (Fin d) (Fin T) ℝ) (τ : ℝ) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 ≤ ↑n * τ ^ 2) (hXΔ_pos : 0 < frobeniusNormSq (X * Δ)) :

frobeniusInner E ((1 / √(frobeniusNormSq (X * Δ))) • (X * Δ)) ^ 2 ≤ ↑n * τ ^ 2 * ↑(X * Δ).rank

SVD-dependent inner product bound.

For U = XΔ/‖XΔ‖_F the unit-Frobenius-norm direction of M = XΔ: ⟨E, U⟩² ≤ n · τ² · rank(M)

Textbook proof (Rigollet, lines 2842-2860): Write M = ΦN (Φ has orthonormal columns spanning col(X)). Then ⟨E, U⟩ = ⟨ΦᵀE, N/‖N‖_F⟩. By Hölder (Section 4.1, stated without proof): |⟨ΦᵀE, N/‖N‖_F⟩| ≤ ‖ΦᵀE‖_op · ‖N/‖N‖F‖. Nuclear-Frobenius (Cauchy-Schwarz on singular values, not proved in book): ‖N/‖N‖F‖ ≤ √rank(N). And ‖ΦᵀE‖_op² ≤ nτ² (Lemma 4.2 + SVD of X).

Proof structure: Factor XΔ = ΦC through the column space basis Φ, use orthogonal invariance of the Frobenius norm, the Hölder trace-rank bound (holder_trace_rank_bound, axiom), and rank monotonicity.

source

theorem Rigollet.Chapter4.Thm_4_4.unit_inner_product_svd_bound {n d T r : ℕ} (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ℝ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ℝ), X = Φ * M) (E : Matrix (Fin n) (Fin T) ℝ) (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) (τ : ℝ) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 ≤ ↑n * τ ^ 2) (hXΔ_pos : 0 < frobeniusNormSq (X * Θhat - X * Θstar)) :

frobeniusInner E ((1 / √(frobeniusNormSq (X * Θhat - X * Θstar))) • (X * Θhat - X * Θstar)) ^ 2 ≤ ↑n * τ ^ 2 * (↑Θhat.rank + ↑Θstar.rank)

SVD-dependent unit inner product bound.

When U is the unit-Frobenius-norm direction of XΔ = X(Θ̂-Θ*), the squared Frobenius inner product ⟨E, U⟩² is bounded by n·τ²·(rank(Θ̂) + rank(Θ*)).

Proof: Combines the SVD inner product bound (svd_inner_product_rank_bound, which uses holder_trace_rank_bound as the only axiom) giving ⟨E, U⟩² ≤ n·τ²·rank(XΔ), with the rank subadditivity result rank(X(Θ̂-Θ*)) ≤ rank(Θ̂) + rank(Θ*).

source

theorem Rigollet.Chapter4.Thm_4_4.crossterm_svd_bound {n d T r : ℕ} (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ℝ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ℝ), X = Φ * M) (E : Matrix (Fin n) (Fin T) ℝ) (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) (τ : ℝ) (hn : 0 < ↑n) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 ≤ ↑n * τ ^ 2) :

2 / ↑n * frobeniusInner E (X * Θhat - X * Θstar) ≤ 1 / (2 * ↑n) * frobeniusNormSq (X * Θhat - X * Θstar) + 2 * τ ^ 2 * (↑Θhat.rank + ↑Θstar.rank)

Cross-term bound via Hölder and SVD (core spectral inequality).

This is the key step in the proof of Theorem 4.4 that combines:

Young's inequality (algebraic, proved here)
SVD-based unit inner product bound (via unit_inner_product_svd_bound)

The proof splits into two cases:

If ‖XΔ‖_F = 0, the cross-term is 0 and the bound holds trivially.
If ‖XΔ‖_F > 0, we decompose ⟨E, XΔ⟩ = ⟨E, U⟩·‖XΔ‖_F where U = XΔ/‖XΔ‖_F, apply Young's inequality, and use the SVD bound on ⟨E, U⟩².

source

theorem Rigollet.Chapter4.Thm_4_4.crossTermBound_from_opNorm {n d T r : ℕ} (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ℝ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ℝ), X = Φ * M) (E : Matrix (Fin n) (Fin T) ℝ) (Θhat Θstar : Matrix (Fin d) (Fin T) ℝ) (τ : ℝ) (_hτ_pos : 0 < τ) (hn : 0 < ↑n) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 ≤ ↑n * τ ^ 2) :

2 / ↑n * frobeniusInner E (X * Θhat - X * Θstar) ≤ 1 / (2 * ↑n) * frobeniusNormSq (X * Θhat - X * Θstar) + 2 * τ ^ 2 * (↑Θhat.rank + ↑Θstar.rank)

Cross-term bound (Steps 3-5 of Rigollet's proof).

Given Φ with orthonormal columns spanning col(X) and ‖ΦᵀE‖_op² ≤ nτ², the cross-term in the Frobenius expansion satisfies:

(2/n)⟨E, XΔ⟩_F ≤ (1/(2n))‖XΔ‖_F² + 2τ²(rank(Θ̂) + rank(Θ*))

where Δ = Θ̂ - Θ*.

Proof: This is a direct consequence of crossterm_svd_bound, which captures the SVD-based spectral argument from the textbook proof.

Step 1: Minimizer inequality after substitution #

source

theorem Rigollet.Chapter4.Thm_4_4.minimizer_substituted {n d T : ℕ} (X : Matrix (Fin n) (Fin d) ℝ) (Θstar : Matrix (Fin d) (Fin T) ℝ) (E Y : Matrix (Fin n) (Fin T) ℝ) (hY : Y = X * Θstar + E) (τ : ℝ) (Θhat : Matrix (Fin d) (Fin T) ℝ) (hRK : IsRankPenalizationEstimator Y X τ Θhat) :

1 / ↑n * frobeniusNormSq (E - (X * Θhat - X * Θstar)) + 2 * τ ^ 2 * ↑Θhat.rank ≤ 1 / ↑n * frobeniusNormSq E + 2 * τ ^ 2 * ↑Θstar.rank

The minimizer inequality expressed in terms of E and Δ: After substituting Y = XΘ* + E: (1/n)‖E - X(Θ̂-Θ*)‖² + 2τ²rank(Θ̂) ≤ (1/n)‖E‖² + 2τ²rank(Θ*)

source

theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_deterministic {n d T r : ℕ} (hn : 0 < ↑n) (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ℝ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ℝ), X = Φ * M) (Θstar : Matrix (Fin d) (Fin T) ℝ) (E Y : Matrix (Fin n) (Fin T) ℝ) (hY : Y = X * Θstar + E) (τ : ℝ) (hτ : 0 < τ) (Θhat : Matrix (Fin d) (Fin T) ℝ) (hRK : IsRankPenalizationEstimator Y X τ Θhat) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 ≤ ↑n * τ ^ 2) :

predictionMSE X Θhat Θstar ≤ 8 * ↑Θstar.rank * τ ^ 2

Theorem 4.4 (deterministic core inequality).

Consider the multivariate regression model Y = XΘ* + E. If Θ̂^RK minimizes (1/n)‖Y - XΘ‖_F² + 2τ²·rank(Θ) and Φ has orthonormal columns spanning col(X) with ‖ΦᵀE‖_op² ≤ nτ², then:

(1/n)‖XΘ̂^RK - XΘ*‖_F² ≤ 8 · rank(Θ*) · τ²

The proof uses:

The minimizer property (comparing Θ̂ to Θ*)
Frobenius norm expansion ‖E - XΔ‖² = ‖E‖² + ‖XΔ‖² - 2⟨E,XΔ⟩
The cross-term bound (proved via Hölder-nuclear-Frobenius helper): (2/n)⟨E,XΔ⟩ ≤ (1/(2n))‖XΔ‖² + 2τ²(rank(Θ̂) + rank(Θ*))
Algebraic cancellation of rank(Θ̂) terms

source

theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_probabilistic {n d T r : ℕ} (hn : 0 < ↑n) (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ_orth : Φ.transpose * Φ = 1) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ℝ), X = Φ * M) (Θstar : Matrix (Fin d) (Fin T) ℝ) (E Y : Ω → Matrix (Fin n) (Fin T) ℝ) (hY : ∀ (ω : Ω), Y ω = X * Θstar + E ω) {σsq : ℝ} (_hσ : 0 < σsq) (_hE : IsSubGaussianMatrix E σsq μ) (τ : ℝ) (hτ : 0 < τ) (Θhat : Ω → Matrix (Fin d) (Fin T) ℝ) (hRK : ∀ (ω : Ω), IsRankPenalizationEstimator (Y ω) X τ (Θhat ω)) (ω : Ω) :

matrixOpNorm (Φ.transpose * E ω) ^ 2 ≤ ↑n * τ ^ 2 → predictionMSE X (Θhat ω) Θstar ≤ 8 * ↑Θstar.rank * τ ^ 2

Theorem 4.4 (probabilistic version).

Under the multivariate regression model Y(ω) = XΘ* + E(ω) where E ~ subG_{n×T}(σ²), the rank penalization estimator satisfies:

(1/n)‖XΘ̂^RK - XΘ*‖_F² ≤ 8 · rank(Θ*) · τ²

on the event where ‖ΦᵀE‖_op² ≤ nτ² (which by Lemma 4.2 + SVD of X has probability at least 1 - δ when τ is chosen appropriately).

This gives the rate ≲ σ²·rank(Θ*)·(d ∨ T + log(1/δ))/n with probability 1-δ.

Rate bound: connecting 8·rank(Θ)·τ² to σ²·rank(Θ)·(d∨T+log(1/δ))/n #

The textbook states that the bound 8·rank(Θ*)·τ² with τ chosen from Lemma 4.2 yields the rate ≲ σ²·rank(Θ*)·(d∨T + log(1/δ))/n.

When the "score matrix" (1/n)XᵀE is sub-Gaussian with variance proxy σsq/n (which follows from the regression model when X has orthonormal rows), Lemma 4.2 gives: τ₀ = 4√(σsq/n)·√(log(12)·(d∨T)) + 2√(σsq/n)·√(2·log(1/δ))

Then τ₀² ≤ 2·(16·(σsq/n)·log(12)·(d∨T)) + 2·(4·(σsq/n)·2·log(1/δ)) = 32·log(12)·(σsq/n)·(d∨T) + 16·(σsq/n)·log(1/δ) ≤ 32·log(12)·(σsq/n)·(d∨T + log(1/δ))

So 8·rank(Θ*)·τ₀² ≤ 256·log(12)·σsq·rank(Θ*)·(d∨T + log(1/δ))/n.

The constant 256·log(12) ≈ 636 can be improved; the "≲" notation hides it.

source

theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_rate_bound (σ₀ : ℝ) (hσ₀ : 0 ≤ σ₀) (D : ℝ) (hD : 0 ≤ D) (L : ℝ) (hL : 0 ≤ L) (r : ℝ) (hr : 0 ≤ r) :

have τ₀ := 4 * √σ₀ * √(Real.log 12 * D) + 2 * √σ₀ * √(2 * L); 8 * r * τ₀ ^ 2 ≤ 256 * Real.log 12 * σ₀ * r * (D + L)

Rate bound for Theorem 4.4.

When τ = 4√σ₀·√(log(12)·D) + 2√σ₀·√(2·L) where σ₀ = σsq/n is the per-sample variance proxy and D = d∨T, L = log(1/δ), then:

8·rank(Θ*)·τ² ≤ 256·log(12) · σsq/n · rank(Θ*) · (D + L)

This uses (a+b)² ≤ 2(a²+b²) and elementary arithmetic. The bound captures the "≲ σ²·rank(Θ*)·(d∨T + log(1/δ))/n" rate.

source

theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_full_rate {n d T r : ℕ} (hn : 0 < ↑n) (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ_orth : Φ.transpose * Φ = 1) (X : Matrix (Fin n) (Fin d) ℝ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ℝ), X = Φ * M) (Θstar : Matrix (Fin d) (Fin T) ℝ) (E Y : Matrix (Fin n) (Fin T) ℝ) (hY : Y = X * Θstar + E) (σ₀ : ℝ) (hσ₀ : 0 ≤ σ₀) (D : ℝ) (hD : 0 ≤ D) (L : ℝ) (hL : 0 ≤ L) (τ : ℝ) (hτ : 0 < τ) (hτ_def : τ = 4 * √σ₀ * √(Real.log 12 * D) + 2 * √σ₀ * √(2 * L)) (Θhat : Matrix (Fin d) (Fin T) ℝ) (hRK : IsRankPenalizationEstimator Y X τ Θhat) (hΦE_bound : matrixOpNorm (Φ.transpose * E) ^ 2 ≤ ↑n * τ ^ 2) :

predictionMSE X Θhat Θstar ≤ 256 * Real.log 12 * σ₀ * ↑Θstar.rank * (D + L)

Theorem 4.4 (full probabilistic version with rate).

Under the multivariate regression model Y(ω) = XΘ* + E(ω) where E ~ subG_{n×T}(σ²), the rank penalization estimator with τ = 4√(σsq/n)·√(log(12)·(d∨T)) + 2√(σsq/n)·√(2·log(1/δ)) satisfies:

(1/n)‖XΘ̂^RK - XΘ*‖_F² ≤ 256·log(12) · (σsq/n) · rank(Θ*) · (d∨T + log(1/δ))

on the event where ‖ΦᵀE‖_op² ≤ nτ², which has probability at least 1 - δ by Lemma 4.2 + SVD of X.

This formalizes the textbook's "≲ σ²·rank(Θ*)·(d∨T + log(1/δ))/n" with the explicit constant 256·log(12).

source

theorem Rigollet.Chapter4.Thm_4_4.theorem_4_4_with_probability {n d T r : ℕ} (hn : 0 < ↑n) (Φ : Matrix (Fin n) (Fin r) ℝ) (hΦ_orth : Φ.transpose * Φ = 1) {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [hprob : MeasureTheory.IsProbabilityMeasure μ] (X : Matrix (Fin n) (Fin d) ℝ) (hX_col : ∃ (M : Matrix (Fin r) (Fin d) ℝ), X = Φ * M) (Θstar : Matrix (Fin d) (Fin T) ℝ) (E Y : Ω → Matrix (Fin n) (Fin T) ℝ) (hY : ∀ (ω : Ω), Y ω = X * Θstar + E ω) (σsq : ℝ) (hσ : 0 < σsq) (hE : IsSubGaussianMatrix (fun (ω : Ω) => Φ.transpose * E ω) σsq μ) (hr : 0 < r) (hT : 0 < T) {δ : ℝ} (hδ_pos : 0 < δ) (hδ_lt : δ < 1) (τ : ℝ) (hτ_def : τ = 4 * √σsq * √(Real.log 12 * ↑(max r T)) + 2 * √σsq * √(2 * Real.log (1 / δ))) (hτ : 0 < τ) (Θhat : Ω → Matrix (Fin d) (Fin T) ℝ) (hRK : ∀ (ω : Ω), IsRankPenalizationEstimator (Y ω) X τ (Θhat ω)) :

μ {ω : Ω | predictionMSE X (Θhat ω) Θstar > 8 * ↑Θstar.rank * τ ^ 2} ≤ ENNReal.ofReal δ

Theorem 4.4 (full probabilistic statement with measure bound).

Under the multivariate regression model Y(ω) = XΘ* + E(ω) where ΦᵀE ~ subG_{r×T}(σ²) (the projected noise matrix is sub-Gaussian), the rank penalization estimator with τ = 4√σ²·√(log(12)·(r∨T)) + 2√σ²·√(2·log(1/δ)) satisfies:

Pμ { ω | predictionMSE X (Θ̂(ω)) Θ* > 8 · rank(Θ*) · τ² } ≤ δ

This is the measure-theoretic formalization of the textbook's "with probability at least 1 - δ" statement of Theorem 4.4.

Proof structure:

The deterministic theorem gives: on {ω | ‖ΦᵀE(ω)‖_op² ≤ nτ²}, the MSE bound holds.
The bad MSE event is contained in {ω | ‖ΦᵀE(ω)‖_op > τ} (since ‖ΦᵀE‖_op ≤ τ ⇒ ‖ΦᵀE‖_op² ≤ τ² ≤ nτ² for n≥1).
By lemma_4_2_operator_norm_high_prob, this event has probability ≤ δ.

Documentation