Local definitions (inlined to avoid cross-file import chains) #
The empirical covariance matrix Σ̂ = (1/n) Σᵢ XᵢXᵢᵀ. Given n i.i.d. samples Xᵢ : Ω → ℝ^d (indexed by Fin n), this returns the d × d matrix (1/n) Σᵢ Xᵢ Xᵢᵀ at each ω.
Instances For
IsLargestEigenvector M w means w is a unit eigenvector of M
corresponding to the largest eigenvalue: it has unit norm and maximizes
the Rayleigh quotient wᵀ M w over all unit vectors.
Instances For
A largest eigenvector has unit norm.
A random vector X : Ω → ℝ^d is sub-Gaussian with variance proxy σ² if for every unit vector u ∈ S^{d-1}, the scalar projection u⊤X is sub-Gaussian with variance proxy σ². Formally, this means the MGF of u⊤X is bounded: E[exp(s·u⊤X)] ≤ exp(σ²s²/2) for all s ∈ ℝ.
Instances For
The rate function appearing in the bound: max(√(t/n), t/n) where t = d + log(1/δ). This captures both the "slow" regime (t/n dominates) and the "fast" regime (√(t/n) dominates).
Instances For
Theorem 4.6 (operator norm concentration) #
The proof of Theorem 4.6 follows the book (Rigollet, §4.3):
- WLOG reduction to Σ = I_d via whitening Y = Σ^{-1/2}X
- ε-net argument (Lemma 4.2): ‖Σ̂_Y - I‖op ≤ 2 max{x,y∈N} |xᵀ(Σ̂_Y - I)y|
- Polarization: (xᵀY)(yᵀY) = ((x+y)ᵀY)²/4 - ((x-y)ᵀY)²/4
- Sub-exponential lifting (Lemma 1.12): products of sub-Gaussians → sub-exponential
- Bernstein's inequality (Theorem 1.13): concentration for sub-exponential averages
- Union bound over ≤ 144^d ε-net pairs → equation (4.7): P(‖Σ̂_Y - I‖_op > t) ≤ 144^d · exp[-n/2 · min((t/32)², t/32)]
- Setting t = C · covRate d n δ to make failure probability ≤ δ.
The proof is structured in three layers:
Axiomatized sub-lemmas for the deepest sub-steps that reference Chapter 1 results and require Mathlib infrastructure not currently available (matrix square roots, ε-net covering numbers, sub-exponential Bernstein):
eps_net_opnorm_reduction_L2: Lemmas 1.18 + 4.2per_pair_concentration_L2: Lemma 1.12 + Theorem 1.13whitened_vectors_exist_L2: Matrix square root whitening
Proved intermediate theorems built from the axiomatized sub-lemmas:
eq_4_7_identity_L2: equation (4.7) via union boundwhitening_reduction_L2: general → identity case reduction
Fully proved assembly (combining whitening with eq 4.7, then algebraic rate simplification).
Universal constant from Theorem 4.6 (operator norm concentration).
Instances For
The constant from Theorem 4.6 is positive.
Helper lemmas for sub-steps of Theorem 4.6 #
These theorems capture the intermediate results whose proofs are given in the book (Rigollet §4.3, using Lemmas 1.12, 1.18, 4.2, Theorem 1.13) but require Mathlib infrastructure not currently available:
- ε-net covering numbers for the L2 unit sphere (Lemma 1.18)
- ε-net reduction for operator norm (Lemma 4.2)
- Sub-exponential moment bounds for products of sub-Gaussians (Lemma 1.12)
- Bernstein's inequality for sub-exponential averages (Theorem 1.13)
- Integration of matrix-transformed random vectors for the whitening step
Note: While Mathlib now has CFC.sqrt for matrix square roots (via the
continuous functional calculus), the whitening step additionally requires
showing that Σ^{-1/2}X preserves sub-Gaussian properties and transforms
the covariance to identity, which involves Bochner integration of
matrix-transformed random vectors — infrastructure not yet available.
These are axiomatized because the sub-steps require the above infrastructure.
Sub-lemmas for the proof of equation (4.7) #
These sub-lemmas correspond to the external references used in the proof of Theorem 4.6 in the book:
eps_net_opnorm_reduction_L2: Lemmas 1.18 + 4.2 (ε-net covering + reduction) — PROVEDper_pair_concentration_L2: Lemma 1.12 + Theorem 1.13 (sub-exponential Bernstein) — proved from helper axiombernstein_bilinear_concentrationwhitened_vectors_exist_L2: Whitening Y = Σ^{-1/2}X — proved from helper axiommatrix_square_root_whitening
The helper axioms correspond to specific external references in the book's proof:
bernstein_bilinear_concentration(axiom) captures the application of Lemma 1.12 + Theorem 1.13 from Chapter 1 (formalized inRigollet.Chapter1.Lemma_1_12andRigollet.Chapter1.Thm_1_13) but requires type bridging infrastructurematrix_square_root_whiteningcaptures the spectral theorem application for constructing Σ^{-1/2} (standard linear algebra, not fully available in Mathlib)
ε-net reduction for L2/spectral operator norm (Lemmas 1.18 + 4.2). There exists a 1/4-net N for S^{d-1} with |N| ≤ 12^d such that ‖A‖_op > t implies some pair x,y ∈ N has |xᵀAy| > t/2.
Uses matrixOpNorm = ‖(toEuclideanLin A).toContinuousLinearMap‖
(the spectral/L2 operator norm) and works over EuclideanSpace ℝ (Fin d).
Proved using lemma_1_18_covering_number_euclidean_ball (Lemma 1.18)
and lemma_4_2_eps_net_reduction (Lemma 4.2).
Type bridge: EuclideanSpace ℝ (Fin d) ↔ Fin d → ℝ #
The definitions in `Thm_4_6` use `Fin d → ℝ` while those in `Cor49` use
`EuclideanSpace ℝ (Fin d) = WithLp 2 (Fin d → ℝ)`. The coordinate
access `Y i ω j` is the same in both worlds, so the `empiricalCovariance`
and `IsSubGaussianVec` definitions agree definitionally.
The Cor49 and Thm_4_6 empirical covariances agree (definitionally).
Bridge between Cor49 and Thm_4_6 sub-Gaussian definitions.
After aligning both definitions to use EuclideanSpace ℝ (Fin d) for the
test vector, the only remaining difference is the random vector type:
Cor49 uses Ω → EuclideanSpace ℝ (Fin d), Thm_4_6 uses Ω → Fin d → ℝ.
The MGF bounds are identical (both use ∑ j, u j * X ω j), so the bridge
reduces to showing that AEMeasurable transfers through WithLp.equiv.
Sub-Gaussian random vectors are AEStronglyMeasurable. This is a standard regularity assumption on random vectors; the book assumes random vectors are measurable without explicit proof.
Independence (iIndepFun) implies pairwise uncorrelation of coordinates. This is a standard mathematical fact: if X_i are mutually independent random vectors, then for i ≠ j, E[X_i(a) * X_j(b)] = E[X_i(a)] * E[X_j(b)]. Requires AEStronglyMeasurable random vectors (standard for measurable random variables).
Bernstein concentration for bilinear form x^T(Σ̂-I)y.
For iid 1-sub-Gaussian vectors Y₁,...,Yₙ with identity covariance, and unit vectors x,y: P(|x^T(Σ̂-I)y| > s) ≤ 2·exp(-n/2 · min((s/16)², s/16)).
Proved by bridging types to Thm_4_6.per_pair_concentration.
Per-pair bilinear form concentration (Polarization + Lemma 1.12 + Theorem 1.13). For the empirical covariance with identity population covariance, the bilinear form xᵀ(Σ̂-I)y concentrates via sub-exponential Bernstein.
Proved from bernstein_bilinear_concentration.
Apply a matrix M to an EuclideanSpace vector, returning an EuclideanSpace vector.
Instances For
Component-wise expansion of applyMatrix: the j-th component of M·x is ∑ i, M j i * x i.
Sub-axioms for the whitening construction #
The book (Rigollet §4.3) constructs Y = Σ^{-1/2}X using the spectral decomposition Σ = UDUᵀ, giving Σ^{-1/2} = UD^{-1/2}Uᵀ. The whitening reduction requires:
- Sub-Gaussian transfer: Linear maps preserve sub-Gaussian properties with operator norm scaling (MX ~ subG_d(‖M‖²·σ²))
- Sub-Gaussian integrability: Sub-Gaussian vectors have integrable products
- Bochner integration of bilinear forms: ∫(Mf)(Mf)ᵀ = M(∫ffᵀ)Mᵀ
- Operator norm sub-multiplicativity: ‖MAMᵀ‖ ≤ ‖M‖²·‖A‖ (proved)
- Empirical covariance transforms under linear maps: Σ̂_Y = MΣ̂_XMᵀ (proved)
The overall whitening reduction (matrix_square_root_whitening) is axiomatized
because the book references the spectral theorem without providing a proof.
Items 1-3 are also axiomatized; items 4-5 are proved from Mathlib.
NOTE: The original whitening_matrix_exists axiom (existence of Σ^{-1/2} with
specific norm bounds) was mathematically false — ‖Σ^{-1/2}‖²·‖Σ‖ = κ(Σ) ≥ 1
for PD matrices, violating the stated bound ‖M‖²·‖Σ‖ ≤ 1. It has been replaced
by the correctly-stated matrix_square_root_whitening axiom.
The inner product ⟪θ, M x⟩ equals ⟪Mᵀ θ, x⟩ in EuclideanSpace.
applyMatrix M x = (toEuclideanLin M) x, linking our definition to Mathlib.
matrixOpNorm M = ‖M‖ under the L2 operator norm.
The norm of applyMatrix Mᵀ θ is at most matrixOpNorm M * ‖θ‖.
The sum ∑ u_j * (MX)_j equals ∑ (Mᵀu)_j * X_j, proved via the inner product identity.
Sub-Gaussian preservation under linear maps (reference: standard, §1).
If X subG_d(σ²) and M is a d×d matrix, then MX subG_d(τ²) whenever
‖M‖²·σ² ≤ τ². The book uses this with M = Σ^{-1/2} to get Y ~ subG_d(1).
Sub-Gaussian components are in L² (reference: sub-Gaussian moment bounds).
For a sub-Gaussian random vector X with variance proxy σ², each component X_a
is square-integrable (MemLp 2). This follows from the MGF bound:
projecting onto the standard basis vector e_a gives E[exp(s·X_a)] ≤ exp(σ²s²/2),
from which E[X_a²] < ∞ follows by standard moment bound arguments
(e.g., x² ≤ exp(x) + exp(-x) combined with the MGF bound at s = ±1).
The measurability and integrability hypotheses make explicit the implicit
assumption that X is a random vector (i.e., measurable) with finite MGF,
which the book takes for granted.
Sub-Gaussian integrability (proved from sub-Gaussian moment bounds).
Sub-Gaussian random vectors have all moments finite, so component products
are integrable. This follows from the L² membership of each component
(via subgaussian_component_memLp_two) and Hölder's inequality
(L² × L² → L¹, i.e., MemLp.integrable_mul).
The measurability and integrability hypotheses make explicit assumptions
that the book takes for granted (X is a random vector with finite MGF).
Bochner integral of bilinear forms under linear maps (proved from Bochner integration). If E[X_j·X_k] = Σ_{jk} and the component products are integrable, then for Y = MX, E[Y_j·Y_k] = (MΣMᵀ)_{jk}. Captures the identity ∫(Mf)(Mf)ᵀdμ = M(∫ffᵀdμ)Mᵀ which requires linearity of Bochner integration for bilinear/quadratic forms.
Per-sample outer product transforms under linear maps: (MX)(MX)ᵀ = M(XXᵀ)Mᵀ.
Empirical covariance transforms under linear maps (reference: algebraic identity). If Y_i = MX_i pointwise, then Σ̂_Y = MΣ̂_XMᵀ. This is the sample-level version of the Bochner identity, holding for each ω.
Sub-Gaussian integrability for covariance estimation (standard mathematical fact).
If X is a sub-Gaussian random vector with variance proxy σ², then for any
coordinates a, b, the product X_a · X_b is integrable. This follows from the fact
that sub-Gaussian random variables have finite moments of all orders: the MGF bound
implies E[X_a²] < ∞ for each coordinate (via subgaussian_component_memLp_two),
and then Cauchy-Schwarz / Hölder gives E[|X_a · X_b|] < ∞
(via MemLp.integrable_mul).
The measurability and exponential integrability hypotheses make explicit
standard assumptions the book takes for granted (random vectors are always
assumed measurable with finite MGF). These are needed because the local
IsSubGaussianVec definition (MGF bound on Bochner integrals) does not
itself imply measurability.
LDL-based whitening matrix helpers #
The inner product on WithLp 2 (Fin d → ℝ) equals dotProduct.
A row of an invertible matrix is nonzero.
Diagonal matrices are symmetric (transpose equals self).
D^{-1/2} * D * D^{-1/2} = I for diagonal D with positive entries.
Spectral theorem: existence of whitening matrix (standard linear algebra background). For any positive definite Σ, there exists M (e.g., Σ^{-1/2}) satisfying: (1) M·Σ·Mᵀ = I (whitening), and (2) Operator norm transfer: if ‖A‖_op > ‖Σ‖_op·t then ‖M·A·Mᵀ‖_op > t.
Property (2) follows from ‖M·A·Mᵀ‖_op ≥ ‖A‖_op / ‖M⁻¹‖²_op = ‖A‖_op / ‖Σ‖_op.
Axiom justification: The textbook does not prove this, instead referencing
standard linear algebra. The hypothesis Sig.PosDef ensures the statement is
mathematically sound (singular matrices have no whitening matrix).
Sub-Gaussian whitening: If X is sub-Gaussian with parameter ‖Σ‖_op and M is a whitening matrix (M·Σ·Mᵀ = I), then MX is sub-Gaussian with parameter 1.
This follows from the MGF characterization: for unit u, u^T(MX) = (M^Tu)^TX, and the quadratic form (M^Tu)^T Σ (M^Tu) = u^T(MΣM^T)u = u^Tu = 1.
Axiom justification: The textbook asserts this as part of the whitening step without explicit proof, referencing the spectral theorem.
Sub-Gaussian exponential integrability: A sub-Gaussian random vector has
integrable exponential coordinates (both positive and negative).
The book always assumes random vectors have finite MGF; this theorem makes
that standard assumption explicit.
Proof: From the IsSubGaussianVec MGF bound, taking the unit vector u = eᵢ
(standard basis) gives an individual MGF bound for each coordinate Xᵢ.
The conclusion then follows from Rigollet.Chapter4.Thm_4_6.exp_individual_integrable
(which axiomatizes the standard fact that a bounded MGF implies integrability
of the exponential).
Sub-Gaussian regularity: A sub-Gaussian random vector has measurable
coordinates and integrable exponentials (both positive and negative).
Part (1) — measurability — follows from subGaussianVec_aestronglyMeasurable
(the vector is AEStronglyMeasurable, so each coordinate is too, by
composition with the continuous projection EuclideanSpace.proj).
Parts (2) and (3) — exponential integrability — are background assumptions
not proved in the textbook, provided by subgaussianVec_exp_integrable.
Sub-Gaussian product integrability (proved from sub-Gaussian tail bounds). For a sub-Gaussian random vector X with variance proxy σ², the product X_a · X_b of any two coordinates is integrable. This follows from:
subgaussian_vec_regularity(measurability + exponential integrability)subgaussian_product_integrable(L² × L² → L¹ via Cauchy-Schwarz) This is the provable part of the whitening matrix properties.
Existence of whitening matrix with all properties (combines spectral theorem
- sub-Gaussian integrability). The spectral theorem parts (1-2) come from
spectral_whitening_axiom(not proved in textbook), while the integrability part (3) is proved fromsubgaussian_vec_product_integrableusing the sub-Gaussian tail bound machinery.
Matrix square root whitening (reference: Spectral theorem, Rigollet §4.3). For any covariance matrix Σ with E[XXᵀ] = Σ and X ~ subG_d(‖Σ‖_op), the whitened random vectors Y_i = Σ^{-1/2} X_i satisfy identity covariance, unit sub-Gaussian parameter, and the operator norm transfer property.
Proved from whitening_matrix_axiom (which combines spectral_whitening_axiom
for the spectral theorem parts, and subgaussian_vec_product_integrable for
the integrability part) together with subgaussian_whitening_axiom,
bochner_bilinear_transform, and empirical_covariance_linear_map.
Whitened vectors existence (Y = Σ^{-1/2}X). For X with E[XXᵀ] = Σ and X ~ subG_d(‖Σ‖_op), there exist whitened vectors Y satisfying identity covariance, unit sub-Gaussian parameter, and the operator norm implication.
Proved from matrix_square_root_whitening, which captures the spectral
theorem application referenced in the book's proof of Theorem 4.6.
Helper lemmas for the union bound argument #
Equation (4.7) (Identity-case concentration for L2/spectral norm).
For i.i.d. Y₁,...,Yₙ with E[YYᵀ] = I and Y ~ subG_d(1), P(‖Σ̂_Y - I‖_op > t) ≤ 288^d · exp[-n/2 · min((t/32)², t/32)]
Proved in the book via:
- ε-net N for S^{d-1} with |N| ≤ 12^d (Lemma 1.18)
- ε-net reduction (Lemma 4.2): ‖A‖op ≤ 2·max{x,y∈N} |xᵀAy|
- Polarization + sub-exponential lifting (Lemma 1.12)
- Bernstein's inequality (Theorem 1.13) for each pair
- Union bound over |N|² ≤ 144^d pairs, with factor 2 from Bernstein tail
Whitening reduction for L2/spectral norm.
For X with E[XXᵀ] = Σ and X ~ subG_d(‖Σ‖_op), the whitened vectors Y = Σ^{-1/2}X satisfy:
- E[YYᵀ] = I (identity covariance)
- Y ~ subG_d(1) (unit sub-Gaussian parameter)
- ‖Σ̂_X − Σ‖_op > ‖Σ‖_op·t ⟹ ‖Σ̂_Y − I‖_op > t (from Σ̂_X − Σ = Σ^{1/2}(Σ̂_Y − I)Σ^{1/2} and sub-multiplicativity)
Proved from the axiomatized whitened_vectors_exist_L2.
Assembly: Whitening + eq (4.7) → general concentration #
General concentration via whitening + identity-case bound. Combines the whitening reduction with equation (4.7).
Algebraic rate bound #
Setting t = C · covRate d n δ in the exponential tail bound from equation (4.7), we verify that 288^d · exp(-n/2 · min((t/32)², t/32)) ≤ δ for C = 1024.
Theorem 4.6 (Operator norm concentration for i.i.d. sub-Gaussian random vectors).
With probability ≥ 1 − δ, ‖Σ̂ − Σ‖_op ≤ C₁ · ‖Σ‖_op · covRate(d,n,δ).
The proof follows the book (Rigollet, §4.3):
- Whitening reduction Y = Σ^{-1/2}X reduces to identity covariance
- Equation (4.7) gives the exponential tail bound via ε-net + Bernstein
- Algebraic simplification with C₁ = 1024 absorbs the constants
The sub-steps (whitening, per-pair Bernstein) are axiomatized because they require Mathlib infrastructure not currently available (matrix square roots, ε-net covering for L2, sub-exponential Bernstein). The assembly is fully proved.
From Theorem 4.8 (Davis-Kahan): eigenvector error bound (pointwise). For the spiked covariance model Σ = θvv⊤ + I_d, if v̂ is a largest eigenvector of any PSD matrix Σ̃, then min_{ε∈{±1}} ‖εv̂ − v‖ ≤ (2√2/θ) · ‖Σ̃ − Σ‖_op.
Derived from Thm48.theorem_4_8 which gives the squared form:
min(‖v̂-v‖², ‖v̂+v‖²) ≤ (8/θ²)‖Σ̃ − Σ‖_op².
The empirical covariance Σ̂ = (1/n)ΣᵢXᵢXᵢᵀ is always PSD. Each summand XᵢXᵢᵀ = vecMulVec xᵢ (star xᵢ) is PSD, and a nonneg-scaled sum of PSD matrices is PSD.
Helper lemmas for the spiked operator norm computation #
Under the spiked covariance model Σ = θvv⊤ + I_d, the operator norm is ‖Σ‖_op = 1 + θ. Proof: Upper bound via triangle inequality + Cauchy-Schwarz; lower bound by evaluating at the spike vector v.
Helper: complement probability bound #
For a probability measure, if the complement has measure ≤ ofReal δ,
then the set has measure ≥ ofReal (1 - δ).
This version does NOT require measurability of S, using measure_union_le
instead of measure_add_measure_compl.
Corollary 4.9 (PCA consistency via empirical covariance) #
The corollary combines Theorem 4.6 and Theorem 4.8 as follows:
Theorem 4.6 bounds the operator norm of the estimation error: ‖Σ̂ − Σ‖_op ≤ C₁ · ‖Σ‖_op · rate(d,n,δ) w.p. ≥ 1 − δ.
Theorem 4.8 (Davis-Kahan) converts operator norm error to eigenvector error: min_{ε ∈ {±1}} ‖ε v̂ − v‖₂ ≤ (2√2/θ) · ‖Σ̂ − Σ‖_op.
Under the spiked model Σ = θvv⊤ + I_d, the operator norm is ‖Σ‖op = 1 + θ. Substituting: min{ε ∈ {±1}} ‖ε v̂ − v‖₂ ≤ (2√2·C₁/θ) · (1+θ) · rate(d,n,δ) = C · (1+θ)/θ · rate(d,n,δ)
Corollary 4.9 (PCA consistency via empirical covariance).
Let X₁,...,Xₙ be n i.i.d. copies of a sub-Gaussian random vector X ∈ ℝ^d with E[XX⊤] = Σ and X ~ subG_d(‖Σ‖_op). Assume Σ = θvv⊤ + I_d (spiked covariance model). Then the largest eigenvector v̂ of the empirical covariance matrix Σ̂ satisfies:
min_{ε∈{±1}} |εv̂ − v|₂ ≲ ((1+θ)/θ) · (√((d+log(1/δ))/n) ∨ (d+log(1/δ))/n)
with probability 1 − δ.
The i.i.d. assumption is encoded through:
hcov: identical distribution (same covariance for all i)hSubG: identical sub-Gaussian parameter for all ihIndep: mutual independence of the random vectors