Documentation

Atlas.HighDimensionalStatistics.code.Chapter4.Cor_4_9

Local definitions (inlined to avoid cross-file import chains) #

def Cor49.IsSpikedCovariance {d : } (S : Matrix (Fin d) (Fin d) ) (θ : ) (v : EuclideanSpace (Fin d)) :

Definition 4.7 (Spiked covariance model). A matrix S satisfies the spiked covariance model with parameter θ > 0 and spike v (a unit vector) if S = θ • (v · vᵀ) + I_d.

Instances For
    noncomputable def Cor49.matrixOpNorm {d : } (A : Matrix (Fin d) (Fin d) ) :

    The operator norm of a matrix: ‖A‖_op = sup_{‖x‖₂=1} ‖Ax‖₂, defined via the continuous linear map norm on Euclidean spaces (L2/spectral norm).

    Instances For
      noncomputable def Cor49.empiricalCovariance {d n : } {Ω : Type u_1} (X : Fin nΩEuclideanSpace (Fin d)) (ω : Ω) :
      Matrix (Fin d) (Fin d)

      The empirical covariance matrix Σ̂ = (1/n) Σᵢ XᵢXᵢᵀ. Given n i.i.d. samples Xᵢ : Ω → ℝ^d (indexed by Fin n), this returns the d × d matrix (1/n) Σᵢ Xᵢ Xᵢᵀ at each ω.

      Instances For

        IsLargestEigenvector M w means w is a unit eigenvector of M corresponding to the largest eigenvalue: it has unit norm and maximizes the Rayleigh quotient wᵀ M w over all unit vectors.

        Instances For

          A largest eigenvector has unit norm.

          def Cor49.IsSubGaussianVec {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (X : ΩEuclideanSpace (Fin d)) (σsq : ) :

          A random vector X : Ω → ℝ^d is sub-Gaussian with variance proxy σ² if for every unit vector u ∈ S^{d-1}, the scalar projection u⊤X is sub-Gaussian with variance proxy σ². Formally, this means the MGF of u⊤X is bounded: E[exp(s·u⊤X)] ≤ exp(σ²s²/2) for all s ∈ ℝ.

          Instances For
            noncomputable def Cor49.covRate (d₀ n : ) (delta : ) :

            The rate function appearing in the bound: max(√(t/n), t/n) where t = d + log(1/δ). This captures both the "slow" regime (t/n dominates) and the "fast" regime (√(t/n) dominates).

            Instances For

              Theorem 4.6 (operator norm concentration) #

              The proof of Theorem 4.6 follows the book (Rigollet, §4.3):

              1. WLOG reduction to Σ = I_d via whitening Y = Σ^{-1/2}X
              2. ε-net argument (Lemma 4.2): ‖Σ̂_Y - I‖op ≤ 2 max{x,y∈N} |xᵀ(Σ̂_Y - I)y|
              3. Polarization: (xᵀY)(yᵀY) = ((x+y)ᵀY)²/4 - ((x-y)ᵀY)²/4
              4. Sub-exponential lifting (Lemma 1.12): products of sub-Gaussians → sub-exponential
              5. Bernstein's inequality (Theorem 1.13): concentration for sub-exponential averages
              6. Union bound over ≤ 144^d ε-net pairs → equation (4.7): P(‖Σ̂_Y - I‖_op > t) ≤ 144^d · exp[-n/2 · min((t/32)², t/32)]
              7. Setting t = C · covRate d n δ to make failure probability ≤ δ.

              The proof is structured in three layers:

              1. Axiomatized sub-lemmas for the deepest sub-steps that reference Chapter 1 results and require Mathlib infrastructure not currently available (matrix square roots, ε-net covering numbers, sub-exponential Bernstein):

              2. Proved intermediate theorems built from the axiomatized sub-lemmas:

              3. Fully proved assembly (combining whitening with eq 4.7, then algebraic rate simplification).

              Universal constant from Theorem 4.6 (operator norm concentration).

              Instances For

                The constant from Theorem 4.6 is positive.

                Helper lemmas for sub-steps of Theorem 4.6 #

                These theorems capture the intermediate results whose proofs are given in the book (Rigollet §4.3, using Lemmas 1.12, 1.18, 4.2, Theorem 1.13) but require Mathlib infrastructure not currently available:

                Note: While Mathlib now has CFC.sqrt for matrix square roots (via the continuous functional calculus), the whitening step additionally requires showing that Σ^{-1/2}X preserves sub-Gaussian properties and transforms the covariance to identity, which involves Bochner integration of matrix-transformed random vectors — infrastructure not yet available.

                These are axiomatized because the sub-steps require the above infrastructure.

                Sub-lemmas for the proof of equation (4.7) #

                These sub-lemmas correspond to the external references used in the proof of Theorem 4.6 in the book:

                The helper axioms correspond to specific external references in the book's proof:

                theorem Cor49.eps_net_opnorm_reduction_L2 {d : } (hd : 0 < d) :
                ∃ (N : Finset (EuclideanSpace (Fin d))), N.card 12 ^ d N.Nonempty (∀ xN, x 1) ∀ (A : Matrix (Fin d) (Fin d) ) (t : ), matrixOpNorm A > txN, yN, |(EuclideanSpace.equiv (Fin d) ) x ⬝ᵥ A.mulVec ((EuclideanSpace.equiv (Fin d) ) y)| > t / 2

                ε-net reduction for L2/spectral operator norm (Lemmas 1.18 + 4.2). There exists a 1/4-net N for S^{d-1} with |N| ≤ 12^d such that ‖A‖_op > t implies some pair x,y ∈ N has |xᵀAy| > t/2.

                Uses matrixOpNorm = ‖(toEuclideanLin A).toContinuousLinearMap‖ (the spectral/L2 operator norm) and works over EuclideanSpace ℝ (Fin d).

                Proved using lemma_1_18_covering_number_euclidean_ball (Lemma 1.18) and lemma_4_2_eps_net_reduction (Lemma 4.2).

                Type bridge: EuclideanSpace ℝ (Fin d) ↔ Fin d → ℝ #

                The definitions in `Thm_4_6` use `Fin d → ℝ` while those in `Cor49` use
                `EuclideanSpace ℝ (Fin d) = WithLp 2 (Fin d → ℝ)`.  The coordinate
                access `Y i ω j` is the same in both worlds, so the `empiricalCovariance`
                and `IsSubGaussianVec` definitions agree definitionally. 
                
                def Cor49.toPlainVec {d n : } {Ω : Type u_1} (Y : Fin nΩEuclideanSpace (Fin d)) :
                Fin nΩFin d

                Strip the WithLp wrapper to convert EuclideanSpace vectors to plain vectors.

                Instances For

                  The Cor49 and Thm_4_6 empirical covariances agree (definitionally).

                  theorem Cor49.isSubGaussianVec_bridge {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) (Z : ΩEuclideanSpace (Fin d)) :
                  IsSubGaussianVec μ Z 1Rigollet.Chapter4.Thm_4_6.IsSubGaussianVec μ (fun (ω : Ω) (j : Fin d) => (Z ω).ofLp j) 1

                  Bridge between Cor49 and Thm_4_6 sub-Gaussian definitions. After aligning both definitions to use EuclideanSpace ℝ (Fin d) for the test vector, the only remaining difference is the random vector type: Cor49 uses Ω → EuclideanSpace ℝ (Fin d), Thm_4_6 uses Ω → Fin d → ℝ. The MGF bounds are identical (both use ∑ j, u j * X ω j), so the bridge reduces to showing that AEMeasurable transfers through WithLp.equiv.

                  theorem Cor49.norm_plain_le_euclidean {d : } (x : EuclideanSpace (Fin d)) (hx : x 1) :
                  fun (j : Fin d) => x.ofLp j 1

                  The sup norm of a plain vector is ≤ the L2 norm of the EuclideanSpace vector.

                  Sub-Gaussian random vectors are AEStronglyMeasurable. This is a standard regularity assumption on random vectors; the book assumes random vectors are measurable without explicit proof.

                  theorem Cor49.iIndepFun_to_pairwise_uncorr {d n : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (Y : Fin nΩEuclideanSpace (Fin d)) (hIndep : ProbabilityTheory.iIndepFun Y μ) (hMeas : ∀ (i : Fin n), MeasureTheory.AEStronglyMeasurable (Y i) μ) (i j : Fin n) :
                  i j∀ (a b : Fin d), (ω : Ω), (Y i ω).ofLp a * (Y j ω).ofLp b μ = ( (ω : Ω), (Y i ω).ofLp a μ) * (ω : Ω), (Y j ω).ofLp b μ

                  Independence (iIndepFun) implies pairwise uncorrelation of coordinates. This is a standard mathematical fact: if X_i are mutually independent random vectors, then for i ≠ j, E[X_i(a) * X_j(b)] = E[X_i(a)] * E[X_j(b)]. Requires AEStronglyMeasurable random vectors (standard for measurable random variables).

                  theorem Cor49.bernstein_bilinear_concentration {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ) (hn : 0 < n) (Y : Fin nΩEuclideanSpace (Fin d)) (hcov_id : ∀ (i : Fin n) (j k : Fin d), (ω : Ω), (Y i ω).ofLp j * (Y i ω).ofLp k μ = if j = k then 1 else 0) (hSubG : ∀ (i : Fin n), IsSubGaussianVec μ (Y i) 1) (hIndep : ProbabilityTheory.iIndepFun Y μ) (x y : EuclideanSpace (Fin d)) (hx : x 1) (hy : y 1) (s : ) (hs : 0 < s) :
                  μ {ω : Ω | |(EuclideanSpace.equiv (Fin d) ) x ⬝ᵥ (empiricalCovariance Y ω - 1).mulVec ((EuclideanSpace.equiv (Fin d) ) y)| > s} ENNReal.ofReal (2 * Real.exp (-(n / 2 * min ((s / 16) ^ 2) (s / 16))))

                  Bernstein concentration for bilinear form x^T(Σ̂-I)y.

                  For iid 1-sub-Gaussian vectors Y₁,...,Yₙ with identity covariance, and unit vectors x,y: P(|x^T(Σ̂-I)y| > s) ≤ 2·exp(-n/2 · min((s/16)², s/16)).

                  Proved by bridging types to Thm_4_6.per_pair_concentration.

                  theorem Cor49.per_pair_concentration_L2 {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ) (hn : 0 < n) (Y : Fin nΩEuclideanSpace (Fin d)) (hcov_id : ∀ (i : Fin n) (j k : Fin d), (ω : Ω), (Y i ω).ofLp j * (Y i ω).ofLp k μ = if j = k then 1 else 0) (hSubG : ∀ (i : Fin n), IsSubGaussianVec μ (Y i) 1) (hIndep : ProbabilityTheory.iIndepFun Y μ) (x y : EuclideanSpace (Fin d)) (hx : x 1) (hy : y 1) (s : ) (hs : 0 < s) :
                  μ {ω : Ω | |(EuclideanSpace.equiv (Fin d) ) x ⬝ᵥ (empiricalCovariance Y ω - 1).mulVec ((EuclideanSpace.equiv (Fin d) ) y)| > s} ENNReal.ofReal (2 * Real.exp (-(n / 2 * min ((s / 16) ^ 2) (s / 16))))

                  Per-pair bilinear form concentration (Polarization + Lemma 1.12 + Theorem 1.13). For the empirical covariance with identity population covariance, the bilinear form xᵀ(Σ̂-I)y concentrates via sub-exponential Bernstein.

                  Proved from bernstein_bilinear_concentration.

                  def Cor49.applyMatrix {d : } (M : Matrix (Fin d) (Fin d) ) (x : EuclideanSpace (Fin d)) :

                  Apply a matrix M to an EuclideanSpace vector, returning an EuclideanSpace vector.

                  Instances For
                    theorem Cor49.applyMatrix_apply {d : } (M : Matrix (Fin d) (Fin d) ) (x : EuclideanSpace (Fin d)) (j : Fin d) :
                    (applyMatrix M x).ofLp j = i : Fin d, M j i * x.ofLp i

                    Component-wise expansion of applyMatrix: the j-th component of M·x is ∑ i, M j i * x i.

                    Sub-axioms for the whitening construction #

                    The book (Rigollet §4.3) constructs Y = Σ^{-1/2}X using the spectral decomposition Σ = UDUᵀ, giving Σ^{-1/2} = UD^{-1/2}Uᵀ. The whitening reduction requires:

                    1. Sub-Gaussian transfer: Linear maps preserve sub-Gaussian properties with operator norm scaling (MX ~ subG_d(‖M‖²·σ²))
                    2. Sub-Gaussian integrability: Sub-Gaussian vectors have integrable products
                    3. Bochner integration of bilinear forms: ∫(Mf)(Mf)ᵀ = M(∫ffᵀ)Mᵀ
                    4. Operator norm sub-multiplicativity: ‖MAMᵀ‖ ≤ ‖M‖²·‖A‖ (proved)
                    5. Empirical covariance transforms under linear maps: Σ̂_Y = MΣ̂_XMᵀ (proved)

                    The overall whitening reduction (matrix_square_root_whitening) is axiomatized because the book references the spectral theorem without providing a proof. Items 1-3 are also axiomatized; items 4-5 are proved from Mathlib.

                    NOTE: The original whitening_matrix_exists axiom (existence of Σ^{-1/2} with specific norm bounds) was mathematically false — ‖Σ^{-1/2}‖²·‖Σ‖ = κ(Σ) ≥ 1 for PD matrices, violating the stated bound ‖M‖²·‖Σ‖ ≤ 1. It has been replaced by the correctly-stated matrix_square_root_whitening axiom.

                    theorem Cor49.sum_eq_inner {d : } (u x : EuclideanSpace (Fin d)) :
                    j : Fin d, u.ofLp j * x.ofLp j = inner u x

                    The component-wise sum ∑ u_j * x_j equals the inner product ⟪u, x⟫ in EuclideanSpace.

                    The inner product ⟪θ, M x⟩ equals ⟪Mᵀ θ, x⟩ in EuclideanSpace.

                    applyMatrix M x = (toEuclideanLin M) x, linking our definition to Mathlib.

                    matrixOpNorm M = ‖M‖ under the L2 operator norm.

                    For real matrices, transpose = conjTranspose.

                    The norm of applyMatrix Mᵀ θ is at most matrixOpNorm M * ‖θ‖.

                    theorem Cor49.sum_applyMatrix_eq {d : } (M : Matrix (Fin d) (Fin d) ) (u x : EuclideanSpace (Fin d)) :
                    j : Fin d, u.ofLp j * (applyMatrix M x).ofLp j = j : Fin d, (applyMatrix M.transpose u).ofLp j * x.ofLp j

                    The sum ∑ u_j * (MX)_j equals ∑ (Mᵀu)_j * X_j, proved via the inner product identity.

                    theorem Cor49.subgaussian_linear_transform {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : ΩEuclideanSpace (Fin d)) (M : Matrix (Fin d) (Fin d) ) (σsq τsq : ) :
                    IsSubGaussianVec μ X σsq0 σsqmatrixOpNorm M ^ 2 * σsq τsqIsSubGaussianVec μ (fun (ω : Ω) => applyMatrix M (X ω)) τsq

                    Sub-Gaussian preservation under linear maps (reference: standard, §1). If X subG_d(σ²) and M is a d×d matrix, then MX subG_d(τ²) whenever ‖M‖²·σ² ≤ τ². The book uses this with M = Σ^{-1/2} to get Y ~ subG_d(1).

                    For any real x, x² ≤ exp(x) + exp(-x). This follows from exp(x) + exp(-x) = 2·cosh(x) ≥ 2·(1 + x²/2) = 2 + x².

                    theorem Cor49.subgaussian_component_memLp_two {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : ΩEuclideanSpace (Fin d)) (σsq : ) :
                    IsSubGaussianVec μ X σsq∀ (a : Fin d), MeasureTheory.AEStronglyMeasurable (fun (ω : Ω) => (X ω).ofLp a) μMeasureTheory.Integrable (fun (ω : Ω) => Real.exp ((X ω).ofLp a)) μMeasureTheory.Integrable (fun (ω : Ω) => Real.exp (-(X ω).ofLp a)) μMeasureTheory.MemLp (fun (ω : Ω) => (X ω).ofLp a) 2 μ

                    Sub-Gaussian components are in L² (reference: sub-Gaussian moment bounds). For a sub-Gaussian random vector X with variance proxy σ², each component X_a is square-integrable (MemLp 2). This follows from the MGF bound: projecting onto the standard basis vector e_a gives E[exp(s·X_a)] ≤ exp(σ²s²/2), from which E[X_a²] < ∞ follows by standard moment bound arguments (e.g., x² ≤ exp(x) + exp(-x) combined with the MGF bound at s = ±1). The measurability and integrability hypotheses make explicit the implicit assumption that X is a random vector (i.e., measurable) with finite MGF, which the book takes for granted.

                    theorem Cor49.subgaussian_product_integrable {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : ΩEuclideanSpace (Fin d)) (σsq : ) :
                    IsSubGaussianVec μ X σsq∀ (a b : Fin d), MeasureTheory.AEStronglyMeasurable (fun (ω : Ω) => (X ω).ofLp a) μMeasureTheory.Integrable (fun (ω : Ω) => Real.exp ((X ω).ofLp a)) μMeasureTheory.Integrable (fun (ω : Ω) => Real.exp (-(X ω).ofLp a)) μMeasureTheory.AEStronglyMeasurable (fun (ω : Ω) => (X ω).ofLp b) μMeasureTheory.Integrable (fun (ω : Ω) => Real.exp ((X ω).ofLp b)) μMeasureTheory.Integrable (fun (ω : Ω) => Real.exp (-(X ω).ofLp b)) μMeasureTheory.Integrable (fun (ω : Ω) => (X ω).ofLp a * (X ω).ofLp b) μ

                    Sub-Gaussian integrability (proved from sub-Gaussian moment bounds). Sub-Gaussian random vectors have all moments finite, so component products are integrable. This follows from the L² membership of each component (via subgaussian_component_memLp_two) and Hölder's inequality (L² × L² → L¹, i.e., MemLp.integrable_mul). The measurability and integrability hypotheses make explicit assumptions that the book takes for granted (X is a random vector with finite MGF).

                    theorem Cor49.bochner_bilinear_transform {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : ΩEuclideanSpace (Fin d)) (Sig M : Matrix (Fin d) (Fin d) ) (hInt : ∀ (a b : Fin d), MeasureTheory.Integrable (fun (ω : Ω) => (X ω).ofLp a * (X ω).ofLp b) μ) :
                    (∀ (j k : Fin d), (ω : Ω), (X ω).ofLp j * (X ω).ofLp k μ = Sig j k)∀ (j k : Fin d), (ω : Ω), (applyMatrix M (X ω)).ofLp j * (applyMatrix M (X ω)).ofLp k μ = (M * Sig * M.transpose) j k

                    Bochner integral of bilinear forms under linear maps (proved from Bochner integration). If E[X_j·X_k] = Σ_{jk} and the component products are integrable, then for Y = MX, E[Y_j·Y_k] = (MΣMᵀ)_{jk}. Captures the identity ∫(Mf)(Mf)ᵀdμ = M(∫ffᵀdμ)Mᵀ which requires linearity of Bochner integration for bilinear/quadratic forms.

                    Operator norm sub-multiplicativity under conjugation (reference: standard linear algebra). For any matrices M, A: ‖MAMᵀ‖_op ≤ ‖M‖_op² · ‖A‖_op.

                    theorem Cor49.apply_outer_product {d : } (M : Matrix (Fin d) (Fin d) ) (x : EuclideanSpace (Fin d)) :
                    (Matrix.of fun (j k : Fin d) => (applyMatrix M x).ofLp j * (applyMatrix M x).ofLp k) = (M * Matrix.of fun (j k : Fin d) => x.ofLp j * x.ofLp k) * M.transpose

                    Per-sample outer product transforms under linear maps: (MX)(MX)ᵀ = M(XXᵀ)Mᵀ.

                    theorem Cor49.empirical_covariance_linear_map {d n : } {Ω : Type u_1} (M : Matrix (Fin d) (Fin d) ) (X : Fin nΩEuclideanSpace (Fin d)) (ω : Ω) :
                    empiricalCovariance (fun (i : Fin n) (ω : Ω) => applyMatrix M (X i ω)) ω = M * empiricalCovariance X ω * M.transpose

                    Empirical covariance transforms under linear maps (reference: algebraic identity). If Y_i = MX_i pointwise, then Σ̂_Y = MΣ̂_XMᵀ. This is the sample-level version of the Bochner identity, holding for each ω.

                    theorem Cor49.subG_integrability_covariance {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {d : } (hd : 0 < d) (Sig : Matrix (Fin d) (Fin d) ) (hSig : Sig.PosSemidef) (X : ΩEuclideanSpace (Fin d)) (hSubG : IsSubGaussianVec μ X (matrixOpNorm Sig)) (hXm : ∀ (i : Fin d), MeasureTheory.AEStronglyMeasurable (fun (ω : Ω) => (X ω).ofLp i) μ) (hExpPos : ∀ (i : Fin d), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp ((X ω).ofLp i)) μ) (hExpNeg : ∀ (i : Fin d), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (-(X ω).ofLp i)) μ) (a b : Fin d) :
                    MeasureTheory.Integrable (fun (ω : Ω) => (X ω).ofLp a * (X ω).ofLp b) μ

                    Sub-Gaussian integrability for covariance estimation (standard mathematical fact). If X is a sub-Gaussian random vector with variance proxy σ², then for any coordinates a, b, the product X_a · X_b is integrable. This follows from the fact that sub-Gaussian random variables have finite moments of all orders: the MGF bound implies E[X_a²] < ∞ for each coordinate (via subgaussian_component_memLp_two), and then Cauchy-Schwarz / Hölder gives E[|X_a · X_b|] < ∞ (via MemLp.integrable_mul).

                    The measurability and exponential integrability hypotheses make explicit standard assumptions the book takes for granted (random vectors are always assumed measurable with finite MGF). These are needed because the local IsSubGaussianVec definition (MGF bound on Bochner integrals) does not itself imply measurability.

                    LDL-based whitening matrix helpers #

                    theorem Cor49.piLp_inner_eq_dotProduct {d : } (u v : Fin d) :

                    The inner product on WithLp 2 (Fin d → ℝ) equals dotProduct.

                    theorem Cor49.row_ne_zero_of_invertible {d : } (M : Matrix (Fin d) (Fin d) ) [Invertible M] (i : Fin d) :
                    M i 0

                    A row of an invertible matrix is nonzero.

                    theorem Cor49.ldl_diagEntries_pos {d : } (S : Matrix (Fin d) (Fin d) ) (hS : S.PosDef) (i : Fin d) :

                    The diagonal entries of the LDL decomposition are positive for PD matrices.

                    Diagonal matrices are symmetric (transpose equals self).

                    theorem Cor49.diagSqrtInv_mul_diag {d : } (f : Fin d) (hf : ∀ (i : Fin d), 0 < f i) :
                    ((Matrix.diagonal fun (i : Fin d) => 1 / (f i)) * Matrix.diagonal f * Matrix.diagonal fun (i : Fin d) => 1 / (f i)) = 1

                    D^{-1/2} * D * D^{-1/2} = I for diagonal D with positive entries.

                    theorem Cor49.spectral_whitening_axiom {d : } :
                    0 < d∀ (Sig : Matrix (Fin d) (Fin d) ), Sig.PosDef∃ (M : Matrix (Fin d) (Fin d) ), M * Sig * M.transpose = 1 ∀ (A : Matrix (Fin d) (Fin d) ) (t : ), matrixOpNorm A > matrixOpNorm Sig * tmatrixOpNorm (M * A * M.transpose) > t

                    Spectral theorem: existence of whitening matrix (standard linear algebra background). For any positive definite Σ, there exists M (e.g., Σ^{-1/2}) satisfying: (1) M·Σ·Mᵀ = I (whitening), and (2) Operator norm transfer: if ‖A‖_op > ‖Σ‖_op·t then ‖M·A·Mᵀ‖_op > t.

                    Property (2) follows from ‖M·A·Mᵀ‖_op ≥ ‖A‖_op / ‖M⁻¹‖²_op = ‖A‖_op / ‖Σ‖_op.

                    Axiom justification: The textbook does not prove this, instead referencing standard linear algebra. The hypothesis Sig.PosDef ensures the statement is mathematically sound (singular matrices have no whitening matrix).

                    theorem Cor49.subgaussian_whitening_axiom {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : ΩEuclideanSpace (Fin d)) (Sig M : Matrix (Fin d) (Fin d) ) :
                    IsSubGaussianVec μ X (matrixOpNorm Sig)M * Sig * M.transpose = 1IsSubGaussianVec μ (fun (ω : Ω) => applyMatrix M (X ω)) 1

                    Sub-Gaussian whitening: If X is sub-Gaussian with parameter ‖Σ‖_op and M is a whitening matrix (M·Σ·Mᵀ = I), then MX is sub-Gaussian with parameter 1.

                    This follows from the MGF characterization: for unit u, u^T(MX) = (M^Tu)^TX, and the quadratic form (M^Tu)^T Σ (M^Tu) = u^T(MΣM^T)u = u^Tu = 1.

                    Axiom justification: The textbook asserts this as part of the whitening step without explicit proof, referencing the spectral theorem.

                    theorem Cor49.subgaussianVec_exp_integrable {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : ΩEuclideanSpace (Fin d)) (σsq : ) (hSubG : IsSubGaussianVec μ X σsq) (i : Fin d) :
                    MeasureTheory.Integrable (fun (ω : Ω) => Real.exp ((X ω).ofLp i)) μ MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (-(X ω).ofLp i)) μ

                    Sub-Gaussian exponential integrability: A sub-Gaussian random vector has integrable exponential coordinates (both positive and negative). The book always assumes random vectors have finite MGF; this theorem makes that standard assumption explicit. Proof: From the IsSubGaussianVec MGF bound, taking the unit vector u = eᵢ (standard basis) gives an individual MGF bound for each coordinate Xᵢ. The conclusion then follows from Rigollet.Chapter4.Thm_4_6.exp_individual_integrable (which axiomatizes the standard fact that a bounded MGF implies integrability of the exponential).

                    theorem Cor49.subgaussian_vec_regularity {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : ΩEuclideanSpace (Fin d)) (σsq : ) :
                    IsSubGaussianVec μ X σsq(∀ (i : Fin d), MeasureTheory.AEStronglyMeasurable (fun (ω : Ω) => (X ω).ofLp i) μ) (∀ (i : Fin d), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp ((X ω).ofLp i)) μ) ∀ (i : Fin d), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (-(X ω).ofLp i)) μ

                    Sub-Gaussian regularity: A sub-Gaussian random vector has measurable coordinates and integrable exponentials (both positive and negative). Part (1) — measurability — follows from subGaussianVec_aestronglyMeasurable (the vector is AEStronglyMeasurable, so each coordinate is too, by composition with the continuous projection EuclideanSpace.proj). Parts (2) and (3) — exponential integrability — are background assumptions not proved in the textbook, provided by subgaussianVec_exp_integrable.

                    theorem Cor49.subgaussian_vec_product_integrable {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] {d : } (X : ΩEuclideanSpace (Fin d)) (σsq : ) (hSubG : IsSubGaussianVec μ X σsq) (a b : Fin d) :
                    MeasureTheory.Integrable (fun (ω : Ω) => (X ω).ofLp a * (X ω).ofLp b) μ

                    Sub-Gaussian product integrability (proved from sub-Gaussian tail bounds). For a sub-Gaussian random vector X with variance proxy σ², the product X_a · X_b of any two coordinates is integrable. This follows from:

                    theorem Cor49.whitening_matrix_axiom {d : } :
                    0 < d∀ (Sig : Matrix (Fin d) (Fin d) ), Sig.PosDef∃ (M : Matrix (Fin d) (Fin d) ), M * Sig * M.transpose = 1 (∀ (A : Matrix (Fin d) (Fin d) ) (t : ), matrixOpNorm A > matrixOpNorm Sig * tmatrixOpNorm (M * A * M.transpose) > t) ∀ {Ω : Type u_1} [inst : MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (X : ΩEuclideanSpace (Fin d)), IsSubGaussianVec μ X (matrixOpNorm Sig)∀ (a b : Fin d), MeasureTheory.Integrable (fun (ω : Ω) => (X ω).ofLp a * (X ω).ofLp b) μ

                    Existence of whitening matrix with all properties (combines spectral theorem

                    theorem Cor49.matrix_square_root_whitening {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ) :
                    0 < d∀ (X : Fin nΩEuclideanSpace (Fin d)) (Sig : Matrix (Fin d) (Fin d) ), Sig.PosDef(∀ (i : Fin n) (j k : Fin d), (ω : Ω), (X i ω).ofLp j * (X i ω).ofLp k μ = Sig j k)(∀ (i : Fin n), IsSubGaussianVec μ (X i) (matrixOpNorm Sig))ProbabilityTheory.iIndepFun X μ∃ (Y : Fin nΩEuclideanSpace (Fin d)), (∀ (i : Fin n) (j k : Fin d), (ω : Ω), (Y i ω).ofLp j * (Y i ω).ofLp k μ = if j = k then 1 else 0) (∀ (i : Fin n), IsSubGaussianVec μ (Y i) 1) (∀ (ω : Ω) (t : ), matrixOpNorm (empiricalCovariance X ω - Sig) > matrixOpNorm Sig * tmatrixOpNorm (empiricalCovariance Y ω - 1) > t) ProbabilityTheory.iIndepFun Y μ

                    Matrix square root whitening (reference: Spectral theorem, Rigollet §4.3). For any covariance matrix Σ with E[XXᵀ] = Σ and X ~ subG_d(‖Σ‖_op), the whitened random vectors Y_i = Σ^{-1/2} X_i satisfy identity covariance, unit sub-Gaussian parameter, and the operator norm transfer property.

                    Proved from whitening_matrix_axiom (which combines spectral_whitening_axiom for the spectral theorem parts, and subgaussian_vec_product_integrable for the integrability part) together with subgaussian_whitening_axiom, bochner_bilinear_transform, and empirical_covariance_linear_map.

                    theorem Cor49.whitened_vectors_exist_L2 {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ) (hd : 0 < d) (X : Fin nΩEuclideanSpace (Fin d)) (Sig : Matrix (Fin d) (Fin d) ) (hPD : Sig.PosDef) (hcov : ∀ (i : Fin n) (j k : Fin d), (ω : Ω), (X i ω).ofLp j * (X i ω).ofLp k μ = Sig j k) (hSubG : ∀ (i : Fin n), IsSubGaussianVec μ (X i) (matrixOpNorm Sig)) (hIndep : ProbabilityTheory.iIndepFun X μ) :
                    ∃ (Y : Fin nΩEuclideanSpace (Fin d)), (∀ (i : Fin n) (j k : Fin d), (ω : Ω), (Y i ω).ofLp j * (Y i ω).ofLp k μ = if j = k then 1 else 0) (∀ (i : Fin n), IsSubGaussianVec μ (Y i) 1) (∀ (ω : Ω) (t : ), matrixOpNorm (empiricalCovariance X ω - Sig) > matrixOpNorm Sig * tmatrixOpNorm (empiricalCovariance Y ω - 1) > t) ProbabilityTheory.iIndepFun Y μ

                    Whitened vectors existence (Y = Σ^{-1/2}X). For X with E[XXᵀ] = Σ and X ~ subG_d(‖Σ‖_op), there exist whitened vectors Y satisfying identity covariance, unit sub-Gaussian parameter, and the operator norm implication.

                    Proved from matrix_square_root_whitening, which captures the spectral theorem application referenced in the book's proof of Theorem 4.6.

                    Helper lemmas for the union bound argument #

                    theorem Cor49.half_div16_eq_div32 (t : ) :
                    min ((t / 2 / 16) ^ 2) (t / 2 / 16) = min ((t / 32) ^ 2) (t / 32)
                    theorem Cor49.card_sq_le_144d {d : } {α : Type u_1} {N : Finset α} (hN : N.card 12 ^ d) :
                    N.card ^ 2 144 ^ d
                    theorem Cor49.eq_4_7_identity_L2 {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ) (hn : 0 < n) (hd : 0 < d) (Y : Fin nΩEuclideanSpace (Fin d)) (hcov_id : ∀ (i : Fin n) (j k : Fin d), (ω : Ω), (Y i ω).ofLp j * (Y i ω).ofLp k μ = if j = k then 1 else 0) (hSubG : ∀ (i : Fin n), IsSubGaussianVec μ (Y i) 1) (hIndep : ProbabilityTheory.iIndepFun Y μ) (t : ) (ht : 0 < t) :
                    μ {ω : Ω | matrixOpNorm (empiricalCovariance Y ω - 1) > t} ENNReal.ofReal (288 ^ d * Real.exp (-(n / 2 * min ((t / 32) ^ 2) (t / 32))))

                    Equation (4.7) (Identity-case concentration for L2/spectral norm).

                    For i.i.d. Y₁,...,Yₙ with E[YYᵀ] = I and Y ~ subG_d(1), P(‖Σ̂_Y - I‖_op > t) ≤ 288^d · exp[-n/2 · min((t/32)², t/32)]

                    Proved in the book via:

                    • ε-net N for S^{d-1} with |N| ≤ 12^d (Lemma 1.18)
                    • ε-net reduction (Lemma 4.2): ‖A‖op ≤ 2·max{x,y∈N} |xᵀAy|
                    • Polarization + sub-exponential lifting (Lemma 1.12)
                    • Bernstein's inequality (Theorem 1.13) for each pair
                    • Union bound over |N|² ≤ 144^d pairs, with factor 2 from Bernstein tail
                    theorem Cor49.whitening_reduction_L2 {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ) (hd : 0 < d) (X : Fin nΩEuclideanSpace (Fin d)) (Sig : Matrix (Fin d) (Fin d) ) (hPD : Sig.PosDef) (hcov : ∀ (i : Fin n) (j k : Fin d), (ω : Ω), (X i ω).ofLp j * (X i ω).ofLp k μ = Sig j k) (hSubG : ∀ (i : Fin n), IsSubGaussianVec μ (X i) (matrixOpNorm Sig)) (hIndep : ProbabilityTheory.iIndepFun X μ) :
                    ∃ (Y : Fin nΩEuclideanSpace (Fin d)), (∀ (i : Fin n) (j k : Fin d), (ω : Ω), (Y i ω).ofLp j * (Y i ω).ofLp k μ = if j = k then 1 else 0) (∀ (i : Fin n), IsSubGaussianVec μ (Y i) 1) (∀ (ω : Ω) (t : ), matrixOpNorm (empiricalCovariance X ω - Sig) > matrixOpNorm Sig * tmatrixOpNorm (empiricalCovariance Y ω - 1) > t) ProbabilityTheory.iIndepFun Y μ

                    Whitening reduction for L2/spectral norm.

                    For X with E[XXᵀ] = Σ and X ~ subG_d(‖Σ‖_op), the whitened vectors Y = Σ^{-1/2}X satisfy:

                    1. E[YYᵀ] = I (identity covariance)
                    2. Y ~ subG_d(1) (unit sub-Gaussian parameter)
                    3. ‖Σ̂_X − Σ‖_op > ‖Σ‖_op·t ⟹ ‖Σ̂_Y − I‖_op > t (from Σ̂_X − Σ = Σ^{1/2}(Σ̂_Y − I)Σ^{1/2} and sub-multiplicativity)

                    Proved from the axiomatized whitened_vectors_exist_L2.

                    Assembly: Whitening + eq (4.7) → general concentration #

                    theorem Cor49.general_concentration_L2 {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ) (hn : 0 < n) (hd : 0 < d) (X : Fin nΩEuclideanSpace (Fin d)) (Sig : Matrix (Fin d) (Fin d) ) (hPD : Sig.PosDef) (hcov : ∀ (i : Fin n) (j k : Fin d), (ω : Ω), (X i ω).ofLp j * (X i ω).ofLp k μ = Sig j k) (hSubG : ∀ (i : Fin n), IsSubGaussianVec μ (X i) (matrixOpNorm Sig)) (hIndep : ProbabilityTheory.iIndepFun X μ) (t : ) (ht : 0 < t) :
                    μ {ω : Ω | matrixOpNorm (empiricalCovariance X ω - Sig) > matrixOpNorm Sig * t} ENNReal.ofReal (288 ^ d * Real.exp (-(n / 2 * min ((t / 32) ^ 2) (t / 32))))

                    General concentration via whitening + identity-case bound. Combines the whitening reduction with equation (4.7).

                    Algebraic rate bound #

                    Setting t = C · covRate d n δ in the exponential tail bound from equation (4.7), we verify that 288^d · exp(-n/2 · min((t/32)², t/32)) ≤ δ for C = 1024.

                    theorem Cor49.h288d_le_exp16d (d : ) :
                    288 ^ d Real.exp (16 * d)
                    theorem Cor49.exp_neg_log_bound (δ : ) ( : 0 < δ) (hδ1 : δ < 1) :
                    Real.exp (-(16 * Real.log (1 / δ))) δ
                    theorem Cor49.rate_algebraic_bound {d : } (hd : 0 < d) (n : ) (hn : 0 < n) (δ : ) ( : 0 < δ) (hδ1 : δ < 1) :
                    288 ^ d * Real.exp (-(n / 2 * min ((1024 * covRate d n δ / 32) ^ 2) (1024 * covRate d n δ / 32))) δ
                    theorem Cor49.opNorm_concentration {d : } {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ) (hn : 0 < n) (hd : 0 < d) (X : Fin nΩEuclideanSpace (Fin d)) (Sig : Matrix (Fin d) (Fin d) ) (hPD : Sig.PosDef) (hcov : ∀ (i : Fin n) (j k : Fin d), (ω : Ω), (X i ω).ofLp j * (X i ω).ofLp k μ = Sig j k) (hSubG : ∀ (i : Fin n), IsSubGaussianVec μ (X i) (matrixOpNorm Sig)) (hIndep : ProbabilityTheory.iIndepFun X μ) (delta : ) (hdelta : 0 < delta) (hdelta1 : delta < 1) :

                    Theorem 4.6 (Operator norm concentration for i.i.d. sub-Gaussian random vectors).

                    With probability ≥ 1 − δ, ‖Σ̂ − Σ‖_op ≤ C₁ · ‖Σ‖_op · covRate(d,n,δ).

                    The proof follows the book (Rigollet, §4.3):

                    • Whitening reduction Y = Σ^{-1/2}X reduces to identity covariance
                    • Equation (4.7) gives the exponential tail bound via ε-net + Bernstein
                    • Algebraic simplification with C₁ = 1024 absorbs the constants

                    The sub-steps (whitening, per-pair Bernstein) are axiomatized because they require Mathlib infrastructure not currently available (matrix square roots, ε-net covering for L2, sub-exponential Bernstein). The assembly is fully proved.

                    theorem Cor49.davisKahan_eigvec_bound {d : } (Sig : Matrix (Fin d) (Fin d) ) (θ : ) (v : EuclideanSpace (Fin d)) (hSpike : IsSpikedCovariance Sig θ v) (SigTilde : Matrix (Fin d) (Fin d) ) (hPSD : SigTilde.PosSemidef) (vhat : EuclideanSpace (Fin d)) (hvhat : IsLargestEigenvector SigTilde vhat) :
                    ∃ (ε : ), (ε = 1 ε = -1) ε vhat - v 2 * 2 / θ * matrixOpNorm (SigTilde - Sig)

                    From Theorem 4.8 (Davis-Kahan): eigenvector error bound (pointwise). For the spiked covariance model Σ = θvv⊤ + I_d, if v̂ is a largest eigenvector of any PSD matrix Σ̃, then min_{ε∈{±1}} ‖εv̂ − v‖ ≤ (2√2/θ) · ‖Σ̃ − Σ‖_op.

                    Derived from Thm48.theorem_4_8 which gives the squared form: min(‖v̂-v‖², ‖v̂+v‖²) ≤ (8/θ²)‖Σ̃ − Σ‖_op².

                    theorem Cor49.empiricalCovariance_posSemidef {d n : } {Ω : Type u_1} (X : Fin nΩEuclideanSpace (Fin d)) (ω : Ω) :

                    The empirical covariance Σ̂ = (1/n)ΣᵢXᵢXᵢᵀ is always PSD. Each summand XᵢXᵢᵀ = vecMulVec xᵢ (star xᵢ) is PSD, and a nonneg-scaled sum of PSD matrices is PSD.

                    Helper lemmas for the spiked operator norm computation #

                    The inner product on EuclideanSpace equals the dot product of the underlying vectors.

                    The squared EuclideanSpace norm equals the self-dot-product.

                    theorem Cor49.unit_dotProduct_one {d : } (v : EuclideanSpace (Fin d)) (hv : v = 1) :

                    For a unit vector in EuclideanSpace, the self-dot-product is 1.

                    theorem Cor49.spiked_mulVec_raw {d : } (θ : ) (v x : Fin d) :
                    (θ Matrix.vecMulVec v v + 1).mulVec x = (θ * v ⬝ᵥ x) v + x

                    The mulVec computation for the spiked covariance model (on raw vectors).

                    theorem Cor49.spiked_opNorm_eq {d : } (Sig : Matrix (Fin d) (Fin d) ) (θ : ) (v : EuclideanSpace (Fin d)) (hSpike : IsSpikedCovariance Sig θ v) :
                    matrixOpNorm Sig = 1 + θ

                    Under the spiked covariance model Σ = θvv⊤ + I_d, the operator norm is ‖Σ‖_op = 1 + θ. Proof: Upper bound via triangle inequality + Cauchy-Schwarz; lower bound by evaluating at the spike vector v.

                    Helper: complement probability bound #

                    theorem Cor49.prob_compl_bound {Ω : Type u_1} [MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (S : Set Ω) (δ : ) ( : 0 < δ) (hbad : μ S ENNReal.ofReal δ) :
                    μ S ENNReal.ofReal (1 - δ)

                    For a probability measure, if the complement has measure ≤ ofReal δ, then the set has measure ≥ ofReal (1 - δ). This version does NOT require measurability of S, using measure_union_le instead of measure_add_measure_compl.

                    Corollary 4.9 (PCA consistency via empirical covariance) #

                    The corollary combines Theorem 4.6 and Theorem 4.8 as follows:

                    1. Theorem 4.6 bounds the operator norm of the estimation error: ‖Σ̂ − Σ‖_op ≤ C₁ · ‖Σ‖_op · rate(d,n,δ) w.p. ≥ 1 − δ.

                    2. Theorem 4.8 (Davis-Kahan) converts operator norm error to eigenvector error: min_{ε ∈ {±1}} ‖ε v̂ − v‖₂ ≤ (2√2/θ) · ‖Σ̂ − Σ‖_op.

                    3. Under the spiked model Σ = θvv⊤ + I_d, the operator norm is ‖Σ‖op = 1 + θ. Substituting: min{ε ∈ {±1}} ‖ε v̂ − v‖₂ ≤ (2√2·C₁/θ) · (1+θ) · rate(d,n,δ) = C · (1+θ)/θ · rate(d,n,δ)

                    theorem Cor49.corollary_4_9 :
                    ∃ (C : ), 0 < C ∀ (d : ), 0 < d∀ {Ω : Type u_1} [inst : MeasurableSpace Ω] (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (n : ), 0 < n∀ (X : Fin nΩEuclideanSpace (Fin d)) (Sig : Matrix (Fin d) (Fin d) ) (θ : ) (v : EuclideanSpace (Fin d)), IsSpikedCovariance Sig θ v(∀ (i : Fin n) (j k : Fin d), (ω : Ω), (X i ω).ofLp j * (X i ω).ofLp k μ = Sig j k)(∀ (i : Fin n), IsSubGaussianVec μ (X i) (matrixOpNorm Sig))ProbabilityTheory.iIndepFun X μ∀ (delta : ), 0 < deltadelta < 1μ {ω : Ω | ∀ (vhat : EuclideanSpace (Fin d)), IsLargestEigenvector (empiricalCovariance X ω) vhat∃ (ε : ), (ε = 1 ε = -1) ε vhat - v C * (1 + θ) / θ * covRate d n delta} ENNReal.ofReal (1 - delta)

                    Corollary 4.9 (PCA consistency via empirical covariance).

                    Let X₁,...,Xₙ be n i.i.d. copies of a sub-Gaussian random vector X ∈ ℝ^d with E[XX⊤] = Σ and X ~ subG_d(‖Σ‖_op). Assume Σ = θvv⊤ + I_d (spiked covariance model). Then the largest eigenvector v̂ of the empirical covariance matrix Σ̂ satisfies:

                    min_{ε∈{±1}} |εv̂ − v|₂ ≲ ((1+θ)/θ) · (√((d+log(1/δ))/n) ∨ (d+log(1/δ))/n)

                    with probability 1 − δ.

                    The i.i.d. assumption is encoded through:

                    • hcov: identical distribution (same covariance for all i)
                    • hSubG: identical sub-Gaussian parameter for all i
                    • hIndep: mutual independence of the random vectors