Documentation

Atlas.HighDimensionalStatistics.code.Chapter1.Problem_1_6

Problem 1.6: Maximum of sub-Gaussian process on compact subset of unit sphere #

Informal statement (Rigollet, Problem 1.6): Let K be a compact subset of the unit sphere of ℝ^p that admits an ε-net N_ε w.r.t. Euclidean distance satisfying |N_ε| ≤ (C/ε)^d for all ε ∈ (0,1), where C ≥ 1 and d ≤ p are positive constants. Let X ~ subG_p(σ²) be a centered sub-Gaussian random vector.

Show that there exist positive constants c₁, c₂ such that for any δ ∈ (0,1): max_{θ ∈ K} θ⊤X ≤ c₁ σ √(d log(2p/d)) + c₂ σ √(log(1/δ)) with probability at least 1 − δ.

Key idea: Use the chaining/ε-net argument from Theorem 1.19 adapted to K. Discretize K using an ε-net, apply the union bound over the net points (each ⟨θ, X⟩ is sub-Gaussian by the multivariate sub-Gaussian assumption), then control the approximation error using the Lipschitz property of θ ↦ ⟨θ, X⟩ and iteratively refine. The term d log(2p/d) comes from the covering number bound log|N_ε| ≤ d log(C/ε), and the log(1/δ) term is the confidence correction.

The book asks to "comment on the result in light of Theorem 1.19" — the bound shows that the effective dimension of K is d (not p), so if K has low intrinsic dimension, the concentration is much tighter than the naive p-dimensional bound.

Formalization approach #

We work in EuclideanSpace ℝ (Fin p) for ℝ^p. The inner product θ⊤X is @inner ℝ _ _ θ (X ω). The multivariate sub-Gaussian condition is IsSubGaussianVec, meaning every unit-vector projection is sub-Gaussian (Definition 1.2). The covering number bound is expressed via IsEpsilonNet from Definition 1.17.

References #

A random vector X : Ω → ℝ^p is sub-Gaussian with variance proxy σsq (denoted X ~ subG_p(σ²)) if for every unit vector θ, the projection ⟪θ, X⟫ is sub-Gaussian with variance proxy σsq. This is the multivariate extension of Definition 1.2 from Rigollet's High-Dimensional Statistics.

Instances For
    theorem problem_1_6_compact_sphere_max {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} (x✝¹ : MeasureTheory.IsProbabilityMeasure μ) {p : } (hp : 0 < p) (K : Set (EuclideanSpace (Fin p))) (hK_compact : IsCompact K) (hK_sphere : K Metric.sphere 0 1) (C : ) (hC : 1 C) (d : ) (hd_pos : 0 < d) (hd_le_p : d p) (hcov : ∀ (ε : ), 0 < εε < 1∃ ( : Finset (EuclideanSpace (Fin p))), IsEpsilonNet K (↑) ε .card (C / ε) ^ d) (σ : ) ( : 0 < σ) (X : ΩEuclideanSpace (Fin p)) (hX : IsSubGaussianVec X (σ ^ 2) μ) :
    ∃ (c₁ : ), 0 < c₁ ∃ (c₂ : ), 0 < c₂ ∀ (δ : ), 0 < δδ < 1μ {ω : Ω | θK, inner θ (X ω) c₁ * σ * (d * Real.log (2 * p / d)) + c₂ * σ * (Real.log (1 / δ))} ENNReal.ofReal (1 - δ)

    Problem 1.6. (Maximum of sub-Gaussian process on compact subset of unit sphere)

    Let K be a compact subset of the unit sphere of ℝ^p that admits an ε-net N_ε with |N_ε| ≤ (C/ε)^d for all ε ∈ (0,1), where C ≥ 1 and d ≤ p are positive constants. Let X ~ subG_p(σ²) be a centered random vector.

    Then there exist positive constants c₁, c₂ such that for any δ ∈ (0,1):

    max_{θ ∈ K} θ⊤X ≤ c₁ σ √(d log(2p/d)) + c₂ σ √(log(1/δ))

    with probability at least 1 - δ.