Documentation

Atlas.HighDimensionalStatistics.code.Chapter1.Problem_1_1

Problem 1.1: Concentration of weighted sums of sub-exponential random variables #

Informal statement (Rigollet, Problem 1.1): Let X₁, …, Xₙ be independent, mean-zero, sub-exponential(λ) random variables. For any weight vector a = (a₁, …, aₙ) ∈ ℝⁿ, define the weighted sum S(a) = ∑ᵢ aᵢ Xᵢ.

Show that for any t > 0, P(|S(a)| > t) ≤ 2 exp(−C · min(t² / (λ² ‖a‖₂²), t / (λ ‖a‖_∞))) for some positive constant C.

Key idea: This is the sub-exponential analogue of Hoeffding's inequality. The two-regime tail (Gaussian for small t, exponential for large t) arises from the sub-exponential MGF bound and Chernoff's method, applied to the weighted sum using independence. The bound interpolates between a Gaussian regime (t² term dominates when t is small relative to λ‖a‖_∞) and an exponential regime (t term dominates when t is large).

Lean formalization: We state this for X : Fin n → Ω → ℝ where each X i is sub-exponential with parameter lambda (Definition 1.11). The weighted sum is ∑ i, a i * X i ω, the ℓ₂ norm squared is ∑ i, a i ^ 2, and the ℓ_∞ norm is ‖a‖ (the sup-norm on Fin n → ℝ).

theorem problem_1_1_weighted_subexponential {n : } {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} (hP : MeasureTheory.IsProbabilityMeasure μ) {X : Fin nΩ} {lambda : } (hX_meas : ∀ (i : Fin n), Measurable (X i)) (hX_subexp : ∀ (i : Fin n), IsSubExponential (X i) lambda) (hX_indep : ProbabilityTheory.iIndepFun X μ) :
∃ (C : ), 0 < C ∀ (a : Fin n) (t : ), 0 < t(μ {ω : Ω | |i : Fin n, a i * X i ω| > t}).toReal 2 * Real.exp (-C * min (t ^ 2 / (lambda ^ 2 * i : Fin n, a i ^ 2)) (t / (lambda * a)))

Problem 1.1 (Concentration of weighted sub-exponential sums).

Given independent mean-zero sub-exponential(λ) random variables X₁, …, Xₙ, the weighted sum S(a) = ∑ᵢ aᵢXᵢ satisfies a two-regime tail bound: P(|S(a)| > t) ≤ 2 exp(−C · min(t²/(λ²‖a‖₂²), t/(λ‖a‖_∞))).

The min captures the sub-exponential behavior: Gaussian tail for moderate deviations (t ≲ λ²‖a‖₂²/‖a‖_∞) and exponential tail for large deviations.

In the Lean formalization:

  • ∑ i, a i ^ 2 is ‖a‖₂²
  • ‖a‖ is the sup-norm ‖a‖_∞ on Fin n → ℝ
  • IsSubExponential encodes the sub-exponential MGF condition from Def 1.11