Auxiliary: Sub-Gaussian Chernoff bound #
Sub-Gaussian Chernoff bound. If X ~ subG(σ²), then for t > 0: P(X > t) ≤ exp(-t²/(2σ²)).
Proof: Markov on exp(sX) with s = t/σ², then optimize.
Uses mul_meas_ge_le_lintegral₀ (Markov) and
ofReal_integral_eq_lintegral_ofReal (integral conversion).
SORRY REASON: The measure-theoretic Chernoff argument requires
converting between ∫⁻ (ENNReal) and ∫ (Bochner), applying Markov's
inequality to exp(s·X), and optimizing over s. Each step is standard
but requires ~80 lines of Lean plumbing (AEMeasurable of ofReal ∘ exp,
set containment for strict/non-strict inequalities, ENNReal division).
All mathematical ingredients exist in Mathlib:
• mul_meas_ge_le_lintegral₀ (Markov)
• ofReal_integral_eq_lintegral_ofReal (integral conversion)
• IsSubGaussian.mgf_bound (MGF bound)
but the full assembly is not yet available as a single result.
Lemma 4.2 (ε-net reduction for operator norm). For any d×T matrix A, the operator norm can be bounded via 1/4-nets:
‖A‖_op ≤ 2 · max_{(x,y) ∈ N₁×N₂} xᵀAy
where N₁ is a 1/4-net for the unit sphere in ℝ^d and N₂ is a 1/4-net for the unit sphere in ℝ^T.
This extends the single ε-net argument of Theorem 1.19 to two ε-nets (one for each dimension of the matrix). The proof applies the ε-net approximation twice:
- First in the left dimension: max_{‖u‖=1} f(u) ≤ 2·max_{x∈N₁} f(x) for Lipschitz f, using ε = 1/4.
- Then in the right dimension for each fixed x. Combining the two 1/(1-2·(1/4)) = 2 factors gives the factor of 2.
Lemma 4.2: Operator norm tail bound for sub-Gaussian matrices #
Lemma 4.2 (tail bound form). For a sub-Gaussian d×T random matrix
A ~ subG_{d×T}(σ²), the operator norm satisfies the tail bound:
P(‖A‖_op > t) ≤ 12^{d+T} · exp(-t²/(8σ²))
for all t > 0.
Proof: Uses 1/4-nets N₁ for S^{d-1} and N₂ for S^{T-1} with |N₁| ≤ 12^d and |N₂| ≤ 12^T. The ε-net argument gives ‖A‖op ≤ 2 max{x∈N₁,y∈N₂} xᵀAy, then a union bound over the product N₁ × N₂ combined with the sub-Gaussian tail for each xᵀAy yields the result.
Lemma 4.2 (high-probability form). For a sub-Gaussian d×T random matrix
A ~ subG_{d×T}(σ²), with probability at least 1 - δ:
‖A‖_op ≤ 4σ√(log(12)·(d ⊔ T)) + 2σ√(2·log(1/δ))
where d ⊔ T = max(d, T).
This follows from the tail bound by setting
12^{d+T} · exp(-t²/(8σ²)) ≤ δ and solving for t.
Since d + T ≤ 2·(d ⊔ T), we have
(d+T)·log(12) ≤ 2·log(12)·(d ⊔ T), and the stated bound
uses 4σ (since √(2·2·log(12)) ≤ 4) to absorb the constants.