Documentation

Atlas.HighDimensionalStatistics.code.Chapter2.Thm_2_14

theorem thm_2_14_bic_bound {n d : } (hn : 0 < n) (X : Matrix (Fin n) (Fin d) ) (Y : Fin n) (θstar : Fin d) (eps : Fin n) (hY : Y = X.mulVec θstar + eps) (τ : ) (_hτ : 0 < τ) (θhat : Fin d) (hBIC : Rigollet.IsBICEstimatorL2 X Y τ θhat) (t : ) (ht : 0 t) (hnoise : ∀ (θ : Fin d), 4 * (eps ⬝ᵥ X.mulVec (θ - θstar)) ^ 2 X.mulVec (θ - θstar) ⬝ᵥ X.mulVec (θ - θstar) * (2 * n * τ ^ 2 * (Rigollet.l0norm θ) + t)) :
X.mulVec (θhat - θstar) ⬝ᵥ X.mulVec (θhat - θstar) 2 * n * τ ^ 2 * (Rigollet.l0norm θstar) + t

Theorem 2.14 (BIC estimator MSE bound). Under the linear model Y = Xθ* + ε with ε sub-Gaussian(σ²), the BIC estimator with τ² ≍ σ²log(ed)/n satisfies: |X(θ̂ᴮᴵᶜ - θ*)|² ≤ 2nτ²|θ*|₀ + t with probability 1 - δ (where t depends on σ²log(1/δ)).

Formalized here as the deterministic algebraic bound that follows from BIC optimality (using sqL2norm) plus the "sup-out" noise bound.

The noise hypothesis encodes the probabilistic bound from the proof: after applying Young's inequality 2ab ≤ 2a² + b²/2 and normalizing, one obtains 4(εᵀu)² ≤ 2nτ²|θ̂|₀ + t where u = Xδ/|Xδ|₂. In quadratic form (avoiding division by |Xδ|₂): 4⟨ε,Xδ⟩² ≤ |Xδ|₂² · (2nτ²|θ̂|₀ + t).

Probabilistic form of Theorem 2.14 #

theorem coordinateSubG_to_dotProductMGF {n : } {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hε_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (v : Fin n) :
v ⬝ᵥ v 1∀ (s : ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω ⬝ᵥ v)) μ (ω : Ω), Real.exp (s * ε ω ⬝ᵥ v) μ Real.exp (s ^ 2 * σ ^ 2 / 2)

Coordinate-wise sub-Gaussian implies dot-product MGF bound. This encapsulates the fact that if each εᵢ is sub-Gaussian(σ²), then for any unit vector v with ‖v‖₂² ≤ 1, the linear combination ⟨ε, v⟩ satisfies the MGF bound E[exp(s·⟨ε,v⟩)] ≤ exp(s²σ²/2).

In the book's model, the noise coordinates are independent, so this follows from Proposition 1.4 (Theorem 1.6 in Chapter 1). The independence is part of the statistical model definition, not a theorem to be proved. This helper bridges the formalization gap between coordinate-wise IsSubGaussian and the joint MGF bound.

Note: This theorem requires independence and measurability of the noise coordinates, which are part of the statistical model definition. The proof uses Theorem 1.6 (sub-Gaussian linear combinations).

theorem single_support_thm119_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : } (hn : 0 < n) (hd : 0 < d) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hε_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (τsq : ) (hτsq : 0 < τsq) (t : ) (ht : 0 < t) (S : Finset (Fin d)) (k : ) (hSk : S.card = k) (hk1 : 1 k) :
μ {ω : Ω | ∃ (θ : Fin d), (∀ jS, θ j = 0) 4 * (ε ω ⬝ᵥ X.mulVec (θ - θstar)) ^ 2 > X.mulVec (θ - θstar) ⬝ᵥ X.mulVec (θ - θstar) * (2 * n * τsq * (Rigollet.l0norm θ) + t)} ENNReal.ofReal (6 ^ (k + Rigollet.l0norm θstar) * Real.exp (-t / (32 * σ ^ 2) - n * τsq * k / (16 * σ ^ 2)))

Axiom (Theorem 1.19 application): Single-support-set concentration bound.

For a fixed support set S of size k, the book (lines 1604-1614) constructs an ONB for span{X_j : j ∈ S ∪ supp(θ*)} (dimension ≤ k + |θ*|₀), applies Theorem 1.19 to the sub-Gaussian maximum of ε projected onto each ONB vector, and bridges the quadratic form to inner product form.

This captures the "by reference to Theorem 1.19" step for a single support set, including:

  • ONB construction for column subspaces (book line 1604),
  • Bridging from Matrix.mulVec quadratic form to EuclideanSpace inner products,
  • Application of theorem_1_19_tail_bound (book lines 1610-1614),
  • The dimension bound dim(span) ≤ k + |θ*|₀ (book line 1604).

Proved via single_support_thm119_bound' in Thm_2_14_SubspaceBound.lean, which composes subspace_thm119_bound (Theorem 1.19 for subspace projections) with onb_event_containment (ONB construction axiom for column spans).

theorem per_size_thm119_raw_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : } (hn : 0 < n) (hd : 0 < d) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hε_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (τsq : ) (hτsq : 0 < τsq) (t : ) (ht : 0 < t) (k : ) (hk : k Finset.Icc 1 d) :
μ {ω : Ω | ∃ (θ : Fin d), Rigollet.l0norm θ k 4 * (ε ω ⬝ᵥ X.mulVec (θ - θstar)) ^ 2 > X.mulVec (θ - θstar) ⬝ᵥ X.mulVec (θ - θstar) * (2 * n * τsq * (Rigollet.l0norm θ) + t)} ENNReal.ofReal ((d.choose k) * 6 ^ (k + Rigollet.l0norm θstar) * Real.exp (-t / (32 * σ ^ 2) - n * τsq * k / (16 * σ ^ 2)))

Raw per-support-size bound from Theorem 1.19 + union over supports.

For a fixed support size k, the book (lines 1600-1615) applies Theorem 1.19 to each support set S of size k, using the ONB of span{X_j : j ∈ S ∪ supp(θ*)}. The union bound over the (d choose k) supports of size k gives:

μ(bad event for l0norm ≤ k) ≤ (d choose k) · 6^(k + |θ*|₀) · exp(-t/(32σ²) - n·τsq·k/(16σ²))

The proof factors the argument into:

  1. single_support_thm119_bound (proved via SubspaceBound, by reference to Theorem 1.19): for each support set S of size k, the per-support concentration bound.
  2. Set containment: the bad event {∃θ, l0norm θ ≤ k ∧ bad(θ)} is contained in the union ⋃_{S ∈ powersetCard(k)} {∃θ, supp(θ) ⊆ S ∧ bad(θ)}.
  3. Union bound over the (d choose k) support sets.

The remaining Lemma 2.7 absorption step is proved explicitly in per_size_concentration_bound.

theorem per_size_concentration_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : } (hn : 0 < n) (hd : 0 < d) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hε_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (τsq : ) (hτsq : τsq (16 * Real.log 6 + 32 * Real.log (Real.exp 1 * d)) * σ ^ 2 / n) (t : ) (ht : 0 < t) (k : ) (hk : k Finset.Icc 1 d) :
μ {ω : Ω | ∃ (θ : Fin d), Rigollet.l0norm θ k 4 * (ε ω ⬝ᵥ X.mulVec (θ - θstar)) ^ 2 > X.mulVec (θ - θstar) ⬝ᵥ X.mulVec (θ - θstar) * (2 * n * τsq * (Rigollet.l0norm θ) + t)} ENNReal.ofReal (Real.exp (-t / (32 * σ ^ 2) - k * Real.log (Real.exp 1 * d) + (Rigollet.l0norm θstar) * Real.log 12))

Per-support-size concentration bound (book lines 1596-1617).

This combines the raw Theorem 1.19 bound (per_size_thm119_raw_bound) with Lemma 2.7 (binom bound) and the τ² condition to obtain:

μ(bad event for l0norm ≤ k) ≤ exp(-t/(32σ²) - k·log(ed) + |θ*|₀·log(12))

The proof explicitly performs the Lemma 2.7 absorption step (book line 1617):

  1. Lemma 2.7: (d choose k) ≤ (ed/k)^k ≤ (ed)^k,
  2. Combine: (d choose k) · 6^(k+|θ*|₀) ≤ (6ed)^k · 6^|θ*|₀,
  3. Use τ² condition: n·τ²·k/(16σ²) ≥ k·(log 6 + 2·log(ed)),
  4. Absorb: (6ed)^k · exp(-nτ²k/(16σ²)) ≤ exp(-k·log(ed)),
  5. Relax: 6^|θ*|₀ ≤ 12^|θ*|₀.
theorem zero_bad_event_contained {Ω : Type u_1} {n d : } (_hn : 0 < n) (hd : 0 < d) (_X : Matrix (Fin n) (Fin d) ) (_θstar : Fin d) ( : ΩFin n) (_τsq : ) (_hτsq : 0 _τsq) (_t : ) (_ht : 0 < _t) {ω : Ω} (hbad : 4 * ( ω ⬝ᵥ _X.mulVec (0 - _θstar)) ^ 2 > _X.mulVec (0 - _θstar) ⬝ᵥ _X.mulVec (0 - _θstar) * (2 * n * _τsq * (Rigollet.l0norm 0) + _t)) :
kFinset.Icc 1 d, ∃ (θ : Fin d), Rigollet.l0norm θ k 4 * ( ω ⬝ᵥ _X.mulVec (θ - _θstar)) ^ 2 > _X.mulVec (θ - _θstar) ⬝ᵥ _X.mulVec (θ - _θstar) * (2 * n * _τsq * (Rigollet.l0norm θ) + _t)

Containment of θ=0 bad event in the k≥1 union (corrected statement).

The θ=0 bad event is contained in the union ⋃_{k=1}^d S(k) where S(k) uses the condition l0norm θ ≤ k (allowing θ=0 as a witness for any k ≥ 1, since l0norm(0)=0 ≤ k). This matches the book’s decomposition over support sets S with |S|=k, where supp(θ) ⊆ S includes θ=0 for any nonempty S.

theorem sup_out_intermediate_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : } (hn : 0 < n) (hd : 0 < d) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hε_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (τsq : ) (hτsq : τsq (16 * Real.log 6 + 32 * Real.log (Real.exp 1 * d)) * σ ^ 2 / n) (t : ) (ht : 0 < t) :
μ {ω : Ω | ∃ (θ : Fin d), 4 * (ε ω ⬝ᵥ X.mulVec (θ - θstar)) ^ 2 > X.mulVec (θ - θstar) ⬝ᵥ X.mulVec (θ - θstar) * (2 * n * τsq * (Rigollet.l0norm θ) + t)} kFinset.Icc 1 d, ENNReal.ofReal (Real.exp (-t / (32 * σ ^ 2) - k * Real.log (Real.exp 1 * d) + (Rigollet.l0norm θstar) * Real.log 12))

Union bound + Theorem 1.19 intermediate step (book lines 1596-1617).

This theorem encodes the combined result of:

  1. The support decomposition (book lines 1596-1604),
  2. ONB construction for column subspaces (book line 1604),
  3. The ε-net argument from Theorem 1.19 (book lines 1610-1614),
  4. The union bound over supports and support sizes (book line 1608),
  5. Lemma 2.7 bounding (d choose k) · exp(-2k·log(ed)) ≤ exp(-k·log(ed)) (book line 1617).

The intermediate result (book line 1617, after Lemma 2.7): μ(bad event) ≤ ∑_{k=1}^d exp(-t/(32σ²) - k·log(ed) + |θ*|₀·log(12))

The proof uses per_size_concentration_bound (axiom, for the by-reference Theorem 1.19 and Lemma 2.7 steps) and zero_bad_event_contained (proved).

theorem sup_out_probability_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : } (hn : 0 < n) (hd : 0 < d) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hε_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (τsq : ) (hτsq : τsq (16 * Real.log 6 + 32 * Real.log (Real.exp 1 * d)) * σ ^ 2 / n) (t : ) (ht : 0 < t) :
μ {ω : Ω | ∃ (θ : Fin d), 4 * (ε ω ⬝ᵥ X.mulVec (θ - θstar)) ^ 2 > X.mulVec (θ - θstar) ⬝ᵥ X.mulVec (θ - θstar) * (2 * n * τsq * (Rigollet.l0norm θ) + t)} ENNReal.ofReal (Real.exp (-t / (32 * σ ^ 2) + (Rigollet.l0norm θstar) * Real.log 12))

Sup-out probability bound (Rigollet, line 1618).

This helper encodes the probability bound derived in lines 1596-1618 of the proof of Theorem 2.14. The argument proceeds by:

  1. Decomposing the sup over θ as max over support sizes k and supports S (line 1596-1604),
  2. Applying the ε-net argument from Theorem 1.19 to bound the sub-Gaussian maximum over each subspace (line 1610-1614), and
  3. Using Lemma 2.7 to bound the binomial coefficients and summing the geometric series (line 1616-1618).

The result is (book line 1618): P(∃θ, 4[εᵀU(θ-θ*)]² - 2nτ²|θ|₀ ≥ t) ≤ exp(-t/(32σ²) + |θ*|₀·log(12))

The proof uses sup_out_intermediate_bound (which uses per_size_concentration_bound for the by-reference steps) and then bounds the geometric series explicitly.

theorem noise_concentration_bound_thm214 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : } (hn : 0 < n) (hd : 0 < d) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hε_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (δ : ) (hδ₀ : 0 < δ) (hδ₁ : δ < 1) :
μ {ω : Ω | ∀ (θ : Fin d), 4 * (ε ω ⬝ᵥ X.mulVec (θ - θstar)) ^ 2 X.mulVec (θ - θstar) ⬝ᵥ X.mulVec (θ - θstar) * (2 * n * ((16 * Real.log 6 + 32 * Real.log (Real.exp 1 * d)) * σ ^ 2 / n) * (Rigollet.l0norm θ) + (32 * σ ^ 2 * (Rigollet.l0norm θstar) * Real.log 12 + 32 * σ ^ 2 * Real.log (1 / δ)))} ENNReal.ofReal (1 - δ)

Noise concentration bound (from the sub-Gaussian union/ε-net argument).

This encodes the probabilistic core of Theorem 2.14's proof: with probability ≥ 1 - δ, the "sup-out" noise bound 4⟨ε,X(θ-θ*)⟩² ≤ |X(θ-θ*)|² · (2nτ²|θ|₀ + t) holds simultaneously for all θ, where τ² = (16·log(6) + 32·log(ed))·σ²/n and t = 32σ²·|θ*|₀·log(12) + 32σ²·log(1/δ).

Proved using sup_out_probability_bound (which encapsulates the Theorem 1.19 application and union bound from the book's proof, lines 1596-1618).

theorem thm_2_14_probabilistic {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n d : } (hn : 0 < n) (hd : 0 < d) (X : Matrix (Fin n) (Fin d) ) (θstar : Fin d) (ε : ΩFin n) (σ : ) ( : 0 < σ) ( : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hε_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hε_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (δ : ) (hδ₀ : 0 < δ) (hδ₁ : δ < 1) (θhat : ΩFin d) (hBIC : ∀ (ω : Ω), Rigollet.IsBICEstimatorL2 X (X.mulVec θstar + ε ω) (((16 * Real.log 6 + 32 * Real.log (Real.exp 1 * d)) * σ ^ 2 / n)) (θhat ω)) :
μ {ω : Ω | X.mulVec (θhat ω - θstar) ⬝ᵥ X.mulVec (θhat ω - θstar) 224 * (Rigollet.l0norm θstar) * σ ^ 2 * Real.log (Real.exp 1 * d) + 32 * σ ^ 2 * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

Theorem 2.14 (probabilistic form). Assume the linear model Y = Xθ* + ε where ε ~ subG_n(σ²). The BIC estimator θ̂^BIC with regularization parameter τ² = 16·log(6)·σ²/n + 32·σ²·log(ed)/n satisfies, with probability at least 1 - δ (book line 1622): |X(θ̂ - θ*)|₂² ≤ 224·|θ*|₀·σ²·log(ed) + 32·σ²·log(1/δ)

The two-term structure is faithful to the book's proof: the first term comes from 2nτ²|θ*|₀ + 32σ²|θ*|₀log(12), and the second from 32σ²log(1/δ). These have different dependence on |θ*|₀.