Documentation

Atlas.HighDimensionalStatistics.code.Chapter3.Cor_3_7

Corollary 3.7: Combined ℓ₀/ℓ₁ Oracle Inequality via Maurey #

Under the assumptions of Theorem 3.4 (BIC sparse oracle inequality) and a normalized dictionary (max_j |φ_j|₂ ≤ √n), Maurey's argument (Theorem 3.6) yields:

MSE(φ_{θ̂}) ≤ inf_θ { 2·MSE(φ_θ) + C · min(σ²|θ|₀ log(eM)/n, σ|θ|₁ √(log(eM)/n)) } + C σ² log(1/δ) / n

with probability at least 1 − δ.

The proof combines:

Local definitions #

noncomputable def Rigollet.Chapter3.MSE_37 {n : } (fhat f : Fin n) :

MSE of a prediction vector f̂ relative to the true signal f

Instances For
    noncomputable def Rigollet.Chapter3.support_size_37 {M : } (θ : Fin M) :

    ℓ₀ "norm" (support size) of a vector

    Instances For
      noncomputable def Rigollet.Chapter3.l1norm_37 {M : } (θ : Fin M) :

      ℓ₁ norm of a vector

      Instances For

        Dictionary normalization condition: max_j |φ_j|₂ ≤ √n. Equivalently, max_j Σᵢ (Φ_{i,j})² ≤ n. This is the normalization assumed in Corollary 3.7 (D = 1 case of Thm 3.6).

        Instances For

          Algebraic core #

          theorem Rigollet.Chapter3.le_add_min_of_le_both (x a t₁ t₂ c : ) (h₁ : x a + t₁ + c) (h₂ : x a + t₂ + c) :
          x a + min t₁ t₂ + c

          If x ≤ a + t₁ + c and x ≤ a + t₂ + c, then x ≤ a + min t₁ t₂ + c.

          theorem Rigollet.Chapter3.cor_3_7_combined_oracle_det {n M : } (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (θhat : Fin M) (σ C logM_n tail : ) (h_l0 : ∀ (θ : Fin M), MSE_37 (Φ.mulVec θhat) f 2 * MSE_37 (Φ.mulVec θ) f + C * σ ^ 2 * (support_size_37 θ) * logM_n + tail) (h_l1 : ∀ (θ : Fin M), MSE_37 (Φ.mulVec θhat) f 2 * MSE_37 (Φ.mulVec θ) f + C * σ * l1norm_37 θ * logM_n + tail) (θ : Fin M) :
          MSE_37 (Φ.mulVec θhat) f 2 * MSE_37 (Φ.mulVec θ) f + min (C * σ ^ 2 * (support_size_37 θ) * logM_n) (C * σ * l1norm_37 θ * logM_n) + tail

          Corollary 3.7 (algebraic core): Combined ℓ₀/ℓ₁ oracle inequality.

          Given:

          • An ℓ₀ oracle inequality (Thm 3.4, α = 1/3): MSE_hat ≤ 2 · MSE(φ_θ) + C σ² |θ|₀ log(eM)/n + tail
          • An ℓ₁ oracle inequality (Thm 3.4 + Thm 3.6 / Maurey): MSE_hat ≤ 2 · MSE(φ_θ) + C σ |θ|₁ √(log(eM)/n) + tail

          Conclude the combined bound with min of both penalties.

          Bridge lemmas between Chapter3 and Rigollet.Chapter3 definitions #

          theorem Rigollet.Chapter3.MSE_37_eq_Chapter3_MSE {n : } (fhat f : Fin n) :
          MSE_37 fhat f = Chapter3.MSE fhat f

          Maurey's approximation (Theorem 3.6) #

          Theorem 3.6 states: for any θ' ∈ ℝ^M and any k ≥ 1, there exists θ with |θ|₀ ≤ 2k such that MSE(φ_θ) ≤ MSE(φ_{θ'}) + |θ'|₁²/k, provided the dictionary is normalized (max_j |φ_j|₂ ≤ √n, i.e., D = 1).

          Derived from Chapter3.maurey_sparse_approx (the probabilistic core, axiomatized because the proof uses random rounding not available in Mathlib). Reference: Rigollet, Theorem 3.6 (Maurey's empirical method).

          theorem Rigollet.Chapter3.maurey_approximation_37 {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (hNorm : DictNormalized Φ) (θ' : Fin M) (k : ) (hk : 1 k) :
          ∃ (θ : Fin M), support_size_37 θ 2 * k MSE_37 (Φ.mulVec θ) f MSE_37 (Φ.mulVec θ') f + l1norm_37 θ' ^ 2 / k

          Support size is bounded by M #

          Algebraic optimization lemma for Maurey's argument #

          For positive reals a, b and k in [√(a/b)/2, √(a/b)], we have a/k + b·k ≤ 3√(ab). This is the key bound for optimizing the Maurey+ℓ₀ tradeoff.

          theorem Rigollet.Chapter3.opt_bound_sqrt (a b k : ) (ha : 0 < a) (hb : 0 < b) (hk_pos : 0 < k) (hk_le : k (a / b)) (hk_ge : (a / b) / 2 k) :
          a / k + b * k 3 * (a * b)

          ⌊x⌋ ≥ x/2 when x ≥ 1

          Helper lemmas for the three-case bound #

          theorem Rigollet.Chapter3.six_sqrt_le_of_ge_36 (C : ) (hC : 0 < C) (hC36 : 36 C) :
          6 * C C

          6√C ≤ C when C ≥ 36

          theorem Rigollet.Chapter3.sqrt_ratio_eq (R σ C L : ) (hR : 0 < R) ( : 0 < σ) (hCL : 0 < C * L) :
          (R ^ 2 / (C * σ ^ 2 * L)) = R / (σ * (C * L))

          √(R²/(Cσ²L)) = R/(σ√(CL)) for positive R, σ, CL

          theorem Rigollet.Chapter3.maurey_opt_algebra (R σ C L k : ) (hR : 0 < R) ( : 0 < σ) (hC : 0 < C) (hL : 0 < L) (hk : 0 < k) (hk_le : k R / (σ * (C * L))) (hk_ge : R / (σ * (C * L)) / 2 k) (hC36 : 36 C) :
          2 * R ^ 2 / k + 2 * (C * σ ^ 2 * L) * k C * σ * R * L

          The key algebraic bound: 2R²/k + 2Cσ²kL ≤ CσR√L when k ∈ [kbar/2, kbar] and C ≥ 36. Here kbar = R/(σ√(CL)).

          Maurey + three-case optimization (key step of Cor 3.7 proof) #

          The proof of Corollary 3.7 derives the ℓ₁ bound from the ℓ₀ bound via:

          1. Maurey's approximation (Thm 3.6): for any θ' and k, ∃θ with |θ|₀ ≤ 2k and MSE(θ) ≤ MSE(θ') + |θ'|₁²/k.
          2. Substituting this θ into the ℓ₀ oracle inequality and optimizing over k.

          The bound 2R²/k + 2Cσ²kL ≤ 3√(4CR²σ²L) = 6√C·σR√L ≤ CσR√L (when C ≥ 36) requires k̄ = R/(σ√(CL)) ≥ 1 so that the integer floor ⌊k̄⌋ ≥ 1.

          Hypotheses: hC_large ensures the constant absorption works (6√C ≤ C), and hR_lower ensures k̄ ≥ 1 (the "Case 1" regime of the three-case analysis). The complementary case (k̄ < 1, "Case 2") is handled separately in the caller.

          Reference: Rigollet, Corollary 3.7 proof, "three cases for k̄".

          theorem Rigollet.Chapter3.maurey_three_case_bound {n M : } (_hn : 0 < n) (_hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (_hNorm : DictNormalized Φ) (σ : ) ( : 0 < σ) (C_const : ) (hC : 0 < C_const) (fhat' : Fin n) (θ' : Fin M) (tail : ) (hMaurey : ∀ (k : ), 1 k∃ (θ : Fin M), support_size_37 θ 2 * k MSE_37 (Φ.mulVec θ) f MSE_37 (Φ.mulVec θ') f + l1norm_37 θ' ^ 2 / k) (h_l0 : ∀ (θ : Fin M), MSE_37 fhat' f 2 * MSE_37 (Φ.mulVec θ) f + C_const * σ ^ 2 * (support_size_37 θ) * Real.log (Real.exp 1 * M) / n + tail) (hC_large : 36 C_const) (hR_lower : σ * (C_const * (Real.log (Real.exp 1 * M) / n)) l1norm_37 θ') :
          MSE_37 fhat' f 2 * MSE_37 (Φ.mulVec θ') f + C_const * σ * l1norm_37 θ' * (Real.log (Real.exp 1 * M) / n) + tail

          Bridge lemmas between MSE_34/MSE_37 and support_size_34/support_size_37 #

          theorem Rigollet.Chapter3.MSE_34_eq_MSE_37 {n : } (fhat f : Fin n) :
          MSE_34 fhat f = MSE_37 fhat f

          Cauchy-Schwarz helper for finite sums #

          theorem Rigollet.Chapter3.abs_sum_mul_le_sqrt_mul_sqrt {n : } (f g : Fin n) :
          |i : Fin n, f i * g i| (∑ i : Fin n, f i ^ 2) * (∑ i : Fin n, g i ^ 2)

          Cauchy-Schwarz for finite sums: |∑ fᵢgᵢ| ≤ √(∑fᵢ²) · √(∑gᵢ²)

          Corollary 3.7: Probabilistic combined ℓ₀/ℓ₁ oracle inequality #

          This is the main result. It wraps the deterministic algebraic core with the probabilistic content from Theorem 3.4, specialized to α = 1/3.

          theorem Rigollet.Chapter3.cor_3_7_probabilistic {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε : ΩFin n) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoiseMGF μ ε (σ ^ 2)) (hBIC : ∀ (ω : Ω) (θ : Fin M), (f + ε ω - Φ.mulVec (θhat ω)) ⬝ᵥ (f + ε ω - Φ.mulVec (θhat ω)) + 16 * σ ^ 2 / (1 / 3) * Real.log (6 * Real.exp 1 * M) * (support_size_34 (θhat ω)) (f + ε ω - Φ.mulVec θ) ⬝ᵥ (f + ε ω - Φ.mulVec θ) + 16 * σ ^ 2 / (1 / 3) * Real.log (6 * Real.exp 1 * M) * (support_size_34 θ)) (hNorm : DictNormalized Φ) :
          ∃ (C_const : ), 0 < C_const μ {ω : Ω | ∀ (θ : Fin M), MSE_37 (Φ.mulVec (θhat ω)) f 2 * MSE_37 (Φ.mulVec θ) f + min (C_const * σ ^ 2 * (support_size_37 θ) * (Real.log (Real.exp 1 * M) / n)) (C_const * σ * l1norm_37 θ * (Real.log (Real.exp 1 * M) / n)) + C_const * σ ^ 2 * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)

          Corollary 3.7 (Rigollet). With probability at least 1 − δ, the BIC estimator θ̂^BIC satisfies:

          MSE(φ_{θ̂^BIC}) ≤ inf_θ { 2 · MSE(φ_θ) + C · [σ² |θ|₀ log(eM)/n ∧ σ |θ|₁ √(log(eM)/n)] } + C · σ² log(1/δ)/n

          The proof specializes Theorem 3.4 to α = 1/3 (giving the factor 2 oracle inequality), then uses Maurey's argument (Theorem 3.6) to derive the ℓ₁ bound, and combines both via the min.

          Hypotheses:

          • (Ω, μ) is a probability space
          • ε is sub-Gaussian noise with variance proxy σ²
          • θ̂ is the BIC estimator (minimizes |Y − Φθ|² + nτ²|θ|₀)
          • The dictionary is normalized: max_j |φ_j|₂ ≤ √n
          • 0 < δ ≤ 1 is the confidence parameter