Corollary 3.7: Combined ℓ₀/ℓ₁ Oracle Inequality via Maurey #
Under the assumptions of Theorem 3.4 (BIC sparse oracle inequality) and a normalized dictionary (max_j |φ_j|₂ ≤ √n), Maurey's argument (Theorem 3.6) yields:
MSE(φ_{θ̂}) ≤ inf_θ { 2·MSE(φ_θ) + C · min(σ²|θ|₀ log(eM)/n, σ|θ|₁ √(log(eM)/n)) } + C σ² log(1/δ) / n
with probability at least 1 − δ.
The proof combines:
- The ℓ₀ oracle inequality from Thm 3.4 (with α = 1/3).
- Maurey's approximation (Thm 3.6) converting ℓ₁ to ℓ₀ bounds. Key step: optimizing min_k(R²/k + Cσ²k log(eM)/n) ≈ σR√(log(eM)/n).
Local definitions #
Algebraic core #
Corollary 3.7 (algebraic core): Combined ℓ₀/ℓ₁ oracle inequality.
Given:
- An ℓ₀ oracle inequality (Thm 3.4, α = 1/3): MSE_hat ≤ 2 · MSE(φ_θ) + C σ² |θ|₀ log(eM)/n + tail
- An ℓ₁ oracle inequality (Thm 3.4 + Thm 3.6 / Maurey): MSE_hat ≤ 2 · MSE(φ_θ) + C σ |θ|₁ √(log(eM)/n) + tail
Conclude the combined bound with min of both penalties.
Bridge lemmas between Chapter3 and Rigollet.Chapter3 definitions #
Maurey's approximation (Theorem 3.6) #
Theorem 3.6 states: for any θ' ∈ ℝ^M and any k ≥ 1, there exists θ with |θ|₀ ≤ 2k such that MSE(φ_θ) ≤ MSE(φ_{θ'}) + |θ'|₁²/k, provided the dictionary is normalized (max_j |φ_j|₂ ≤ √n, i.e., D = 1).
Derived from Chapter3.maurey_sparse_approx (the probabilistic core,
axiomatized because the proof uses random rounding not available in Mathlib).
Reference: Rigollet, Theorem 3.6 (Maurey's empirical method).
Support size is bounded by M #
Algebraic optimization lemma for Maurey's argument #
For positive reals a, b and k in [√(a/b)/2, √(a/b)], we have a/k + b·k ≤ 3√(ab). This is the key bound for optimizing the Maurey+ℓ₀ tradeoff.
Helper lemmas for the three-case bound #
The key algebraic bound: 2R²/k + 2Cσ²kL ≤ CσR√L when k ∈ [kbar/2, kbar] and C ≥ 36. Here kbar = R/(σ√(CL)).
Maurey + three-case optimization (key step of Cor 3.7 proof) #
The proof of Corollary 3.7 derives the ℓ₁ bound from the ℓ₀ bound via:
- Maurey's approximation (Thm 3.6): for any θ' and k, ∃θ with |θ|₀ ≤ 2k and MSE(θ) ≤ MSE(θ') + |θ'|₁²/k.
- Substituting this θ into the ℓ₀ oracle inequality and optimizing over k.
The bound 2R²/k + 2Cσ²kL ≤ 3√(4CR²σ²L) = 6√C·σR√L ≤ CσR√L (when C ≥ 36) requires k̄ = R/(σ√(CL)) ≥ 1 so that the integer floor ⌊k̄⌋ ≥ 1.
Hypotheses: hC_large ensures the constant absorption works (6√C ≤ C),
and hR_lower ensures k̄ ≥ 1 (the "Case 1" regime of the three-case analysis).
The complementary case (k̄ < 1, "Case 2") is handled separately in the caller.
Reference: Rigollet, Corollary 3.7 proof, "three cases for k̄".
Bridge lemmas between MSE_34/MSE_37 and support_size_34/support_size_37 #
Cauchy-Schwarz helper for finite sums #
Corollary 3.7: Probabilistic combined ℓ₀/ℓ₁ oracle inequality #
This is the main result. It wraps the deterministic algebraic core with the probabilistic content from Theorem 3.4, specialized to α = 1/3.
Corollary 3.7 (Rigollet). With probability at least 1 − δ, the BIC estimator θ̂^BIC satisfies:
MSE(φ_{θ̂^BIC}) ≤ inf_θ { 2 · MSE(φ_θ) + C · [σ² |θ|₀ log(eM)/n ∧ σ |θ|₁ √(log(eM)/n)] } + C · σ² log(1/δ)/n
The proof specializes Theorem 3.4 to α = 1/3 (giving the factor 2 oracle inequality), then uses Maurey's argument (Theorem 3.6) to derive the ℓ₁ bound, and combines both via the min.
Hypotheses:
- (Ω, μ) is a probability space
- ε is sub-Gaussian noise with variance proxy σ²
- θ̂ is the BIC estimator (minimizes |Y − Φθ|² + nτ²|θ|₀)
- The dictionary is normalized: max_j |φ_j|₂ ≤ √n
- 0 < δ ≤ 1 is the confidence parameter