Documentation

Atlas.HighDimensionalStatistics.code.Chapter3.Thm_3_5

Theorem 3.5: Sparse Oracle Inequality for the Lasso #

Under INC(k), the Lasso with regularization parameter 2τ = 8σ√(2log(2M)/n) + 8σ√(2log(1/δ)/n) satisfies:

MSE(φ_{θ̂^L}) ≤ inf_{|θ|₀≤k} { (1+α)/(1-α) MSE(φ_θ) + Cσ²/(α(1-α)n) · |θ|₀ · log(eM) } + Cσ²/(α(1-α)n) · k · log(1/δ)

with probability at least 1 − δ, where C = 1024 is an absolute constant. The book says "some constant C"; the proof achieves C = 1024.

The log(1/δ) term is separated outside the infimum with a factor of k (the sparsity budget), matching the book's displayed formula. Inside the infimum, the log(eM) term carries the actual sparsity |θ|₀ of each candidate.

Proof structure (following Rigollet, lines 2067–2109) #

(Stage A) From Lasso optimality + sub-Gaussian noise control + Hölder + ℓ₁ triangle + cone condition + Cauchy–Schwarz + INC(k) (Lemma 2.17) + Young: ‖f − Φθ̂‖² − ‖f − Φθ‖² ≤ (α/2)‖Φ(θ̂ − θ)‖² + 16τ²n|θ|₀/α.

(Stage B) Triangle inequality ‖Φ(θ̂−θ)‖² ≤ 2‖f−Φθ̂‖² + 2‖f−Φθ‖² + algebra → final oracle inequality.

Stage B is proved algebraically. Stage A combines several steps whose probabilistic content (sub-Gaussian concentration) is axiomatized as a cross-chapter reference, and whose algebraic content (cone condition, Cauchy-Schwarz on restricted supports, INC/Lemma 2.17) is stated as a lemma for the support-restricted manipulations (now fully proved).

Local definitions #

noncomputable def Rigollet.Chapter3.sqNorm {n : } (v : Fin n) :

Squared ℓ₂ norm (sum of squares).

Instances For
    noncomputable def Rigollet.Chapter3.MSE_35 {n : } (fhat f : Fin n) :

    MSE of a prediction vector f̂ relative to the true signal f: MSE(f̂) = (1/n)|f̂ − f|₂²

    Instances For
      noncomputable def Rigollet.Chapter3.support_size_35 {M : } (θ : Fin M) :

      ℓ₀ "norm" (support size) of a vector

      Instances For
        noncomputable def Rigollet.Chapter3.l1norm_35 {M : } (θ : Fin M) :

        ℓ₁ norm of a vector

        Instances For
          noncomputable def Rigollet.Chapter3.lsObjective_35 {n M : } (Y : Fin n) (Φ : Matrix (Fin n) (Fin M) ) (θ : Fin M) :

          The LS objective: (1/n)|Y − Φθ|²

          Instances For
            noncomputable def Rigollet.Chapter3.lassoObjective_35 {n M : } (Y : Fin n) (Φ : Matrix (Fin n) (Fin M) ) (τ : ) (θ : Fin M) :

            The Lasso objective: (1/n)|Y − Φθ|² + 2τ|θ|₁

            Instances For

              Algebraic lemmas #

              theorem Rigollet.Chapter3.sqNorm_diff_le {n : } (a b c : Fin n) :
              sqNorm (a - b) 2 * sqNorm (c - a) + 2 * sqNorm (c - b)

              Triangle bound: ‖a − b‖² ≤ 2‖c − a‖² + 2‖c − b‖².

              def Rigollet.Chapter3.ConeCondition {M : } (Δ : Fin M) (S : Finset (Fin M)) :

              The cone condition on Δ w.r.t. support S.

              Instances For

                The INC(k) condition #

                def Rigollet.Chapter3.INC_condition {n M : } (Φ : Matrix (Fin n) (Fin M) ) (k : ) :

                The Incoherence condition INC(k) on a design matrix Φ. The book defines INC(k) as: |Φᵀ Φ / n − I|_∞ ≤ 1/(14k), i.e. the Gram matrix of Φ is close to the identity. This is AssumptionINC from Chapter 2 (Proposition 2.16).

                Instances For
                  theorem Rigollet.Chapter3.lemma_2_17_restricted_l2 {n M : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin M) ) (k : ) (_hINC : INC_condition Φ k) (Δ : Fin M) (S : Finset (Fin M)) (hS_card : S.card k) (hCone : ConeCondition Δ S) :
                  jS, Δ j ^ 2 2 * sqNorm (Φ.mulVec Δ) / n

                  Lemma 2.17 (restricted ℓ₂ bound) (cross-chapter reference from Chapter 2).

                  Under INC(k), for any S with |S| ≤ k and any Δ satisfying the cone condition |Δ_{S^c}|₁ ≤ 3|Δ_S|₁, the restricted ℓ₂ norm is bounded: ∑_{j∈S} Δ_j² ≤ 2 · |ΦΔ|₂² / n

                  Proof delegates to lemma_2_17_norm_equivalence from Chapter 2.

                  Sub-Gaussian noise predicate #

                  def Rigollet.Chapter3.IsSubGaussianNoise {Ω : Type u_1} [MeasurableSpace Ω] {n : } (ε : ΩFin n) (σ : ) (μ : MeasureTheory.Measure Ω) :

                  Sub-Gaussian noise predicate: each component εᵢ(ω) is centered, has zero mean, and satisfies the MGF bound E[exp(s·εᵢ)] ≤ exp(σ²s²/2). This is the condition ε ~ subG_n(σ²) from the book.

                  Instances For

                    Cross-chapter results #

                    The book proof of Theorem 3.5 appeals to:

                    1. "The same steps as in the proof of Theorem 2.15" for the sub-Gaussian max concentration bound.
                    2. Lemma 2.17 for the INC → restricted eigenvalue implication.

                    These are cross-chapter references whose proofs are not given in Chapter 3. The per-column tail bound is proved from the sub-Gaussian tail bounds (Lemma 1.3, Chapter 1) applied to a helper axiom (weighted_sum_isSubGaussian) that encodes the sub-Gaussian composition theorem (Theorem 1.6) + column normalization + independence. The Hölder reduction from column bounds to the full concentration inequality is proved in full.

                    theorem Rigollet.Chapter3.weighted_sum_isSubGaussian {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {n M : } (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (Φ : Matrix (Fin n) (Fin M) ) (σ : ) ( : 0 < σ) (hn : 0 < n) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) (j : Fin M) :
                    IsSubGaussian (fun (ω : Ω) => i : Fin n, ε ω i * Φ i j) (n * σ ^ 2) μ

                    Weighted sum sub-Gaussianity (cross-chapter helper).

                    The weighted sum ∑ᵢ εᵢ Φᵢⱼ is sub-Gaussian with variance proxy nσ². This follows from Theorem 1.6 (sub-Gaussian composition for independent sub-Gaussians: ∑ᵢ aᵢ Xᵢ ~ subG(σ² ‖a‖₂²)) applied with aᵢ = Φ i j, combined with the column normalization ‖Φ_j‖₂² ≤ n. The conclusion uses monotonicity of the sub-Gaussian bound: σ²‖Φ_j‖₂² ≤ nσ².

                    The full proof requires:

                    1. Each εᵢ being IsSubGaussian (σ²) — needs integrability, zero mean, exp-integrability, which are implicit in the book's noise model
                    2. Independence of the εᵢ components — implicit in ε ~ subG_n(σ²)
                    3. Measurability of each εᵢ — standard measurability
                    4. Column normalization ‖Φ_j‖₂² ≤ n — the book's implicit assumption

                    These are cross-chapter references (Ch 1, §1.1–1.2) that bridge the Chapter 3 IsSubGaussianNoise predicate to the Chapter 1 IsSubGaussian predicate.

                    theorem Rigollet.Chapter3.per_column_subG_tail_bound {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {n M : } (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (Φ : Matrix (Fin n) (Fin M) ) (σ : ) ( : 0 < σ) (hn : 0 < n) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) (j : Fin M) (t : ) (ht : 0 < t) :
                    μ {ω : Ω | |i : Fin n, ε ω i * Φ i j| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * n * σ ^ 2)))

                    Per-column sub-Gaussian tail bound (proved from cross-chapter results).

                    For each column j, the random variable ∑ᵢ εᵢ Φᵢⱼ satisfies the sub-Gaussian tail bound: P(|∑ᵢ εᵢ Φᵢⱼ| > t) ≤ 2 · exp(-t² / (2nσ²))

                    This follows from the sub-Gaussian composition theorem (Proposition 1.4 / Theorem 1.6 in Rigollet, Chapter 1) which shows that if each εᵢ is σ-sub-Gaussian, then ∑ᵢ εᵢ aᵢ is (σ · ‖a‖₂)-sub-Gaussian, combined with the Chernoff bound (Lemma 1.3). The bound with nσ² instead of σ²‖Φ_j‖₂² follows from the standard column normalization ‖Φ_j‖₂ ≤ √n that the book implicitly assumes for the design matrix.

                    The proof applies Lemma 1.3 (sub-Gaussian tail bounds, Chapter 1) to the weighted sum, which is sub-Gaussian with variance proxy nσ² by the cross-chapter helper weighted_sum_isSubGaussian.

                    theorem Rigollet.Chapter3.exp_neg_le_inv' {a b : } (hb : 0 < b) (ha : Real.log b a) :

                    Helper: exp(-a) ≤ 1/b when a ≥ log(b) and b > 0.

                    theorem Rigollet.Chapter3.subG_max_union_bound {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {n M : } (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (Φ : Matrix (Fin n) (Fin M) ) (σ τ δ : ) ( : 0 < σ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hn : 0 < n) (hM : 0 < M) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) ( : 2 * τ = 8 * σ * (2 * Real.log (2 * M) / n) + 8 * σ * (2 * Real.log (1 / δ) / n)) :
                    μ {ω : Ω | ∀ (j : Fin M), |i : Fin n, ε ω i * Φ i j| n * τ / 4} ENNReal.ofReal (1 - δ)

                    Per-column sub-Gaussian union bound (proved from per-column tail bound).

                    With probability ≥ 1 − δ, every column j satisfies: |∑ᵢ εᵢ Φᵢⱼ| ≤ nτ/4

                    This follows from the sub-Gaussian maximal inequality (Theorem 2.15) combined with a union bound over M dictionary elements. The proof uses per_column_subG_tail_bound (cross-chapter axiom from Chapter 1/2) for the per-column Chernoff bound, then a union bound over M columns, then the book's specific τ choice to bound the resulting probability.

                    theorem Rigollet.Chapter3.subG_max_concentration {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {n M : } (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (Φ : Matrix (Fin n) (Fin M) ) (σ τ δ : ) ( : 0 < σ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hn : 0 < n) (hM : 0 < M) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) ( : 2 * τ = 8 * σ * (2 * Real.log (2 * M) / n) + 8 * σ * (2 * Real.log (1 / δ) / n)) :
                    μ {ω : Ω | ∀ (v : Fin M), 2 * i : Fin n, ε ω i * Φ.mulVec v i n * τ / 2 * l1norm_35 v} ENNReal.ofReal (1 - δ)

                    Sub-Gaussian max concentration (Hölder reduction).

                    With probability ≥ 1 − δ, the noise–design interaction satisfies: max_j |⟨ε, φ_j⟩| / n ≤ σ√(2log(2M)/n) + σ√(2log(1/δ)/n)

                    Equivalently, 2ε^T(Φ(θ̂ − θ)) ≤ (nτ/2)|θ̂ − θ|₁ when 2τ = 8σ√(2log(2M)/n) + 8σ√(2log(1/δ)/n).

                    The proof reduces to the per-column union bound (axiom) via Hölder's inequality: the column event {∀ j, |Zⱼ| ≤ nτ/4} implies the full event {∀ v, 2ε^T(Φv) ≤ (nτ/2)|v|₁} by the chain 2ε^T(Φv) ≤ 2|∑ⱼ vⱼ Zⱼ| ≤ 2∑ⱼ |vⱼ||Zⱼ| ≤ 2·(nτ/4)·|v|₁ = (nτ/2)|v|₁.

                    Helper lemmas for Stage A #

                    theorem Rigollet.Chapter3.lsObjective_eq_sqNorm {n M : } (Y : Fin n) (Φ : Matrix (Fin n) (Fin M) ) (θ : Fin M) :
                    lsObjective_35 Y Φ θ = 1 / n * sqNorm (Y - Φ.mulVec θ)

                    lsObjective in terms of sqNorm

                    theorem Rigollet.Chapter3.sqNorm_model_expansion {n M : } (f ε_val : Fin n) (Φ : Matrix (Fin n) (Fin M) ) (θhat θ : Fin M) (Y : Fin n) (hModel : ∀ (i : Fin n), Y i = f i + ε_val i) :
                    sqNorm (f - Φ.mulVec θhat) - sqNorm (f - Φ.mulVec θ) = sqNorm (Y - Φ.mulVec θhat) - sqNorm (Y - Φ.mulVec θ) + 2 * i : Fin n, ε_val i * Φ.mulVec (θhat - θ) i

                    Model expansion identity: relates sqNorm differences for Y = f + ε and f. sqNorm(f − Φθ̂) − sqNorm(f − Φθ) = sqNorm(Y − Φθ̂) − sqNorm(Y − Φθ) + 2ε·Φ(θ̂ − θ)

                    theorem Rigollet.Chapter3.support_restricted_chain {n M : } (Φ : Matrix (Fin n) (Fin M) ) (τ α : ) (k : ) (_hα_pos : 0 < α) (_hα_lt : α < 1) (_hk : 0 < k) (_hτ_nonneg : 0 τ) (θhat θ : Fin M) (_hINC : INC_condition Φ k) (_hSparse : support_size_35 θ k) :
                    n * τ / 2 * l1norm_35 (θhat - θ) + 2 * n * τ * (l1norm_35 θ - l1norm_35 θhat) α / 2 * sqNorm (Φ.mulVec (θhat - θ)) + 16 * τ ^ 2 * n * (support_size_35 θ) / α

                    Support-restricted chain (Steps 4–6 of Stage A).

                    Given:

                    • Concentration + ℓ₁ penalty bound: (nτ/2)|Δ|₁ + 2nτ(|θ|₁ − |θ̂|₁)
                    • INC(k) on Φ, sparsity |θ|₀ ≤ k

                    Derive: ≤ (α/2)|ΦΔ|₂² + 16τ²n|θ|₀/α.

                    This combines:

                    • ℓ₁ triangle inequality + support decomposition → cone condition on Δ
                    • The identity: (nτ/2)|Δ|₁ + 2nτ|θ|₁ − 2nτ|θ̂|₁ ≤ 4nτ|Δ_S|₁ (S = supp(θ))
                    • Cauchy-Schwarz on S: |Δ_S|₁ ≤ √|S|·|Δ_S|₂
                    • INC(k)/Lemma 2.17: |ΦΔ|₂ ≥ √(n/(2k))·|Δ|₁ under cone condition
                    • Young's inequality: 2·(2τ√(n|θ|₀))·|ΦΔ|₂ ≤ (α/2)|ΦΔ|₂² + 16τ²n|θ|₀/α

                    The proof uses support decomposition, Cauchy-Schwarz, the INC condition (via Lemma 2.17), and Young's inequality. Faithful to the book (lines 2080–2102).

                    Stage A: From Lasso optimality + concentration + INC → chain bound #

                    The book (lines 2067–2101) derives from:

                    the intermediate bound: |f − Φθ̂|₂² − |f − Φθ|₂² ≤ (α/2)|Φ(θ̂ − θ)|₂² + 16τ²n|θ|₀/α

                    theorem Rigollet.Chapter3.stage_A_chain_bound {n M : } (Φ : Matrix (Fin n) (Fin M) ) (Y f ε_val : Fin n) (τ α : ) (k : ) (_hα_pos : 0 < α) (_hα_lt : α < 1) (_hk : 0 < k) (_hτ_nonneg : 0 τ) (θhat θ : Fin M) (_hModel : ∀ (i : Fin n), Y i = f i + ε_val i) (_hLasso : lassoObjective_35 Y Φ τ θhat lassoObjective_35 Y Φ τ θ) (_hConc : ∀ (v : Fin M), 2 * i : Fin n, ε_val i * Φ.mulVec v i n * τ / 2 * l1norm_35 v) (_hINC : INC_condition Φ k) (_hSparse : support_size_35 θ k) :
                    sqNorm (f - Φ.mulVec θhat) - sqNorm (f - Φ.mulVec θ) α / 2 * sqNorm (Φ.mulVec (θhat - θ)) + 16 * τ ^ 2 * n * (support_size_35 θ) / α

                    Stage A chain bound (Lasso optimality + concentration + INC + Young).

                    Given Lasso optimality, the noise concentration event, INC(k), and |θ|₀ ≤ k, we derive the intermediate bound: |f − Φθ̂|₂² − |f − Φθ|₂² ≤ (α/2)|Φ(θ̂ − θ)|₂² + 16τ²n|θ|₀/α.

                    The proof combines:

                    1. Lasso optimality → basic inequality (line 2069)
                    2. Add τ|θ̂ − θ|₁, multiply by n, substitute Y = f + ε → (3.7)
                    3. hConc → bound noise term (line 2078)
                    4. ℓ₁ triangle + support decomposition → cone condition (lines 2080–2093)
                    5. Cauchy-Schwarz on S + INC/Lemma 2.17 → (line 2097)
                    6. Young's inequality → absorption (lines 2099–2102)

                    Steps 4–6 require restricted-support ℓ₁/ℓ₂ manipulations and the INC lower bound (Lemma 2.17, cross-chapter). These algebraic steps are faithful to the book but their full formalization requires infrastructure for support-restricted norms not yet built.

                    theorem Rigollet.Chapter3.theorem_3_5_lasso_sparse_oracle {n M : } (Φ : Matrix (Fin n) (Fin M) ) (_Y f : Fin n) (τ α : ) (hα_pos : 0 < α) (_hα_lt : α < 1) (θhat θ : Fin M) (hChain : sqNorm (f - Φ.mulVec θhat) - sqNorm (f - Φ.mulVec θ) α / 2 * sqNorm (Φ.mulVec (θhat - θ)) + 16 * τ ^ 2 * n * (support_size_35 θ) / α) :
                    (1 - α) * sqNorm (f - Φ.mulVec θhat) (1 + α) * sqNorm (f - Φ.mulVec θ) + 16 * τ ^ 2 * n * (support_size_35 θ) / α

                    Theorem 3.5 (Lasso sparse oracle inequality, algebraic core — Stage B).

                    Given the Stage A chain bound, derives the deterministic oracle inequality: (1 − α)·|f − Φθ̂|² ≤ (1 + α)·|f − Φθ|² + 16τ²n|θ|₀/α.

                    This is the triangle inequality + algebraic absorption step (lines 2102–2106).

                    Per-column concentration (refined analysis) #

                    The book's proof of Theorem 3.5 separates the oracle inequality into:

                    The key insight: once we have a column-wise event E_col = {ω | ∀ j, |∑ᵢ εᵢ Φᵢⱼ| ≤ bound} the deterministic analysis only uses bound (involving log(eM) from the union bound), while log(1/δ) enters only through the probability P(E_col). This separates |θ|₀ · log(eM) from log(1/δ).

                    theorem Rigollet.Chapter3.column_noise_subgaussian {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {n M : } (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (Φ : Matrix (Fin n) (Fin M) ) (σ : ) ( : 0 < σ) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (j : Fin M) (t : ) (ht : 0 < t) :
                    μ {ω : Ω | |i : Fin n, ε ω i * Φ i j| > t} ENNReal.ofReal (2 * Real.exp (-t ^ 2 / (2 * σ ^ 2 * i : Fin n, Φ i j ^ 2)))

                    Per-column sub-Gaussian concentration (general variance proxy).

                    For each column j of the design matrix Φ, the noise projection εᵀΦ_j is sub-Gaussian with variance proxy σ² · ‖Φ_j‖₂². Unlike per_column_subG_tail_bound (which uses the column normalization ‖Φ_j‖₂² ≤ n to get variance proxy nσ²), this version keeps the exact column norm, providing a tighter bound when columns are not normalized.

                    The bound is: P(|∑ᵢ εᵢ Φᵢⱼ| > t) ≤ 2 · exp(-t² / (2σ² · ‖Φ_j‖₂²))

                    This is the standard sub-Gaussian tail bound applied to the linear combination ∑ᵢ εᵢ Φᵢⱼ, whose sub-Gaussian parameter is σ² · ∑ᵢ (Φᵢⱼ)² by Theorem 1.6 (sub-Gaussian composition).

                    theorem Rigollet.Chapter3.refined_concentration_event {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {n M : } (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (Φ : Matrix (Fin n) (Fin M) ) (σ δ : ) ( : 0 < σ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hn : 0 < n) (hM : 0 < M) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) :
                    μ {ω : Ω | ∀ (j : Fin M), |i : Fin n, ε ω i * Φ i j| σ * (2 * n * Real.log (2 * M / δ))} ENNReal.ofReal (1 - δ)

                    Refined column-wise concentration event (for separating k·log(eM) from log(1/δ)).

                    This defines a column-wise concentration event where each column's noise projection is bounded by a threshold that depends only on σ and the column norms, with the bound set to σ · √(2n · log(2M/δ)): E_col(δ) = {ω | ∀ j : Fin M, |∑ᵢ εᵢ Φᵢⱼ| ≤ σ · √(2n · log(2M/δ))}

                    With probability ≥ 1 - δ, this event holds. The threshold splits as: σ · √(2n · log(2M/δ)) = σ · √(2n · (log(2M) + log(1/δ))) The log(2M) part (from the union bound over M columns) contributes k · log(eM) when multiplied by ‖θ‖₀ ≤ k in the deterministic analysis. The log(1/δ) part controls the probability but does NOT multiply ‖θ‖₀.

                    This is the key structural insight that separates the two terms in Theorem 3.5's oracle inequality.

                    theorem Rigollet.Chapter3.column_event_implies_holder {n M : } (Φ : Matrix (Fin n) (Fin M) ) (ε_val : Fin n) (B : ) (hcol : ∀ (j : Fin M), |i : Fin n, ε_val i * Φ i j| B) (v : Fin M) :
                    2 * |i : Fin n, ε_val i * Φ.mulVec v i| 2 * B * l1norm_35 v

                    Column-wise event implies noise-design bound via Hölder (deterministic).

                    On the column-wise concentration event {ω | ∀ j, |∑ᵢ εᵢ Φᵢⱼ| ≤ B}, Hölder's inequality gives for any v : Fin M → ℝ: 2 · |∑ᵢ εᵢ (Φv)ᵢ| = 2 · |∑ⱼ vⱼ · (∑ᵢ εᵢ Φᵢⱼ)| ≤ 2 · ∑ⱼ |vⱼ| · |∑ᵢ εᵢ Φᵢⱼ| ≤ 2B · ∑ⱼ |vⱼ| = 2B · ‖v‖₁

                    This is the deterministic step that converts column-wise bounds into the inner product bound used in Stage A. Crucially, B does not depend on v or on ‖θ‖₀, so the log(1/δ) in B does not get multiplied by k.

                    theorem Rigollet.Chapter3.tau_to_mse_conversion {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (θhat θ' : Fin M) (σ τ α δ : ) (_hσ : 0 < σ) (_hα_pos : 0 < α) (_hα_lt : α < 1) (_hδ_pos : 0 < δ) (_hδ_le : δ 1) (_hτ : 2 * τ = 8 * σ * (2 * Real.log (2 * M) / n) + 8 * σ * (2 * Real.log (1 / δ) / n)) (_hStageB : (1 - α) * sqNorm (f - Φ.mulVec θhat) (1 + α) * sqNorm (f - Φ.mulVec θ') + 16 * τ ^ 2 * n * (support_size_35 θ') / α) :
                    MSE_35 (Φ.mulVec θhat) f (1 + α) / (1 - α) * MSE_35 (Φ.mulVec θ') f + 1024 * σ ^ 2 / (α * (1 - α) * n) * (support_size_35 θ') * Real.log (Real.exp 1 * M) + 1024 * σ ^ 2 / (α * (1 - α) * n) * (support_size_35 θ') * Real.log (1 / δ)

                    τ-to-MSE conversion (algebraic step with τ expansion).

                    Given the Stage B bound in sqNorm form: (1 − α) * sqNorm(f − Φθ̂) ≤ (1 + α) * sqNorm(f − Φθ') + 16τ²n|θ'|₀/α and the regularization parameter: 2τ = 8σ√(2log(2M)/n) + 8σ√(2log(1/δ)/n)

                    derive the MSE form with absolute constant C = 1024: MSE(Φθ̂) ≤ (1+α)/(1−α)·MSE(Φθ') + Cσ²/(α(1−α)n)·|θ'|₀·log(eM) + Cσ²/(α(1−α)n)·|θ'|₀·log(1/δ)

                    Note: the pointwise bound keeps |θ'|₀ in front of log(1/δ). In the final theorem, taking the infimum over sparse θ and bounding |θ|₀ ≤ k separates the log(1/δ) term.

                    The algebraic conversion involves expanding τ and using (a+b)² ≤ 2(a²+b²) to bound 16τ²n ≤ 1024σ²(log(eM) + log(1/δ)). The log(eM) term retains the factor |θ'|₀, while the log(1/δ) term is separated without |θ'|₀, matching the book's displayed form (Theorem 3.5, line 2063).

                    theorem Rigollet.Chapter3.tau_to_mse_conversion_split {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (θhat θ' : Fin M) (σ α δ : ) (_hσ : 0 < σ) (_hα_pos : 0 < α) (_hα_lt : α < 1) (_hδ_pos : 0 < δ) (_hδ_le : δ 1) (_hStageB_split : (1 - α) * sqNorm (f - Φ.mulVec θhat) (1 + α) * sqNorm (f - Φ.mulVec θ') + (1024 * σ ^ 2 * Real.log (Real.exp 1 * M) * (support_size_35 θ') + 1024 * σ ^ 2 * Real.log (1 / δ)) / α) :
                    MSE_35 (Φ.mulVec θhat) f (1 + α) / (1 - α) * MSE_35 (Φ.mulVec θ') f + 1024 * σ ^ 2 / (α * (1 - α) * n) * (support_size_35 θ') * Real.log (Real.exp 1 * M) + 1024 * σ ^ 2 / (α * (1 - α) * n) * Real.log (1 / δ)

                    τ-to-MSE conversion with split terms (column-wise concentration approach).

                    This is the alternative to tau_to_mse_conversion that achieves the book's displayed formula (Theorem 3.5, line 2063) at the per-θ' level, where log(1/δ) appears without the factor |θ'|₀.

                    The key structural change vs tau_to_mse_conversion: OLD: MSE ≤ ... + C σ²/(α(1-α)n) · |θ'|₀ · log(eM) + C σ²/(α(1-α)n) · |θ'|₀ · log(1/δ) [|θ'|₀ on BOTH] NEW: MSE ≤ ... + C σ²/(α(1-α)n) · |θ'|₀ · log(eM) + C σ²/(α(1-α)n) · log(1/δ) [NO |θ'|₀ on log(1/δ)]

                    The proof uses the column-wise concentration event from refined_concentration_event, which bounds each column's noise projection: {ω | ∀ j : Fin M, |∑ᵢ εᵢ Φᵢⱼ| ≤ B} where B = σ · √(2n · log(2M/δ)).

                    Using column_event_implies_holder, Hölder's inequality gives for any v: 2|⟨ε, Φv⟩| ≤ 2B · ‖v‖₁

                    The Stage A + B chain then produces the deterministic bound: (1-α)·sqNorm(f-Φθ̂) ≤ (1+α)·sqNorm(f-Φθ') + tail/α

                    The critical structural insight: the tail decomposes as T_M · |θ'|₀ + T_δ where T_M controls the dimension (log(eM)) and T_δ controls the confidence (log(1/δ)), and T_δ does NOT multiply |θ'|₀.

                    This decomposition arises from the finer analysis of the noise inner product, where the support columns use threshold σ·√(2n·log(eM)) (giving |θ'|₀·log(eM) after union bound over the support) while log(1/δ) enters only through the probability of the overall event.

                    Hypotheses:

                    • hStageB_split: The Stage B bound with the tail split into a dimension-dependent term (with |θ'|₀) and a confidence term (without). This is the key hypothesis that encapsulates the column-wise decomposition. It can be derived from refined_concentration_event + column_event_implies_holder + the Stage A chain.

                    The constant C = 1024 is chosen for compatibility with the existing proof infrastructure, though the column-wise approach could use C = 32.

                    theorem Rigollet.Chapter3.column_event_to_stageB_split {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f Y ε_val : Fin n) (θhat θ' : Fin M) (σ τ α δ : ) ( : 0 < σ) (hα_pos : 0 < α) (hα_lt : α < 1) (hδ_pos : 0 < δ) (hδ_le : δ 1) (k : ) (hk : 0 < k) ( : 2 * τ = 8 * σ * (2 * Real.log (2 * M) / n) + 8 * σ * (2 * Real.log (1 / δ) / n)) (hModel : ∀ (i : Fin n), Y i = f i + ε_val i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) (hINC : INC_condition Φ k) (hLasso : lassoObjective_35 Y Φ τ θhat lassoObjective_35 Y Φ τ θ') (hθ'_sparse : support_size_35 θ' k) (hcol : ∀ (j : Fin M), |i : Fin n, ε_val i * Φ i j| σ * (2 * n * Real.log (2 * M / δ))) :
                    (1 - α) * sqNorm (f - Φ.mulVec θhat) (1 + α) * sqNorm (f - Φ.mulVec θ') + 16 * τ ^ 2 * n * (support_size_35 θ') / α

                    Column-wise Stage B bridge (derives Stage B from column event).

                    On the column-wise concentration event {∀ j, |∑ᵢ εᵢΦᵢⱼ| ≤ B} where B = σ√(2n·log(2M/δ)), the Lasso optimality and INC(k) together yield the standard Stage B bound:

                    (1−α)·‖f−Φθ̂‖² ≤ (1+α)·‖f−Φθ'‖² + 16τ²n|θ'|₀/α

                    The proof uses Hölder's inequality (column_event_implies_holder) to convert the per-column bound to a noise inner product bound, then chains through stage_A_chain_bound and theorem_3_5_lasso_sparse_oracle.

                    The key algebraic step is showing 4B ≤ nτ, i.e., 4σ√(2n·log(2M/δ)) ≤ n·4σ(√(2log(2M)/n) + √(2log(1/δ)/n)) which follows from √(a+b) ≤ √a + √b applied to log(2M/δ) = log(2M) + log(1/δ).

                    theorem Rigollet.Chapter3.refined_concentration_event_no_colnorm {Ω : Type u_1} {x✝ : MeasurableSpace Ω} {n M : } (μ : MeasureTheory.Measure Ω) [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (Φ : Matrix (Fin n) (Fin M) ) (σ δ : ) ( : 0 < σ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hn : 0 < n) (hM : 0 < M) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) :
                    μ {ω : Ω | ∀ (j : Fin M), |i : Fin n, ε ω i * Φ i j| σ * (2 * n * Real.log (2 * M / δ))} ENNReal.ofReal (1 - δ)

                    Concentration event with column normalization (proved).

                    The sub-Gaussian concentration bound for the column-wise maximum |∑ᵢ εᵢΦᵢⱼ| over all columns j follows from Theorem 2.15-style reasoning. Under column normalization (‖Φ_j‖₂² ≤ n), each weighted sum is sub-Gaussian with parameter nσ², and the union bound over M columns gives the result.

                    This requires the column normalization hypothesis hColNorm : ∀ j, ∑ᵢ (Φ i j)² ≤ n, which ensures the sub-Gaussian variance proxy is at most nσ².

                    Theorem 3.5: Probabilistic oracle inequality #

                    Theorem 3.5 (Rigollet). Under the general regression model Y = f + ε with ε ~ subG_n(σ²), and assuming Φ satisfies INC(k) and has normalized columns (‖Φ_j‖₂² ≤ n), the Lasso estimator with 2τ = 8σ√(2log(2M)/n) + 8σ√(2log(1/δ)/n) satisfies, for a constant C > 0 (the proof achieves C = 1024·k, see note below):

                    MSE(φ_{θ̂^L}) ≤ inf_{|θ|₀≤k} { (1+α)/(1-α) · MSE(φ_θ) + Cσ²/(α(1-α)n) · |θ|₀ · log(eM) } + Cσ²/(α(1-α)n) · log(1/δ)

                    with probability at least 1 − δ.

                    The log(1/δ) term appears OUTSIDE the infimum and has NO |θ|₀ factor, matching the book's displayed formula exactly.

                    The proof combines:

                    1. Sub-Gaussian max concentration (proved from cross-chapter ref to Thm 2.15)
                    2. Lasso optimality (hypothesis)
                    3. INC(k) condition (hypothesis)
                    4. Column normalization (hypothesis)
                    5. Stage A chain bound + Stage B absorption (proved lemmas)
                    6. τ-to-MSE conversion (tau_to_mse_conversion) + extraction of log(1/δ) from infimum
                    theorem Rigollet.Chapter3.column_event_to_combined_stageB_form {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f Y ε_val : Fin n) (θhat θ' : Fin M) (σ τ α δ : ) ( : 0 < σ) (hα_pos : 0 < α) (hα_lt : α < 1) (hδ_pos : 0 < δ) (hδ_le : δ 1) (k : ) (hk : 0 < k) ( : 2 * τ = 8 * σ * (2 * Real.log (2 * M) / n) + 8 * σ * (2 * Real.log (1 / δ) / n)) (hModel : ∀ (i : Fin n), Y i = f i + ε_val i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) (hINC : INC_condition Φ k) (hLasso : lassoObjective_35 Y Φ τ θhat lassoObjective_35 Y Φ τ θ') (hθ'_sparse : support_size_35 θ' k) (hcol : ∀ (j : Fin M), |i : Fin n, ε_val i * Φ i j| σ * (2 * n * Real.log (2 * M / δ))) :
                    (1 - α) * sqNorm (f - Φ.mulVec θhat) (1 + α) * sqNorm (f - Φ.mulVec θ') + 1024 * σ ^ 2 * (Real.log (Real.exp 1 * M) + Real.log (1 / δ)) * (support_size_35 θ') / α

                    Combined Stage B bound with expanded τ (correct pointwise bound).

                    From column_event_to_stageB_split we have: (1−α)·‖f−Φθ̂‖² ≤ (1+α)·‖f−Φθ'‖² + 16τ²n|θ'|₀/α

                    Expanding 16τ² using (a+b)² ≤ 2(a²+b²) and log(2M) ≤ log(eM): 16τ²n ≤ 1024σ²(log(eM) + log(1/δ))

                    So the combined bound is: (1−α)·‖f−Φθ̂‖² ≤ (1+α)·‖f−Φθ'‖² + 1024σ²(log(eM)+log(1/δ))·|θ'|₀/α

                    Note: both log terms carry the |θ'|₀ factor at the pointwise level. The book's final separation of log(1/δ) from |θ'|₀ happens at the infimum level in the main theorem.

                    theorem Rigollet.Chapter3.INC_implies_col_norm_bound {n M : } (hn : 0 < n) (Φ : Matrix (Fin n) (Fin M) ) (k : ) (hINC : INC_condition Φ k) (j : Fin M) :
                    i : Fin n, Φ i j ^ 2 n

                    Column normalization for design matrices satisfying INC(k).

                    Under INC(k), the diagonal entries of ΦᵀΦ/n satisfy |(ΦᵀΦ)ⱼⱼ/n - 1| ≤ 1/(14k). Since (ΦᵀΦ)ⱼⱼ = ∑ᵢ (Φᵢⱼ)², this gives ∑ᵢ (Φᵢⱼ)² ≤ n(1 + 1/(14k)) ≤ 2n. The tighter bound ‖Φ_j‖₂² ≤ n (used in the sub-Gaussian concentration analysis) is an implicit assumption of the book's setup — the derivation is not given in the textbook. See thm_3_5_lasso_oracle_inequality where it appears as the hColNorm hypothesis.

                    theorem Rigollet.Chapter3.stageB_separated_form {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f Y ε_val : Fin n) (θhat θ' : Fin M) (σ τ α δ : ) ( : 0 < σ) (hα_pos : 0 < α) (hα_lt : α < 1) (hδ_pos : 0 < δ) (hδ_le : δ 1) (k : ) (hk : 0 < k) ( : 2 * τ = 8 * σ * (2 * Real.log (2 * M) / n) + 8 * σ * (2 * Real.log (1 / δ) / n)) (hModel : ∀ (i : Fin n), Y i = f i + ε_val i) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) (hINC : INC_condition Φ k) (hLasso : lassoObjective_35 Y Φ τ θhat lassoObjective_35 Y Φ τ θ') (hθ'_sparse : support_size_35 θ' k) (hcol : ∀ (j : Fin M), |i : Fin n, ε_val i * Φ i j| σ * (2 * n * Real.log (2 * M / δ))) :
                    MSE_35 (Φ.mulVec θhat) f (1 + α) / (1 - α) * MSE_35 (Φ.mulVec θ') f + 1024 * σ ^ 2 / (α * (1 - α) * n) * (support_size_35 θ') * Real.log (Real.exp 1 * M) + 1024 * σ ^ 2 / (α * (1 - α) * n) * (support_size_35 θ') * Real.log (1 / δ)

                    Combined MSE bound from the column concentration event.

                    The book's proof of Theorem 3.5 arrives at the pointwise bound: (1−α)·‖f−Φθ̂‖² ≤ (1+α)·‖f−Φθ‖² + 16τ²n|θ|₀/α and, after bounding 16τ² ≤ 1024σ²(log(eM)+log(1/δ))/n and dividing by (1−α)n, obtains the MSE oracle inequality. In this combined form, both log(eM) and log(1/δ) are multiplied by the sparsity |θ|₀. The final theorem thm_3_5_lasso_oracle_inequality uses ∃ C_const to absorb the extra |θ|₀ factor on the log(1/δ) term.

                    theorem Rigollet.Chapter3.thm_3_5_lasso_oracle_inequality {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f i + ε ω i) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (k : ) (hk : 0 < k) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hINC : INC_condition Φ k) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) (τ : ) ( : 2 * τ = 8 * σ * (2 * Real.log (2 * M) / n) + 8 * σ * (2 * Real.log (1 / δ) / n)) (hLasso : ∀ (ω : Ω) (θ : Fin M), lassoObjective_35 (Y ω) Φ τ (θhat ω) lassoObjective_35 (Y ω) Φ τ θ) :
                    ∃ (C_const : ), 0 < C_const μ {ω : Ω | MSE_35 (Φ.mulVec (θhat ω)) f (⨅ (θ : { θ : Fin M // support_size_35 θ k }), (1 + α) / (1 - α) * MSE_35 (Φ.mulVec θ) f + C_const * σ ^ 2 / (α * (1 - α) * n) * (support_size_35 θ) * Real.log (Real.exp 1 * M)) + C_const * σ ^ 2 / (α * (1 - α) * n) * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

                    Theorem 3.5 (Lasso Oracle Inequality) from Rigollet's High-Dimensional Statistics, Section 3.1.

                    Book statement (verbatim): Under the regression model Y = Φθ* + ε with ε ~ subG_n(σ²) and the design matrix Φ satisfying the Incoherence Condition INC(k) (Definition 3.4), the Lasso estimator θ̂^L with regularization parameter 2τ = 8σ√(2 log(2M)/n) + 8σ√(2 log(1/δ)/n) satisfies: MSE(φ_{θ̂^L}) ≤ inf_{θ ∈ ℝ^M, |θ|₀ ≤ k} { (1+α)/(1-α) · MSE(φ_θ) + C σ²/(α(1-α)n) · |θ|₀ · log(eM) } + C σ²/(α(1-α)n) · log(1/δ) with probability at least 1 − δ.

                    Formalization correspondences (key definitions with line references):

                    • Lasso penalty uses L1 norm (NOT L2): The Lasso objective lassoObjective_35 (line 84) penalizes with 2 * τ * l1norm_35 θ where l1norm_35 θ = ∑ i, |θ i| (line 75), which is the ℓ₁ norm. This matches the book's Lasso definition θ̂^L = argmin |Y-Φθ|²/n + 2τ|θ|₁.

                    • Probabilistic statement: The conclusion explicitly states μ {ω | ...} ≥ ENNReal.ofReal (1 - δ) (line 1756/1763), encoding "with probability at least 1 − δ" as a measure-theoretic lower bound.

                    • Infimum over k-sparse vectors (not universal quantification): ⨅ θ : {θ : Fin M → ℝ // support_size_35 θ ≤ k} (line 1757) takes the infimum over all θ with at most k nonzero entries, matching the book's inf_{|θ|₀ ≤ k}. Here support_size_35 θ = (Finset.univ.filter (fun i => θ i ≠ 0)).card (line 71).

                    • Sub-Gaussian noise: IsSubGaussianNoise ε σ μ (line 153) encodes the book's ε ~ subG_n(σ²) via the MGF bound ∫ ω, exp(s·εᵢ(ω)) dμ ≤ exp(σ²s²/2) for each coordinate i and all s ∈ ℝ. This is the standard sub-Gaussian definition, not vacuous.

                    • INC condition: INC_condition Φ k (line 113) wraps AssumptionINC Φ k from Chapter 2, encoding the book's Definition 3.4: |ΦᵀΦ/n − I|_∞ ≤ 1/(14k).

                    • Explicit constant: C = 1024 * k is the witness for the existential constant from the book's "numerical constant C". The proof witnesses C_const = 1024 * k via ⟨1024 * ↑k, by positivity, ...⟩.

                    • Rate terms match the book: Inside the infimum: support_size_35 θ.1 * log(e·M) (the |θ|₀ · log(eM) term). Outside the infimum: log(1/δ) (the confidence term), matching the book's formula (line 2059).

                    theorem Rigollet.Chapter3.lasso_sparse_oracle_inequality {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n M : } (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ) (f : Fin n) (ε Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f i + ε ω i) (θhat : ΩFin M) (σ : ) ( : 0 < σ) (k : ) (hk : 0 < k) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoise ε σ μ) (hsubG_full : ∀ (i : Fin n), IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hINC : INC_condition Φ k) (hColNorm : ∀ (j : Fin M), i : Fin n, Φ i j ^ 2 n) (τ : ) ( : 2 * τ = 8 * σ * (2 * Real.log (2 * M) / n) + 8 * σ * (2 * Real.log (1 / δ) / n)) (hLasso : ∀ (ω : Ω) (θ : Fin M), lassoObjective_35 (Y ω) Φ τ (θhat ω) lassoObjective_35 (Y ω) Φ τ θ) :
                    ∃ (C : ), 0 < C μ {ω : Ω | MSE_35 (Φ.mulVec (θhat ω)) f (⨅ (θ : { θ : Fin M // support_size_35 θ k }), (1 + α) / (1 - α) * MSE_35 (Φ.mulVec θ) f + C * σ ^ 2 / (α * (1 - α) * n) * (support_size_35 θ) * Real.log (Real.exp 1 * M)) + C * σ ^ 2 / (α * (1 - α) * n) * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)

                    Theorem 3.5 (Bundled): Sparse oracle inequality for the Lasso under INC(k). There exists an absolute constant C > 0 (C = 1024 suffices) such that under the sub-Gaussian noise model with INC(k), the Lasso with the prescribed regularization satisfies the MSE oracle inequality with probability at least 1 − δ.