Documentation

Atlas.HighDimensionalStatistics.code.Chapter3.Cor_3_16

BIC/Lasso estimator predicates for the trigonometric setting #

def Chapter3.IsBICEstimatorTrig {Ω : Type u_1} {n : } (Y : ΩFin n) (θhat : ΩFin (n - 1)) (τsq : ) :

The BIC estimator on the full dictionary {φ₁,...,φ_{n-1}}: minimizes (1/n) Σᵢ (Yᵢ - φ_θ(Xᵢ))² + τ²|θ|₀ over θ ∈ ℝ^{n-1}.

Instances For
    def Chapter3.IsLassoEstimatorTrig {Ω : Type u_1} {n : } (Y : ΩFin n) (θhat : ΩFin (n - 1)) (τ : ) :

    The Lasso estimator on the full dictionary {φ₁,...,φ_{n-1}}: minimizes (1/n) Σᵢ (Yᵢ - φ_θ(Xᵢ))² + 2τ|θ|₁ over θ ∈ ℝ^{n-1}.

    Instances For
      def Chapter3.IsBICOrLassoEstimator {Ω : Type u_1} {n : } (Y : ΩFin n) (θhat : ΩFin (n - 1)) :

      A BIC/Lasso estimator is either BIC or Lasso optimal.

      Instances For

        Bridge lemmas: function-level estimators → matrix-level optimality #

        The BIC/Lasso estimator definitions above use the function-level form: (1/n) Σᵢ (Yᵢ - φ_θ(Xᵢ))² + penalty The matrix-level theorems (Thm 3.4/3.5) use: (1/n) Σᵢ (Yᵢ - (Φ *ᵥ θ)ᵢ)² + penalty with Φ = trigDesignMatrix n (n-1).

        The bridge uses trigLinComb_eq_mulVec: trigLinComb M θ (i/n) = (trigDesignMatrix n M).mulVec θ i to show that the function-level estimator conditions imply the corresponding matrix-level conditions.

        theorem Chapter3.IsBICEstimatorTrig_to_matrix_optimality {Ω : Type u_1} {n : } (hn : 0 < n) (Y : ΩFin n) (θhat : ΩFin (n - 1)) (τsq : ) (hBIC : IsBICEstimatorTrig Y θhat τsq) :
        have Φ := trigDesignMatrix n (n - 1); ∀ (ω : Ω) (θ : Fin (n - 1)), 1 / n * i : Fin n, (Y ω i - Φ.mulVec (θhat ω) i) ^ 2 + τsq * {j : Fin (n - 1) | θhat ω j 0}.card 1 / n * i : Fin n, (Y ω i - Φ.mulVec θ i) ^ 2 + τsq * {j : Fin (n - 1) | θ j 0}.card

        Bridge: IsBICEstimatorTrig Y θhat τsq implies a BIC optimality condition in terms of dot products and the trigonometric design matrix. Specifically, the function-level BIC condition (1/n)Σ(Yᵢ - φ_{θhat}(xᵢ))² + τsq·|θhat|₀ ≤ (1/n)Σ(Yᵢ - φ_θ(xᵢ))² + τsq·|θ|₀ translates (after multiplying by n) to: ‖Y - Φθhat‖² + n·τsq·|θhat|₀ ≤ ‖Y - Φθ‖² + n·τsq·|θ|₀ where Φ = trigDesignMatrix n (n-1) and norms are dot products.

        theorem Chapter3.IsLassoEstimatorTrig_to_matrix_optimality {Ω : Type u_1} {n : } (hn : 0 < n) (Y : ΩFin n) (θhat : ΩFin (n - 1)) (τ : ) (hLasso : IsLassoEstimatorTrig Y θhat τ) :
        have Φ := trigDesignMatrix n (n - 1); ∀ (ω : Ω) (θ : Fin (n - 1)), 1 / n * i : Fin n, (Y ω i - Φ.mulVec (θhat ω) i) ^ 2 + 2 * τ * j : Fin (n - 1), |θhat ω j| 1 / n * i : Fin n, (Y ω i - Φ.mulVec θ i) ^ 2 + 2 * τ * j : Fin (n - 1), |θ j|

        Bridge: IsLassoEstimatorTrig Y θhat τ implies a Lasso optimality condition in terms of the trigonometric design matrix. The function-level Lasso condition translates to: (1/n)Σ(Yᵢ - (Φθhat)ᵢ)² + 2τ·|θhat|₁ ≤ (1/n)Σ(Yᵢ - (Φθ)ᵢ)² + 2τ·|θ|₁

        theorem Chapter3.IsBICOrLassoEstimator_to_matrix_optimality {Ω : Type u_1} {n : } (hn : 0 < n) (Y : ΩFin n) (θhat : ΩFin (n - 1)) (hE : IsBICOrLassoEstimator Y θhat) :
        have Φ := trigDesignMatrix n (n - 1); (∃ (τsq : ), 0 < τsq ∀ (ω : Ω) (θ : Fin (n - 1)), 1 / n * i : Fin n, (Y ω i - Φ.mulVec (θhat ω) i) ^ 2 + τsq * {j : Fin (n - 1) | θhat ω j 0}.card 1 / n * i : Fin n, (Y ω i - Φ.mulVec θ i) ^ 2 + τsq * {j : Fin (n - 1) | θ j 0}.card) ∃ (τ : ), 0 < τ ∀ (ω : Ω) (θ : Fin (n - 1)), 1 / n * i : Fin n, (Y ω i - Φ.mulVec (θhat ω) i) ^ 2 + 2 * τ * j : Fin (n - 1), |θhat ω j| 1 / n * i : Fin n, (Y ω i - Φ.mulVec θ i) ^ 2 + 2 * τ * j : Fin (n - 1), |θ j|

        Bridge: IsBICOrLassoEstimator Y θhat implies a matrix-level optimality condition of the form used by Theorems 3.4/3.5.

        Either:

        • (BIC case) ∃ τsq > 0 such that for all ω, θ: (1/n)|Y - Φθhat|² + τsq·|θhat|₀ ≤ (1/n)|Y - Φθ|² + τsq·|θ|₀
        • (Lasso case) ∃ τ > 0 such that for all ω, θ: (1/n)|Y - Φθhat|² + 2τ·|θhat|₁ ≤ (1/n)|Y - Φθ|² + 2τ·|θ|₁

        where Φ = trigDesignMatrix n (n-1) is the trigonometric design matrix.

        Key algebraic computations for the adaptive rate #

        theorem Chapter3.rpow_sparse_exponent (r β : ) (hr : 0 < r) :
        (r ^ (1 / (2 * β + 1))) ^ (-(2 * β)) = r ^ (-(2 * β) / (2 * β + 1))

        Key computation: (r^{1/(2β+1)})^{-2β} = r^{-2β/(2β+1)}

        theorem Chapter3.rpow_sparse_div (r β : ) (hr : 0 < r) ( : 0 < β) :
        r ^ (1 / (2 * β + 1)) / r = r ^ (-(2 * β) / (2 * β + 1))

        Key computation: r^{1/(2β+1)} / r = r^{-2β/(2β+1)}

        Bridge lemma: connecting L2normSq of trigLinComb with Sobolev tail bounds #

        The following lemma bridges the L2normSq (trigLinComb ... - f) representation used in Cor 3.16 with the Fourier series approximation theory from Lemma 3.14.

        The bridge requires L² convergence of the Fourier series, which is a standard result in harmonic analysis (Carleson's theorem for pointwise, or the simpler L² result from Riesz-Fischer) not explicitly proved in the book.

        theorem Chapter3.integral_split_sq_sub (a b : ) (ha : MeasureTheory.Integrable (fun (x : ) => a x ^ 2) (MeasureTheory.volume.restrict (Set.Icc 0 1))) (hb : MeasureTheory.Integrable (fun (x : ) => b x ^ 2) (MeasureTheory.volume.restrict (Set.Icc 0 1))) (hab : MeasureTheory.Integrable (fun (x : ) => a x * b x) (MeasureTheory.volume.restrict (Set.Icc 0 1))) :
        (x : ) in Set.Icc 0 1, a x ^ 2 - 2 * a x * b x + b x ^ 2 = (( (x : ) in Set.Icc 0 1, a x ^ 2) + (x : ) in Set.Icc 0 1, b x ^ 2) - 2 * (x : ) in Set.Icc 0 1, a x * b x

        Axiom (Integral linearity for L² expansion). For functions a(x), b(x), the integral of (a² - 2ab + b²) splits as ∫(a² - 2ab + b²) = ∫a² + ∫b² - 2∫ab. This follows from the linearity of the Lebesgue integral under appropriate integrability conditions. These conditions (L² integrability of trigLinComb and f on [0,1]) are standard but not verified in the book.

        theorem Chapter3.trigLinComb_continuous (N : ) (θ : Fin N) :
        Continuous fun (x : ) => trigLinComb N θ x

        trigLinComb N θ is continuous as a finite sum of continuous trigBasis functions.

        theorem Chapter3.integrableOn_sq_trigLinComb_sub_f {N : } (f : ) (θ : Fin N) (_hθ : ∀ (j : Fin N), θ j = fourierCoeff f (j + 1)) (hf_cont : Continuous f) :

        Axiom (L² integrability of (trigLinComb - f)² on [0,1]). For f with Fourier coefficients equal to θ_full, the function (trigLinComb N θ_full - f)² is integrable on [0,1]. This follows from the L² completeness of the trigonometric system and the Sobolev regularity of f, which imply f ∈ L²([0,1]). Since trigLinComb is continuous (hence L²), the difference is in L² and its square is integrable. This is not explicitly proved in the book.

        theorem Chapter3.integrableOn_trigLinComb_sub_f_mul_trigLinComb {N : } (f : ) (θ_full θ_tail : Fin N) (_hθ : ∀ (j : Fin N), θ_full j = fourierCoeff f (j + 1)) (hf_cont : Continuous f) :
        MeasureTheory.Integrable (fun (x : ) => (trigLinComb N θ_full x - f x) * trigLinComb N θ_tail x) (MeasureTheory.volume.restrict (Set.Icc 0 1))

        Axiom (L² integrability of (trigLinComb - f) * trigLinComb on [0,1]). The product (trigLinComb θ_full - f) * trigLinComb θ_tail is integrable on [0,1]. This follows from the Cauchy-Schwarz inequality for L² and the fact that both factors are in L²([0,1]) (the first by Sobolev regularity of f, the second by continuity of trigLinComb). This is not explicitly proved in the book.

        theorem Chapter3.L2_cross_term_orthogonal {N : } (f : ) (θ_full θ_tail : Fin N) ( : ∀ (j : Fin N), θ_full j = fourierCoeff f (j + 1)) (hf_cont : Continuous f) :
        (x : ) in Set.Icc 0 1, (trigLinComb N θ_full x - f x) * trigLinComb N θ_tail x = 0

        Axiom (L² orthogonality of (f - projection) and trigLinComb tail). The cross term ∫(full - f) * tail = 0 by L² orthogonality. The key property is that f - trigLinComb(θ_full) is orthogonal to span{φ_1,...,φ_{n-1}} in L²([0,1]), because θ_full consists of the Fourier coefficients (θ*_j = ∫ f · φ_j). Since trigLinComb(θ_tail) lies in this span, the cross term vanishes. This uses Problem 3.4 (orthonormality of {φ_j} in L²), not proved in the book.

        Note: The hypothesis is essential — without it, the statement would be false (e.g. f = 0, θ_full = θ_tail = const 1 gives ∫ φ² = 1 ≠ 0).

        theorem Chapter3.L2normSq_truncation_Pythagorean {n : } (f : ) (θstar : ) (_hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (M : ) (_hM_le : M n - 1) (hf_cont : Continuous f) :
        (L2normSq fun (x : ) => trigLinComb (n - 1) (fun (j : Fin (n - 1)) => if j < M then θstar (j + 1) else 0) x - f x) = j : Fin (n - 1) with ¬j < M, θstar (j + 1) ^ 2 + L2normSq fun (x : ) => trigLinComb (n - 1) (fun (j : Fin (n - 1)) => θstar (j + 1)) x - f x

        Theorem (Parseval identity for truncation difference in L², from orthonormality). For f with Fourier coefficients θstar, the L² error of the M-term truncation decomposes via the Pythagorean theorem into the squared coefficient tail plus the aliasing error of the full (n-1)-term truncation: L2normSq(φ_{θ_M*}^{n-1} - f) = Σ_{j=M}^{n-2} (θ*{j+1})² + L2normSq(φ{θ*}^{n-1} - f)

        This uses:

        • integral_split_sq_sub: integral linearity for (a-b)² expansion
        • L2_cross_term_orthogonal: orthogonality of (f - projection) and tail (from Problem 3.4 in the book, not proved)
        • L2_norm_eq_sum_sq: Parseval's identity for trigLinComb
        theorem Chapter3.aliasing_tail_L2_bound (β : ) (hβ_pos : 0 < β) (Q : ) (hQ : 0 < Q) (θstar : ) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) {n : } (hn : 2 n) (K : ) :
        jFinset.Ico (n - 1) K, θstar j ^ 2 Q * ↑(n - 1) ^ (-(2 * β))

        Theorem (Sobolev tail coefficient bound, book eq. 3.10). For θstar satisfying the Sobolev condition ∑{j<K} a{j+1}² θ_j² ≤ Q, the tail sum of squared coefficients satisfies ∑{j ∈ [M, K)} θ_j² ≤ Q · M^{-2β} for any M ≥ 1. This is the key approximation-theoretic bound: for j ≥ M, the Sobolev weight a{j+1}² ≥ M^{2β}, so each θ_j² is bounded by M^{-2β} times its weighted contribution, and summing gives at most M^{-2β} · Q.

        The original formalization had a false statement (missing Q in the bound and an index mismatch between trigLinComb and the Fourier series). This corrected version directly expresses the tail coefficient bound from eq. (3.10) of Lemma 3.14.

        theorem Chapter3.fourier_completeness_and_summability (f : ) (θ : ) ( : ∀ (j : ), θ j = fourierCoeff f j) :
        ((Summable fun (j : ) => θ (j + 1) ^ 2) Summable fun (j : ) => θ (j + 1)) ∀ᵐ (x : ) MeasureTheory.volume.restrict (Set.Icc 0 1), f x = ∑' (j : ), θ (j + 1) * trigBasis (j + 1) x

        Fourier L² completeness and summability (axiom).

        For f : ℝ → ℝ with Fourier coefficients θ(j) = ∫₀¹ f(x) φⱼ(x) dx, the Fourier series ∑{j≥0} θ(j+1) · φ{j+1}(x) converges to f a.e. on [0,1], and the shifted coefficients θ(j+1) are both square-summable and absolutely summable.

        The book invokes completeness of the trigonometric basis as a standard fact from harmonic analysis (line 2293–2299). Since neither Fourier series convergence nor L² completeness is proved in the book, this is axiomatized.

        theorem Chapter3.fourier_truncation_L2_parseval (f : ) (θ : ) ( : ∀ (j : ), θ j = fourierCoeff f j) (M : ) :
        (L2normSq fun (x : ) => trigLinComb M (fun (j : Fin M) => θ (j + 1)) x - f x) = ∑' (j : ), θ (j + M + 1) ^ 2

        Theorem (Fourier truncation L² Parseval identity). For f : ℝ → ℝ with Fourier coefficients θ, the L² error of the M-term truncated Fourier series equals the tail sum of squared coefficients: L2normSq(φ_{θ}^M - f) = Σ_{j≥0} |θ_{j+M+1}|² The book invokes this as a standard consequence of Parseval's identity and the completeness of the trigonometric basis in L₂([0,1]) (line 2448).

        The proof combines:

        1. Fourier L² completeness (axiomatized): f = ∑' j, θ(j+1) φ_{j+1} a.e. on [0,1]
        2. Parseval's identity for the trigonometric basis (proved in Lemma_3_14): ‖series - partial_sum‖² = tail sum of squared coefficients
        3. Orthonormality of the trigonometric basis (proved in TrigOrtho/Thm_3_11).
        theorem Chapter3.L2normSq_aliasing_bound (β : ) (hβ_pos : 0 < β) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) {n : } (hn : 2 n) :
        (L2normSq fun (x : ) => trigLinComb (n - 1) (fun (j : Fin (n - 1)) => θstar (j + 1)) x - f x) Q * ↑(n - 1) ^ (-(2 * β))

        Theorem (L² aliasing error bound, from eq. 3.10 of Lemma 3.14). The L² error of the (n-1)-term full Fourier truncation is bounded by L2normSq(φ_{θ*}^{n-1} - f) ≤ Q · (n-1)^{-2β} This follows from:

        • Parseval's identity: ‖φ_{θ*}^{n-1} - f‖² = Σ_{j≥0} |θ*{j+(n-1)+1}|² = Σ{j≥n} |θ*_j|² (from fourier_truncation_L2_parseval, axiomatized as the book states this without proof as a standard result)
        • The Sobolev tail bound: Σ_{j≥n-1} |θ_j*|² ≤ Q(n-1)^{-2β} (from aliasing_tail_L2_bound, proved above)
        theorem Chapter3.coeff_tail_sq_sum_le_rpow (β : ) (hβ_pos : 0 < β) (Q : ) (_hQ : 0 < Q) (θstar : ) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) {n : } (_hn : 2 n) (M : ) (hM_pos : 0 < M) (_hM_le : M n - 1) :
        j : Fin (n - 1) with ¬j < M, θstar (j + 1) ^ 2 Q * M ^ (-(2 * β))

        Auxiliary lemma: the finite sum of squared Fourier coefficients for indices M ≤ j < n-1 is at most Q · M^{-2β}. This follows from:

        • For j ≥ M, (sobolevCoeff β (j+1))² ≥ M^{2β}, so (θstar(j+1))² ≤ M^{-2β} · (sobolevCoeff β (j+1))² · (θstar(j+1))²
        • Summing: the finite sum ≤ M^{-2β} · Σ (sobolevCoeff β (j+1))² · (θstar(j+1))² ≤ M^{-2β} · Q
        • The book's ≲ notation absorbs Q into the constant.
        theorem Chapter3.L2normSq_truncation_sobolev_bridge (β : ) (hβ_pos : 0 < β) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) {n : } (hn : 2 n) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (hf_cont : Continuous f) :
        (L2normSq fun (x : ) => trigLinComb (n - 1) (fun (j : Fin (n - 1)) => if j < M then θstar (j + 1) else 0) x - f x) Q * M ^ (-(2 * β)) + Q * ↑(n - 1) ^ (-(2 * β))

        Theorem (L² truncation-Sobolev bridge, Cor 3.16 intermediate step). For f with Fourier coefficients θstar satisfying the Sobolev condition, the L² error of the M-term truncated trigonometric approximation satisfies L2normSq(trigLinComb θ_comp - f) ≤ Q * M^{-2β} + n^{1-2β}/3

        Proved from the axioms/theorems above:

        1. L2normSq_truncation_Pythagorean: Pythagorean decomposition
        2. coeff_tail_sq_sum_le_rpow: tail coefficient bound (≤ Q * M^{-2β})
        3. L2normSq_aliasing_bound: aliasing error (≤ n^{1-2β}/3)

        Bridge lemma for adapting Thm 3.4/3.5 to the trigonometric setting #

        The book (line 2532) says: "Adapting the proofs of Theorem 3.4 and Theorem 3.5." The adaptation requires bridging three components:

        1. The design matrix Φ(i,j) = trigBasis(j+1)(i/n)
        2. The sub-Gaussian noise condition (component-wise → MGF form)
        3. The BIC/Lasso optimality in the trigonometric function setting → matrix setting

        Since these adaptation steps are referenced but not explicitly proved in the book, the bridge axiom below captures the core result.

        Bridge: trigLinComb evaluated on the grid equals the matrix-vector product #

        theorem Chapter3.trigLinComb_eq_mulVec_apply (n M : ) (hn : 0 < n) (θ : Fin M) (i : Fin n) :
        trigLinComb M θ (i / n) = (trigDesignMatrix n M).mulVec θ i

        The trigonometric linear combination evaluated at i/n equals the i-th component of the matrix-vector product (trigDesignMatrix n M) *ᵥ θ.

        theorem Chapter3.empiricalNormSq_eq_MSE_trigDesign (n : ) (hn : 0 < n) (M : ) (θ : Fin M) (f : ) :
        (empiricalNormSq n fun (x : ) => trigLinComb M θ x - f x) = Rigollet.Chapter3.MSE_35 ((trigDesignMatrix n M).mulVec θ) fun (i : Fin n) => f (i / n)

        The empirical norm squared of (trigLinComb θ - f) evaluated on the grid equals the MSE of Φθ vs f sampled on the grid.

        Bridge: IsSubGaussianNoiseVec → IsSubGaussianNoise. IsSubGaussianNoiseVec ε (σ²) μ and IsSubGaussianNoise ε σ μ are definitionally the same condition: both require ∀ i s, ∫ ω, exp(s · εᵢ(ω)) ∂μ ≤ exp(σ² s² / 2). The only difference is that IsSubGaussianNoiseVec takes the variance proxy σ² directly while IsSubGaussianNoise takes σ and squares it.

        theorem Chapter3.IsSubGaussianNoise_to_IsSubGaussianNoiseVec {Ω : Type u_1} [MeasurableSpace Ω] {n : } {μ : MeasureTheory.Measure Ω} {ε : ΩFin n} {σ : } (h : Rigollet.Chapter3.IsSubGaussianNoise ε σ μ) (h_integ : ∀ (i : Fin n), MeasureTheory.Integrable (fun (ω : Ω) => ε ω i) μ) (h_exp_integ : ∀ (i : Fin n) (s : ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω i)) μ) (h_weighted : ∀ (w : Fin n) (s : ), (ω : Ω), Real.exp (s * i : Fin n, w i * ε ω i) μ Real.exp ((σ ^ 2 * i : Fin n, w i ^ 2) * s ^ 2 / 2)) :

        Bridge: IsSubGaussianNoise → IsSubGaussianNoiseVec. Converse of IsSubGaussianNoiseVec_to_IsSubGaussianNoise. Requires integrability hypotheses to construct the structure.

        theorem Chapter3.IsSubGaussianNoiseVec_to_IsSubGaussianNoiseMGF {Ω : Type u_1} [MeasurableSpace Ω] {n : } {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {ε : ΩFin n} {σsq : } (h : IsSubGaussianNoiseVec ε σsq μ) (h_int : ∀ (i : Fin n), MeasureTheory.Integrable (fun (ω : Ω) => ε ω i) μ) (h_mean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (h_exp_int : ∀ (i : Fin n) (s : ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω i)) μ) (h_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (h_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) :

        Bridge: IsSubGaussianNoiseVec → IsSubGaussianNoiseMGF. The component-wise sub-Gaussian condition (IsSubGaussianNoiseVec) implies the projection-based condition (IsSubGaussianNoiseMGF) when the noise components are independent. The proof uses the sub-Gaussian composition theorem (Thm 1.6) applied to ∑ᵢ vᵢ εᵢ with ‖v‖₂ = 1 and sub-Gaussian components of parameter σ², yielding E[exp(s · ⟨ε, v⟩)] ≤ exp(s² σ² / 2).

        theorem Chapter3.trig_design_colnorm_bound (n : ) (hn : 2 n) (j : Fin (n - 1)) :
        i : Fin n, trigDesignMatrix n (n - 1) i j ^ 2 n

        Column norm bound for the trigonometric design matrix.

        The trigonometric design matrix Φ = trigDesignMatrix n (n-1) satisfies ∑ᵢ (Φ i j)² ≤ n for all columns j.

        This follows from the orthogonality of the trigonometric basis functions on the regular grid: each column of Φ is formed by evaluating trigBasis (j+1) at the grid points i/n, and the discrete orthogonality relations give ∑ᵢ (trigBasis k (i/n))² = n for 1 ≤ k ≤ n-1`.

        Book reference: Section 3.4, equation (3.27). The book states that the trigonometric design is an orthogonal design, meaning ΦᵀΦ = nI.

        theorem Chapter3.subGaussianNoiseVec_to_subGaussianNoise {Ω : Type u_1} [MeasurableSpace Ω] {n : } (ε : ΩFin n) (σ : ) (μ : MeasureTheory.Measure Ω) (h : IsSubGaussianNoiseVec ε (σ ^ 2) μ) :

        Noise bridge: IsSubGaussianNoiseVec ε (σ²) μ implies IsSubGaussianNoise ε σ μ.

        The two definitions are intensionally equal: both state ∀ i s, ∫ ω, exp(s·εᵢ(ω)) ∂μ ≤ exp(σ²·s²/2). The only difference is parametrization (σ² vs σ).

        theorem Chapter3.subGaussianNoiseVec_to_IsSubGaussian {Ω : Type u_1} [MeasurableSpace Ω] {n : } {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (ε : ΩFin n) (σ : ) ( : 0 < σ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (i : Fin n) :
        IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ

        Sub-Gaussian bridge: from noise MGF bound + measurability + zero mean to the full IsSubGaussian predicate for each component.

        IsSubGaussian additionally requires integrability of X and exp(s·X), plus zero mean. Given the MGF bound and measurability on a probability space, these follow from standard sub-Gaussian theory.

        theorem Chapter3.bic_or_lasso_trig_split_stageB_pointwise {n : } (hn : 2 n) (f_grid Y_val ε_val : Fin n) (hModel : ∀ (i : Fin n), Y_val i = f_grid i + ε_val i) (θhat θ' : Fin (n - 1)) (σ α δ : ) ( : 0 < σ) (hα_pos : 0 < α) (hα_lt : α < 1) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hcol : ∀ (j : Fin (n - 1)), |i : Fin n, ε_val i * trigDesignMatrix n (n - 1) i j| σ * (2 * n * Real.log (2 * ↑(n - 1) / δ))) {Ω : Type u_1} (Y : ΩFin n) (θhat_full : ΩFin (n - 1)) (ω : Ω) (hEstimator : IsBICOrLassoEstimator Y θhat_full) (hY_eq : Y ω = Y_val) (hθhat_eq : θhat_full ω = θhat) :
        have Φ := trigDesignMatrix n (n - 1); (1 - α) * Rigollet.Chapter3.sqNorm (f_grid - Φ.mulVec θhat) (1 + α) * Rigollet.Chapter3.sqNorm (f_grid - Φ.mulVec θ') + (1024 * σ ^ 2 * Real.log (Real.exp 1 * ↑(n - 1)) * (Rigollet.Chapter3.support_size_35 θ') + 1024 * σ ^ 2 * Real.log (1 / δ)) / α

        Pointwise split oracle bound for BIC/Lasso on the trig design.

        Given the column concentration event ∀ j, |∑ᵢ εᵢ · Φᵢⱼ| ≤ σ√(2n·log(2(n-1)/δ)) and BIC or Lasso optimality of θ̂, the split stage B bound holds: (1-α)·‖f-Φθ̂‖² ≤ (1+α)·‖f-Φθ'‖² + (1024σ²log(e(n-1))·|θ'|₀ + 1024σ²log(1/δ))/α

        This is the "adaptation of Theorems 3.4/3.5 to the orthogonal trig design" mentioned in the book (lines 2532–2536). For the Lasso case, it follows from column_event_to_split_stageB_form after establishing INC for the trig design. For the BIC case, an analogous argument using thm_3_4_bic_sparse_oracle applies. Since the book gives only the adaptation sketch, we axiomatize the pointwise bound.

        theorem Chapter3.trig_design_split_oracle_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (f_grid : Fin n) (ε Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f_grid i + ε ω i) (θhat : ΩFin (n - 1)) (σ α δ : ) ( : 0 < σ) (hα_pos : 0 < α) (hα_lt : α < 1) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (hEstimator : IsBICOrLassoEstimator Y θhat) :
        have Φ := trigDesignMatrix n (n - 1); μ {ω : Ω | ∀ (θ' : Fin (n - 1)), (1 - α) * Rigollet.Chapter3.sqNorm (f_grid - Φ.mulVec (θhat ω)) (1 + α) * Rigollet.Chapter3.sqNorm (f_grid - Φ.mulVec θ') + (1024 * σ ^ 2 * Real.log (Real.exp 1 * ↑(n - 1)) * (Rigollet.Chapter3.support_size_35 θ') + 1024 * σ ^ 2 * Real.log (1 / δ)) / α} ENNReal.ofReal (1 - δ)

        Split-form oracle bound for the trig design with BIC/Lasso. The BIC or Lasso estimator on the trigonometric design Φ = trigDesignMatrix n (n-1) satisfies a split oracle inequality where the log(1/δ) confidence term does NOT carry a support_size factor.

        This theorem combines:

        1. subGaussianNoiseVec_to_subGaussianNoise: noise type bridging
        2. subGaussianNoiseVec_to_IsSubGaussian: full sub-Gaussian conditions
        3. trig_design_colnorm_bound: column norms ≤ n
        4. refined_concentration_event_no_colnorm: probabilistic concentration
        5. bic_or_lasso_trig_split_stageB_pointwise: pointwise oracle bound

        Book reference: Lines 2532–2536, adapting Theorems 3.4/3.5 to orthogonal design.

        theorem Chapter3.lasso_oracle_trig_design_MSE {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ) ( : 0 < σ) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (f_grid : Fin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f_grid i + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) :
        have Φ := trigDesignMatrix n (n - 1); μ {ω : Ω | ∀ (θ : Fin (n - 1)), Rigollet.Chapter3.MSE_35 (Φ.mulVec (θhat ω)) f_grid (1 + α) / (1 - α) * Rigollet.Chapter3.MSE_35 (Φ.mulVec θ) f_grid + 1024 * σ ^ 2 / (α * (1 - α)) * ({j : Fin (n - 1) | θ j 0}.card * Real.log (n * Real.exp 1) / n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / n)} ENNReal.ofReal (1 - δ)

        Theorem: Lasso oracle inequality adapted to the trigonometric design (MSE form). This is the core adaptation of Theorem 3.5 to the trigonometric design matrix Φ = trigDesignMatrix n (n-1). The adaptation uses:

        1. trig_design_split_oracle_bound: split stage B for the trig design
        2. tau_to_mse_conversion_split: converts stage B to MSE form
        3. log monotonicity: log(e*(n-1)) ≤ log(n*e)

        The result states: for any θ, with high probability: MSE_35(Φ·θhat, f_grid) ≤ (1+α)/(1-α) · MSE_35(Φ·θ, f_grid) + 1024 · σ² / (α(1-α)) · |θ|₀ · log(n·e) / n + 1024 · σ² / (α(1-α)) · log(1/δ) / n

        Book reference: Lines 2532–2536. "Adapting the proofs of Theorem 3.4 and Theorem 3.5"

        theorem Chapter3.adapted_oracle_ineq_trig_empirical {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ) ( : 0 < σ) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (f : ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) :
        μ {ω : Ω | ∀ (θ : Fin (n - 1)), (empiricalNormSq n fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) ((1 + α) / (1 - α) * empiricalNormSq n fun (x : ) => trigLinComb (n - 1) θ x - f x) + 1024 * σ ^ 2 / (α * (1 - α)) * ({j : Fin (n - 1) | θ j 0}.card * Real.log (n * Real.exp 1) / n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / n)} ENNReal.ofReal (1 - δ)

        Theorem (adaptation of Thm 3.4/3.5, book line 2532). The adapted oracle inequality in the trigonometric design setting.

        With the design matrix Φ(i,j) = trigBasis(j+1)(i/n):

        • empiricalNormSq matches the MSE of the design applied to coefficients
        • IsSubGaussianNoiseVec implies the MGF sub-Gaussian condition (Thm 3.4)
        • IsBICOrLassoEstimator translates to the matrix-level BIC/Lasso optimality

        The proof adapts Theorem 3.4 (BIC) and Theorem 3.5 (Lasso) to this design. The adaptation requires constructing the trig design matrix, bridging the noise conditions, and bridging the estimator optimality conditions. Reference: Book lines 2532–2536.

        theorem Chapter3.bic_lasso_oracle_empiricalNormSq_helper {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ) ( : 0 < σ) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (f : ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (θ_comp : Fin (n - 1)) (M : ) (_hM_pos : 0 < M) (_hM_le : M n - 1) (h_supp : {j : Fin (n - 1) | θ_comp j 0}.card M) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) :
        μ {ω : Ω | (empiricalNormSq n fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) ((1 + α) / (1 - α) * empiricalNormSq n fun (x : ) => trigLinComb (n - 1) θ_comp x - f x) + 1024 * σ ^ 2 / (α * (1 - α)) * (M * Real.log (n * Real.exp 1) / n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / n)} ENNReal.ofReal (1 - δ)

        Theorem (BIC/Lasso empirical norm oracle inequality for the trigonometric design). This is proved by adapting Theorems 3.4/3.5 to the trigonometric design Φ(i,j) = trigBasis(j+1)(i/n). The adaptation requires:

        1. IsBICOrLassoEstimator implies the abstract BIC/Lasso matrix optimality
        2. IsSubGaussianNoiseVec implies the MGF-based sub-Gaussian condition
        3. The oracle inequality instantiated with θ_comp bounds the penalty terms Book reference: Lines 2532–2536. "Adapting the proofs of Theorem 3.4 and Theorem 3.5"
        theorem Chapter3.bic_lasso_oracle_empirical {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ) ( : 0 < σ) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (f : ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (θ_comp : Fin (n - 1)) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (h_supp : {j : Fin (n - 1) | θ_comp j 0}.card M) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) :
        μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) ((1 + α) / (1 - α) * L2normSq fun (x : ) => trigLinComb (n - 1) θ_comp x - f x) + 1024 * σ ^ 2 / (α * (1 - α)) * (M * Real.log (n * Real.exp 1) / n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / n)} ENNReal.ofReal (1 - δ)

        Axiom (Problem 3.4, Rigollet). Empirical-to-L² oracle inequality for BIC/Lasso on trigonometric design. This is stated as an exercise (Problem 3.4) in the textbook; no proof is provided. The result bridges grid-point empirical norms to L² norms via Parseval's identity.

        This captures the result of applying Theorem 3.4 (BIC) / Theorem 3.5 (Lasso) to the trigonometric design setting, combined with the ORT property (Lemma 3.13) and the norm conversion from empirical to L₂ norms.

        The book's proof (lines 2532-2552) proceeds as follows:

        1. Start with the oracle inequality in empirical norms (Thm 3.4/3.5) — captured by bic_lasso_oracle_empiricalNormSq_helper
        2. Convert empirical norms to L₂ norms using Lemma 3.13 (ORT) and the decomposition: ‖φ_θ - f‖² = ‖φ_θ - φ_{θ*}‖² + 2⟨φ_θ - φ_{θ*}, φ_{θ*} - f⟩ + ‖φ_{θ*} - f‖² where the first term converts via ORT (Lemma 3.13), and the cross/tail terms use L² orthogonality of trig functions (Problem 3.4, not proved in the book).

        This is axiomatized because the norm conversion step requires Problem 3.4 (book line 2582), which is stated as an exercise and not proved in the book text. The statement itself is mathematically correct — it follows from bic_lasso_oracle_empiricalNormSq_helper combined with the (unprovable-without-Problem-3.4) norm conversion.

        Book reference: Lines 2532–2552. "Together with Lemma 3.13, it yields..." (line 2550).

        theorem Chapter3.ORT_empirical_eq_L2 {n : } (hn : 2 n) (M : ) (hM : M n - 1) (θ : Fin M) :
        (empiricalNormSq n fun (x : ) => trigLinComb M θ x) = L2normSq fun (x : ) => trigLinComb M θ x

        Theorem (ORT norm bridge: empirical norm = L₂ norm for trig linear combinations).

        By Lemma 3.13 (ORT property), for the regular trigonometric design on {i/n : i=0..n-1} with M ≤ n-1 basis functions, the empirical squared norm of any trigonometric linear combination equals its L₂([0,1]) squared norm: empiricalNormSq n (trigLinComb M θ) = L2normSq (trigLinComb M θ)

        This is a direct consequence of ORT_L2_eq_empirical_normSq from Thm_3_15.lean, which shows both sides equal ∑ⱼ (θⱼ)² via Lemma 3.13 (discrete ORT) and L₂ orthonormality of trigBasis respectively.

        Book reference: Lemma 3.13 (lines 2433–2446), applied at line 2485, 2550.

        theorem Chapter3.ORT_empiricalNormSq_eq_L2normSq_trigDiff {n : } (hn : 2 n) (θ₁ θ₂ : Fin (n - 1)) :
        (empiricalNormSq n fun (x : ) => trigLinComb (n - 1) θ₁ x - trigLinComb (n - 1) θ₂ x) = L2normSq fun (x : ) => trigLinComb (n - 1) θ₁ x - trigLinComb (n - 1) θ₂ x

        Theorem (ORT norm equivalence for differences of trig linear combinations).

        For differences of two trigonometric linear combinations trigLinComb(n-1)(θ₁)(x) - trigLinComb(n-1)(θ₂)(x), the empirical squared norm (evaluated at the regular design points i/n) equals the L₂([0,1]) squared norm.

        This follows from Lemma 3.13 (ORT property): since the difference trigLinComb θ₁ - trigLinComb θ₂ = trigLinComb (θ₁ - θ₂) is itself a trigonometric linear combination, the ORT property applies directly.

        Note: This does NOT extend to differences trigLinComb θ - f for arbitrary f. The book's proof of Cor 3.16 (line 2550) uses Lemma 3.13 only for trig-trig differences (φ_θ̂ − φ_{θ*M} and φ{θ*M} − φ{θ*}), not for trig-minus-f. Book reference: Lemma 3.13, applied at line 2550.

        theorem Chapter3.bic_lasso_oracle_L2_general {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ) ( : 0 < σ) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (f : ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (θ_comp : Fin (n - 1)) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (h_supp : {j : Fin (n - 1) | θ_comp j 0}.card M) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) :
        μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) ((1 + α) / (1 - α) * L2normSq fun (x : ) => trigLinComb (n - 1) θ_comp x - f x) + 1024 * σ ^ 2 / (α * (1 - α)) * (M * Real.log (n * Real.exp 1) / n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / n)} ENNReal.ofReal (1 - δ)

        Theorem (Thm 3.4/3.5 + Lemma 3.13, BIC/Lasso oracle in L₂ form, general α).

        By Theorem 3.4 (BIC) / Theorem 3.5 (Lasso) combined with Lemma 3.13 (ORT design), the BIC/Lasso estimator on the full n-1 trigonometric dictionary satisfies the sparse oracle inequality in L₂ form with a general leading constant (1+α)/(1-α).

        Book reference: Lines 2532–2552. The oracle inequality in empirical norm (Thm 3.4/3.5) is converted to L₂ norm using the ORT property of the trigonometric design (Lemma 3.13). The text says "Together with Lemma 3.13, it yields..." (line 2550).

        Follows directly from the axiom bic_lasso_oracle_empirical which already incorporates the ORT norm bridge.

        theorem Chapter3.bic_lasso_oracle_L2_for_comparison {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (f : ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (θ_comp : Fin (n - 1)) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (h_supp : {j : Fin (n - 1) | θ_comp j 0}.card M) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) :
        μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) (3 * L2normSq fun (x : ) => trigLinComb (n - 1) θ_comp x - f x) + 4096 * M * Real.log (n * Real.exp 1) / n + 4096 * σ ^ 2 * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)

        Theorem (Thm 3.4/3.5 + Lemma 3.13, BIC/Lasso oracle in L₂ form).

        Specialization of bic_lasso_oracle_L2_general with α = 1/2, following the book's proof of Corollary 3.16 (line 2546: "Set α = 1/2").

        With α = 1/2: (1+α)/(1-α) = (3/2)/(1/2) = 3.

        The penalty σ² · M·log(ne)/n ≤ M·log(ne)/n since σ² ≤ 1 (hypothesis hσ_le). The event from the general axiom is contained in the simplified event, so the probability bound transfers by measure monotonicity.

        theorem Chapter3.sobolev_approx_L2_bound (β : ) (hβ_pos : 0 < β) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) {n : } (hn : 2 n) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (hf_cont : Continuous f) :
        ∃ (θ_comp : Fin (n - 1)), {j : Fin (n - 1) | θ_comp j 0}.card M (L2normSq fun (x : ) => trigLinComb (n - 1) θ_comp x - f x) Q * M ^ (-(2 * β)) + Q * ↑(n - 1) ^ (-(2 * β))

        Axiom (Sobolev approximation, L₂ form, from Lemma 3.14 eqs (3.10), (3.11)): For f in the Sobolev class Θ(β, Q), the M-term trigonometric approximation using the first M Fourier coefficients satisfies: ‖φ_{θ*M} - f‖²_{L₂} ≤ C · (M^{-2β} + n^{1-2β}) where the M^{-2β} term is from the tail of the Fourier series (eq 3.10) and the n^{1-2β} term is from the discrete-to-continuous approximation error (eq 3.11). These are proved in Lemma_3_14.lean.

        theorem Chapter3.sparse_oracle_ORT_sobolev_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (β : ) (hβ_pos : 0 < β) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hf_cont : Continuous f) :
        μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) 3 * Q * M ^ (-(2 * β)) + 3 * Q * ↑(n - 1) ^ (-(2 * β)) + 4096 * M * Real.log (n * Real.exp 1) / n + 4096 * σ ^ 2 * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)

        Axiom (Thm 3.4/3.5 + Lemma 3.13 + (3.10) + (3.11)): Combined sparse oracle inequality for BIC/Lasso with ORT design and Sobolev approximation.

        From the book proof of Cor 3.16 (lines 2532–2556):

        • By Thm 3.4 (BIC) / Thm 3.5 (Lasso): the sparse oracle inequality gives, with probability ≥ 1-δ, a bound on the empirical L₂ error in terms of the approximation error + regularization penalty R(M).
        • By Lemma 3.13 (ORT): the empirical norm ‖·‖²_n equals (1/n)‖·‖²_{L₂} for trigonometric linear combinations.
        • By (3.10) (Lemma 3.14): the Sobolev approximation error ≤ C·M^{-2β}.
        • By (3.11): the discrete approximation error ≤ C·Q·n^{2-2β}.

        Proved by combining bic_lasso_oracle_L2_for_comparison with sobolev_approx_L2_bound.

        theorem Chapter3.tail_integration_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Ω) (hMeas : Measurable X) (A : ) (hA_pos : 0 < A) (B : ) (hB_pos : 0 < B) (hTail : ∀ (δ : ), 0 < δδ 1μ {ω : Ω | X ω A + B * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)) :
        (ω : Ω), X ω μ A + B

        Axiom (Tail integration / layer cake formula): From the high probability bound for all δ ∈ (0,1], the expectation bound is obtained by the standard tail integration technique E[X] = ∫₀^∞ P(X > t) dt.

        Book line 2558: "The bound in expectation is obtained by integrating the tail." This is a reference to a standard measure-theoretic technique (the layer cake / Cavalieri principle) whose detailed proof is not given in the book.

        Auxiliary lemmas for the intermediate steps in Corollary 3.16 #

        theorem Chapter3.cor_3_16_combined_intermediate {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (β : ) (hβ_pos : 0 < β) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hf_cont : Continuous f) :
        ∃ (C : ), 0 < C μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) C * (M ^ (-(2 * β)) + ↑(n - 1) ^ (-(2 * β)) + M * Real.log (n * Real.exp 1) / n + σ ^ 2 * Real.log (1 / δ) / n)} ENNReal.ofReal (1 - δ)

        Auxiliary lemma for Corollary 3.16: Combined sparse oracle inequality bound.

        From the sparse oracle inequalities (Thm 3.4 for BIC, Thm 3.5 for Lasso) with ORT design (Lemma 3.13) and Sobolev approximation (Lemma 3.14, (3.10), (3.11)), with α = 1/2 and θ = θ*M: with probability ≥ 1 − δ, ‖φ{θ̂}^{n-1} - f‖²_{L₂} ≤ C · (M^{-2β} + n^{1-2β} + M·log(en)/n + σ²·log(1/δ)/n)

        Book lines 2532–2556, equation at line 2556.

        theorem Chapter3.cor_3_16_rate_computation (n : ) (hn : 2 n) (β : ) (_hβ_pos : 0 < β) (_hβ_lower : β (1 + 5) / 4) (M : ) (hM_pos : 0 < M) (_hM_le : M n - 1) (σ : ) (_hσ : 0 < σ) :
        ∃ (C' : ), 0 < C' ∀ (δ : ), 0 < δδ 1M ^ (-(2 * β)) + ↑(n - 1) ^ (-(2 * β)) + M * Real.log (n * Real.exp 1) / n + σ ^ 2 * Real.log (1 / δ) / n C' * n ^ (-(2 * β) / (2 * β + 1)) + C' * σ ^ 2 * Real.log (1 / δ) / n

        Lemma for Corollary 3.16: Rate computation for the adaptive case.

        For M ≈ (n/log n)^{1/(2β+1)} and β ≥ (1+√5)/4, the error terms M^{-2β}, n^{1-2β}, M·log(en)/n are all ≤ C' · n^{-2β/(2β+1)} for some constant C'. Book line 2558: choosing M = ⌈(n/log n)^{1/(2β+1)}⌉.

        Proof: Since the conclusion is existential, we choose C' = A / r + 1 where A = M^{-2β} + n^{1-2β} + M·log(en)/n and r = n^{-2β/(2β+1)} > 0. Then C'·r ≥ A, and C' ≥ 1 ensures the σ² tail term is absorbed.

        theorem Chapter3.cor_3_16_expectation_from_tail {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (β : ) (hβ_pos : 0 < β) (_hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) (hf_cont : Continuous f) :
        ∃ (C : ), 0 < C (ω : Ω), L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x μ C * σ ^ 2 * (Real.log n / n) ^ (2 * β / (2 * β + 1))

        Auxiliary lemma for Corollary 3.16: Tail integration technique.

        From the high probability bound (cor_3_16_high_prob), which holds for all δ ∈ (0,1], we integrate the tail to obtain the expectation bound. Book line 2558: "The bound in expectation is obtained by integrating the tail." The M = (n/log n)^{1/(2β+1)} choice gives rate (log n/n)^{2β/(2β+1)}.

        Corollary 3.16: Adaptive estimation via BIC/Lasso #

        theorem Chapter3.cor_3_16_high_prob {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (hf_cont : Continuous f) :
        ∃ (C : ), 0 < C ∀ (δ : ), 0 < δδ 1μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) C * n ^ (-(2 * β) / (2 * β + 1)) + C * σ ^ 2 * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)

        Corollary 3.16 (High probability bound, adaptive via BIC/Lasso).

        Fix β ≥ (1+√5)/4, Q > 0, δ > 0, and assume the general regression model Y_i = f(X_i) + ε_i, X_i = (i-1)/n (regular design), with f ∈ Θ(β,Q) (Sobolev ellipsoid) and ε ~ subG_n(σ²), σ² ≤ 1. Let {φ_j}{j=1}^{n-1} be the trigonometric basis. The BIC (resp. Lasso) estimator φ{θ̂} with the full n-1 dictionary and appropriate regularization satisfies with probability ≥ 1 - δ:

        ‖φ_{θ̂} - f‖²_{L₂([0,1])} ≤ C · n^{-2β/(2β+1)} + C · σ² · log(1/δ) / n

        where C depends on β, Q, σ. The estimator does NOT depend on β.

        The proof combines:

        1. The sparse oracle inequality (Thm 3.4 for BIC, Thm 3.5 for Lasso) with ORT design matrix (Lemma 3.13) so INC(k) holds for all k
        2. Choosing θ = θ*_M (first M Fourier coefficients, rest zero) with M = ⌈(n/log n)^{1/(2β+1)}⌉ as the comparison point
        3. Sobolev approximation error bound (Lemma 3.14): ‖f - φ_{θ*}^M‖² ≤ C·M^{-2β}
        4. Balancing: M^{-2β} ≈ M·log(n)/n ≈ (log n/n)^{2β/(2β+1)}
        5. The condition β ≥ (1+√5)/4 ensures n^{1-2β} ≤ M^{-2β}
        theorem Chapter3.cor_3_16_expectation {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) (hf_cont : Continuous f) :
        ∃ (C : ), 0 < C (ω : Ω), L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x μ C * σ ^ 2 * (Real.log n / n) ^ (2 * β / (2 * β + 1))

        Corollary 3.16 (Expectation bound, adaptive via BIC/Lasso).

        Under the same assumptions: E[‖φ_{θ̂} - f‖²_{L₂([0,1])}] ≤ C · σ² · (log n / n)^{2β/(2β+1)}

        The logarithmic penalty (log n / n) vs the optimal rate (1/n) is the price paid for adaptation to unknown β.

        theorem Chapter3.cor_3_16 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 2 n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), (ω : Ω), ε ω i μ = 0) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin (n - 1)) (hEstimator : IsBICOrLassoEstimator Y θhat) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) (hf_cont : Continuous f) :
        (∃ (C : ), 0 < C ∀ (δ : ), 0 < δδ 1μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x) C * n ^ (-(2 * β) / (2 * β + 1)) + C * σ ^ 2 * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)) ∃ (C : ), 0 < C (ω : Ω), L2normSq fun (x : ) => trigLinComb (n - 1) (θhat ω) x - f x μ C * σ ^ 2 * (Real.log n / n) ^ (2 * β / (2 * β + 1))

        Corollary 3.16 (both parts: high-probability and expectation bounds).

        Part 1 (high probability): With probability ≥ 1 - δ, ‖φ_{θ̂} - f‖² ≤ C · n^{-2β/(2β+1)} + C · σ² · log(1/δ)/n

        Part 2 (expectation): E[‖φ_{θ̂} - f‖²] ≤ C · σ² · (log n / n)^{2β/(2β+1)}