Atlas.HighDimensionalStatistics.code.Chapter3.Cor_3

BIC/Lasso estimator predicates for the trigonometric setting #

def Chapter3.IsBICEstimatorTrig {Ω : Type u_1} {n : ℕ} (Y : Ω → Fin n → ℝ) (θhat : Ω → Fin (n - 1) → ℝ) (τsq : ℝ) :

The BIC estimator on the full dictionary {φ₁,...,φ_{n-1}}: minimizes (1/n) Σᵢ (Yᵢ - φ_θ(Xᵢ))² + τ²|θ|₀ over θ ∈ ℝ^{n-1}.

Instances For

source

def Chapter3.IsLassoEstimatorTrig {Ω : Type u_1} {n : ℕ} (Y : Ω → Fin n → ℝ) (θhat : Ω → Fin (n - 1) → ℝ) (τ : ℝ) :

Prop

The Lasso estimator on the full dictionary {φ₁,...,φ_{n-1}}: minimizes (1/n) Σᵢ (Yᵢ - φ_θ(Xᵢ))² + 2τ|θ|₁ over θ ∈ ℝ^{n-1}.

Instances For

source

def Chapter3.IsBICOrLassoEstimator {Ω : Type u_1} {n : ℕ} (Y : Ω → Fin n → ℝ) (θhat : Ω → Fin (n - 1) → ℝ) :

Prop

A BIC/Lasso estimator is either BIC or Lasso optimal.

Instances For

Bridge lemmas: function-level estimators → matrix-level optimality #

The BIC/Lasso estimator definitions above use the function-level form: (1/n) Σᵢ (Yᵢ - φ_θ(Xᵢ))² + penalty The matrix-level theorems (Thm 3.4/3.5) use: (1/n) Σᵢ (Yᵢ - (Φ *ᵥ θ)ᵢ)² + penalty with Φ = trigDesignMatrix n (n-1).

The bridge uses trigLinComb_eq_mulVec: trigLinComb M θ (i/n) = (trigDesignMatrix n M).mulVec θ i to show that the function-level estimator conditions imply the corresponding matrix-level conditions.

source

theorem Chapter3.IsBICEstimatorTrig_to_matrix_optimality {Ω : Type u_1} {n : ℕ} (hn : 0 < n) (Y : Ω → Fin n → ℝ) (θhat : Ω → Fin (n - 1) → ℝ) (τsq : ℝ) (hBIC : IsBICEstimatorTrig Y θhat τsq) :

have Φ := trigDesignMatrix n (n - 1); ∀ (ω : Ω) (θ : Fin (n - 1) → ℝ), 1 / ↑n * ∑ i : Fin n, (Y ω i - Φ.mulVec (θhat ω) i) ^ 2 + τsq * ↑{j : Fin (n - 1) | θhat ω j ≠ 0}.card ≤ 1 / ↑n * ∑ i : Fin n, (Y ω i - Φ.mulVec θ i) ^ 2 + τsq * ↑{j : Fin (n - 1) | θ j ≠ 0}.card

Bridge: IsBICEstimatorTrig Y θhat τsq implies a BIC optimality condition in terms of dot products and the trigonometric design matrix. Specifically, the function-level BIC condition (1/n)Σ(Yᵢ - φ_{θhat}(xᵢ))² + τsq·|θhat|₀ ≤ (1/n)Σ(Yᵢ - φ_θ(xᵢ))² + τsq·|θ|₀ translates (after multiplying by n) to: ‖Y - Φθhat‖² + n·τsq·|θhat|₀ ≤ ‖Y - Φθ‖² + n·τsq·|θ|₀ where Φ = trigDesignMatrix n (n-1) and norms are dot products.

source

theorem Chapter3.IsLassoEstimatorTrig_to_matrix_optimality {Ω : Type u_1} {n : ℕ} (hn : 0 < n) (Y : Ω → Fin n → ℝ) (θhat : Ω → Fin (n - 1) → ℝ) (τ : ℝ) (hLasso : IsLassoEstimatorTrig Y θhat τ) :

have Φ := trigDesignMatrix n (n - 1); ∀ (ω : Ω) (θ : Fin (n - 1) → ℝ), 1 / ↑n * ∑ i : Fin n, (Y ω i - Φ.mulVec (θhat ω) i) ^ 2 + 2 * τ * ∑ j : Fin (n - 1), |θhat ω j| ≤ 1 / ↑n * ∑ i : Fin n, (Y ω i - Φ.mulVec θ i) ^ 2 + 2 * τ * ∑ j : Fin (n - 1), |θ j|

Bridge: IsLassoEstimatorTrig Y θhat τ implies a Lasso optimality condition in terms of the trigonometric design matrix. The function-level Lasso condition translates to: (1/n)Σ(Yᵢ - (Φθhat)ᵢ)² + 2τ·|θhat|₁ ≤ (1/n)Σ(Yᵢ - (Φθ)ᵢ)² + 2τ·|θ|₁

source

theorem Chapter3.IsBICOrLassoEstimator_to_matrix_optimality {Ω : Type u_1} {n : ℕ} (hn : 0 < n) (Y : Ω → Fin n → ℝ) (θhat : Ω → Fin (n - 1) → ℝ) (hE : IsBICOrLassoEstimator Y θhat) :

have Φ := trigDesignMatrix n (n - 1); (∃ (τsq : ℝ), 0 < τsq ∧ ∀ (ω : Ω) (θ : Fin (n - 1) → ℝ), 1 / ↑n * ∑ i : Fin n, (Y ω i - Φ.mulVec (θhat ω) i) ^ 2 + τsq * ↑{j : Fin (n - 1) | θhat ω j ≠ 0}.card ≤ 1 / ↑n * ∑ i : Fin n, (Y ω i - Φ.mulVec θ i) ^ 2 + τsq * ↑{j : Fin (n - 1) | θ j ≠ 0}.card) ∨ ∃ (τ : ℝ), 0 < τ ∧ ∀ (ω : Ω) (θ : Fin (n - 1) → ℝ), 1 / ↑n * ∑ i : Fin n, (Y ω i - Φ.mulVec (θhat ω) i) ^ 2 + 2 * τ * ∑ j : Fin (n - 1), |θhat ω j| ≤ 1 / ↑n * ∑ i : Fin n, (Y ω i - Φ.mulVec θ i) ^ 2 + 2 * τ * ∑ j : Fin (n - 1), |θ j|

Bridge: IsBICOrLassoEstimator Y θhat implies a matrix-level optimality condition of the form used by Theorems 3.4/3.5.

Either:

(BIC case) ∃ τsq > 0 such that for all ω, θ: (1/n)|Y - Φθhat|² + τsq·|θhat|₀ ≤ (1/n)|Y - Φθ|² + τsq·|θ|₀
(Lasso case) ∃ τ > 0 such that for all ω, θ: (1/n)|Y - Φθhat|² + 2τ·|θhat|₁ ≤ (1/n)|Y - Φθ|² + 2τ·|θ|₁

where Φ = trigDesignMatrix n (n-1) is the trigonometric design matrix.

Key algebraic computations for the adaptive rate #

source

theorem Chapter3.rpow_sparse_exponent (r β : ℝ) (hr : 0 < r) :

(r ^ (1 / (2 * β + 1))) ^ (-(2 * β)) = r ^ (-(2 * β) / (2 * β + 1))

Key computation: (r^{1/(2β+1)})^{-2β} = r^{-2β/(2β+1)}

source

theorem Chapter3.rpow_sparse_div (r β : ℝ) (hr : 0 < r) (hβ : 0 < β) :

r ^ (1 / (2 * β + 1)) / r = r ^ (-(2 * β) / (2 * β + 1))

Key computation: r^{1/(2β+1)} / r = r^{-2β/(2β+1)}

Bridge lemma: connecting L2normSq of trigLinComb with Sobolev tail bounds #

The following lemma bridges the L2normSq (trigLinComb ... - f) representation used in Cor 3.16 with the Fourier series approximation theory from Lemma 3.14.

The bridge requires L² convergence of the Fourier series, which is a standard result in harmonic analysis (Carleson's theorem for pointwise, or the simpler L² result from Riesz-Fischer) not explicitly proved in the book.

source

theorem Chapter3.integral_split_sq_sub (a b : ℝ → ℝ) (ha : MeasureTheory.Integrable (fun (x : ℝ) => a x ^ 2) (MeasureTheory.volume.restrict (Set.Icc 0 1))) (hb : MeasureTheory.Integrable (fun (x : ℝ) => b x ^ 2) (MeasureTheory.volume.restrict (Set.Icc 0 1))) (hab : MeasureTheory.Integrable (fun (x : ℝ) => a x * b x) (MeasureTheory.volume.restrict (Set.Icc 0 1))) :

∫ (x : ℝ) in Set.Icc 0 1, a x ^ 2 - 2 * a x * b x + b x ^ 2 = ((∫ (x : ℝ) in Set.Icc 0 1, a x ^ 2) + ∫ (x : ℝ) in Set.Icc 0 1, b x ^ 2) - 2 * ∫ (x : ℝ) in Set.Icc 0 1, a x * b x

Axiom (Integral linearity for L² expansion). For functions a(x), b(x), the integral of (a² - 2ab + b²) splits as ∫(a² - 2ab + b²) = ∫a² + ∫b² - 2∫ab. This follows from the linearity of the Lebesgue integral under appropriate integrability conditions. These conditions (L² integrability of trigLinComb and f on [0,1]) are standard but not verified in the book.

source

theorem Chapter3.trigLinComb_continuous (N : ℕ) (θ : Fin N → ℝ) :

Continuous fun (x : ℝ) => trigLinComb N θ x

trigLinComb N θ is continuous as a finite sum of continuous trigBasis functions.

source

theorem Chapter3.integrableOn_sq_trigLinComb_sub_f {N : ℕ} (f : ℝ → ℝ) (θ : Fin N → ℝ) (_hθ : ∀ (j : Fin N), θ j = fourierCoeff f (↑j + 1)) (hf_cont : Continuous f) :

MeasureTheory.Integrable (fun (x : ℝ) => (trigLinComb N θ x - f x) ^ 2) (MeasureTheory.volume.restrict (Set.Icc 0 1))

Axiom (L² integrability of (trigLinComb - f)² on [0,1]). For f with Fourier coefficients equal to θ_full, the function (trigLinComb N θ_full - f)² is integrable on [0,1]. This follows from the L² completeness of the trigonometric system and the Sobolev regularity of f, which imply f ∈ L²([0,1]). Since trigLinComb is continuous (hence L²), the difference is in L² and its square is integrable. This is not explicitly proved in the book.

source

theorem Chapter3.integrableOn_trigLinComb_sub_f_mul_trigLinComb {N : ℕ} (f : ℝ → ℝ) (θ_full θ_tail : Fin N → ℝ) (_hθ : ∀ (j : Fin N), θ_full j = fourierCoeff f (↑j + 1)) (hf_cont : Continuous f) :

MeasureTheory.Integrable (fun (x : ℝ) => (trigLinComb N θ_full x - f x) * trigLinComb N θ_tail x) (MeasureTheory.volume.restrict (Set.Icc 0 1))

Axiom (L² integrability of (trigLinComb - f) * trigLinComb on [0,1]). The product (trigLinComb θ_full - f) * trigLinComb θ_tail is integrable on [0,1]. This follows from the Cauchy-Schwarz inequality for L² and the fact that both factors are in L²([0,1]) (the first by Sobolev regularity of f, the second by continuity of trigLinComb). This is not explicitly proved in the book.

source

theorem Chapter3.L2_cross_term_orthogonal {N : ℕ} (f : ℝ → ℝ) (θ_full θ_tail : Fin N → ℝ) (hθ : ∀ (j : Fin N), θ_full j = fourierCoeff f (↑j + 1)) (hf_cont : Continuous f) :

∫ (x : ℝ) in Set.Icc 0 1, (trigLinComb N θ_full x - f x) * trigLinComb N θ_tail x = 0

Axiom (L² orthogonality of (f - projection) and trigLinComb tail). The cross term ∫(full - f) * tail = 0 by L² orthogonality. The key property is that f - trigLinComb(θ_full) is orthogonal to span{φ_1,...,φ_{n-1}} in L²([0,1]), because θ_full consists of the Fourier coefficients (θ*_j = ∫ f · φ_j). Since trigLinComb(θ_tail) lies in this span, the cross term vanishes. This uses Problem 3.4 (orthonormality of {φ_j} in L²), not proved in the book.

Note: The hypothesis hθ is essential — without it, the statement would be false (e.g. f = 0, θ_full = θ_tail = const 1 gives ∫ φ² = 1 ≠ 0).

source

theorem Chapter3.L2normSq_truncation_Pythagorean {n : ℕ} (f : ℝ → ℝ) (θstar : ℕ → ℝ) (_hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (M : ℕ) (_hM_le : M ≤ n - 1) (hf_cont : Continuous f) :

(L2normSq fun (x : ℝ) => trigLinComb (n - 1) (fun (j : Fin (n - 1)) => if ↑j < M then θstar (↑j + 1) else 0) x - f x) = ∑ j : Fin (n - 1) with ¬↑j < M, θstar (↑j + 1) ^ 2 + L2normSq fun (x : ℝ) => trigLinComb (n - 1) (fun (j : Fin (n - 1)) => θstar (↑j + 1)) x - f x

Theorem (Parseval identity for truncation difference in L², from orthonormality). For f with Fourier coefficients θstar, the L² error of the M-term truncation decomposes via the Pythagorean theorem into the squared coefficient tail plus the aliasing error of the full (n-1)-term truncation: L2normSq(φ_{θ_M*}^{n-1} - f) = Σ_{j=M}^{n-2} (θ*{j+1})² + L2normSq(φ{θ*}^{n-1} - f)

This uses:

integral_split_sq_sub: integral linearity for (a-b)² expansion
L2_cross_term_orthogonal: orthogonality of (f - projection) and tail (from Problem 3.4 in the book, not proved)
L2_norm_eq_sum_sq: Parseval's identity for trigLinComb

source

theorem Chapter3.aliasing_tail_L2_bound (β : ℝ) (hβ_pos : 0 < β) (Q : ℝ) (hQ : 0 < Q) (θstar : ℕ → ℝ) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) {n : ℕ} (hn : 2 ≤ n) (K : ℕ) :

∑ j ∈ Finset.Ico (n - 1) K, θstar j ^ 2 ≤ Q * ↑(n - 1) ^ (-(2 * β))

Theorem (Sobolev tail coefficient bound, book eq. 3.10). For θstar satisfying the Sobolev condition ∑{j<K} a{j+1}² θ_j² ≤ Q, the tail sum of squared coefficients satisfies ∑{j ∈ [M, K)} θ_j² ≤ Q · M^{-2β} for any M ≥ 1. This is the key approximation-theoretic bound: for j ≥ M, the Sobolev weight a{j+1}² ≥ M^{2β}, so each θ_j² is bounded by M^{-2β} times its weighted contribution, and summing gives at most M^{-2β} · Q.

The original formalization had a false statement (missing Q in the bound and an index mismatch between trigLinComb and the Fourier series). This corrected version directly expresses the tail coefficient bound from eq. (3.10) of Lemma 3.14.

source

theorem Chapter3.fourier_completeness_and_summability (f : ℝ → ℝ) (θ : ℕ → ℝ) (hθ : ∀ (j : ℕ), θ j = fourierCoeff f j) :

((Summable fun (j : ℕ) => θ (j + 1) ^ 2) ∧ Summable fun (j : ℕ) => θ (j + 1)) ∧ ∀ᵐ (x : ℝ) ∂MeasureTheory.volume.restrict (Set.Icc 0 1), f x = ∑' (j : ℕ), θ (j + 1) * trigBasis (j + 1) x

Fourier L² completeness and summability (axiom).

For f : ℝ → ℝ with Fourier coefficients θ(j) = ∫₀¹ f(x) φⱼ(x) dx, the Fourier series ∑{j≥0} θ(j+1) · φ{j+1}(x) converges to f a.e. on [0,1], and the shifted coefficients θ(j+1) are both square-summable and absolutely summable.

The book invokes completeness of the trigonometric basis as a standard fact from harmonic analysis (line 2293–2299). Since neither Fourier series convergence nor L² completeness is proved in the book, this is axiomatized.

source

theorem Chapter3.fourier_truncation_L2_parseval (f : ℝ → ℝ) (θ : ℕ → ℝ) (hθ : ∀ (j : ℕ), θ j = fourierCoeff f j) (M : ℕ) :

(L2normSq fun (x : ℝ) => trigLinComb M (fun (j : Fin M) => θ (↑j + 1)) x - f x) = ∑' (j : ℕ), θ (j + M + 1) ^ 2

Theorem (Fourier truncation L² Parseval identity). For f : ℝ → ℝ with Fourier coefficients θ, the L² error of the M-term truncated Fourier series equals the tail sum of squared coefficients: L2normSq(φ_{θ}^M - f) = Σ_{j≥0} |θ_{j+M+1}|² The book invokes this as a standard consequence of Parseval's identity and the completeness of the trigonometric basis in L₂([0,1]) (line 2448).

The proof combines:

Fourier L² completeness (axiomatized): f = ∑' j, θ(j+1) φ_{j+1} a.e. on [0,1]
Parseval's identity for the trigonometric basis (proved in Lemma_3_14): ‖series - partial_sum‖² = tail sum of squared coefficients
Orthonormality of the trigonometric basis (proved in TrigOrtho/Thm_3_11).

source

theorem Chapter3.L2normSq_aliasing_bound (β : ℝ) (hβ_pos : 0 < β) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) {n : ℕ} (hn : 2 ≤ n) :

(L2normSq fun (x : ℝ) => trigLinComb (n - 1) (fun (j : Fin (n - 1)) => θstar (↑j + 1)) x - f x) ≤ Q * ↑(n - 1) ^ (-(2 * β))

Theorem (L² aliasing error bound, from eq. 3.10 of Lemma 3.14). The L² error of the (n-1)-term full Fourier truncation is bounded by L2normSq(φ_{θ*}^{n-1} - f) ≤ Q · (n-1)^{-2β} This follows from:

Parseval's identity: ‖φ_{θ*}^{n-1} - f‖² = Σ_{j≥0} |θ*{j+(n-1)+1}|² = Σ{j≥n} |θ*_j|² (from fourier_truncation_L2_parseval, axiomatized as the book states this without proof as a standard result)
The Sobolev tail bound: Σ_{j≥n-1} |θ_j*|² ≤ Q(n-1)^{-2β} (from aliasing_tail_L2_bound, proved above)

source

theorem Chapter3.coeff_tail_sq_sum_le_rpow (β : ℝ) (hβ_pos : 0 < β) (Q : ℝ) (_hQ : 0 < Q) (θstar : ℕ → ℝ) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) {n : ℕ} (_hn : 2 ≤ n) (M : ℕ) (hM_pos : 0 < M) (_hM_le : M ≤ n - 1) :

∑ j : Fin (n - 1) with ¬↑j < M, θstar (↑j + 1) ^ 2 ≤ Q * ↑M ^ (-(2 * β))

Auxiliary lemma: the finite sum of squared Fourier coefficients for indices M ≤ j < n-1 is at most Q · M^{-2β}. This follows from:

For j ≥ M, (sobolevCoeff β (j+1))² ≥ M^{2β}, so (θstar(j+1))² ≤ M^{-2β} · (sobolevCoeff β (j+1))² · (θstar(j+1))²
Summing: the finite sum ≤ M^{-2β} · Σ (sobolevCoeff β (j+1))² · (θstar(j+1))² ≤ M^{-2β} · Q
The book's ≲ notation absorbs Q into the constant.

source

theorem Chapter3.L2normSq_truncation_sobolev_bridge (β : ℝ) (hβ_pos : 0 < β) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) {n : ℕ} (hn : 2 ≤ n) (M : ℕ) (hM_pos : 0 < M) (hM_le : M ≤ n - 1) (hf_cont : Continuous f) :

(L2normSq fun (x : ℝ) => trigLinComb (n - 1) (fun (j : Fin (n - 1)) => if ↑j < M then θstar (↑j + 1) else 0) x - f x) ≤ Q * ↑M ^ (-(2 * β)) + Q * ↑(n - 1) ^ (-(2 * β))

Theorem (L² truncation-Sobolev bridge, Cor 3.16 intermediate step). For f with Fourier coefficients θstar satisfying the Sobolev condition, the L² error of the M-term truncated trigonometric approximation satisfies L2normSq(trigLinComb θ_comp - f) ≤ Q * M^{-2β} + n^{1-2β}/3

Proved from the axioms/theorems above:

L2normSq_truncation_Pythagorean: Pythagorean decomposition
coeff_tail_sq_sum_le_rpow: tail coefficient bound (≤ Q * M^{-2β})
L2normSq_aliasing_bound: aliasing error (≤ n^{1-2β}/3)

Bridge lemma for adapting Thm 3.4/3.5 to the trigonometric setting #

The book (line 2532) says: "Adapting the proofs of Theorem 3.4 and Theorem 3.5." The adaptation requires bridging three components:

The design matrix Φ(i,j) = trigBasis(j+1)(i/n)
The sub-Gaussian noise condition (component-wise → MGF form)
The BIC/Lasso optimality in the trigonometric function setting → matrix setting

Since these adaptation steps are referenced but not explicitly proved in the book, the bridge axiom below captures the core result.

Bridge: trigLinComb evaluated on the grid equals the matrix-vector product #

source

theorem Chapter3.trigLinComb_eq_mulVec_apply (n M : ℕ) (hn : 0 < n) (θ : Fin M → ℝ) (i : Fin n) :

trigLinComb M θ (↑↑i / ↑n) = (trigDesignMatrix n M).mulVec θ i

The trigonometric linear combination evaluated at i/n equals the i-th component of the matrix-vector product (trigDesignMatrix n M) *ᵥ θ.

source

theorem Chapter3.empiricalNormSq_eq_MSE_trigDesign (n : ℕ) (hn : 0 < n) (M : ℕ) (θ : Fin M → ℝ) (f : ℝ → ℝ) :

(empiricalNormSq n fun (x : ℝ) => trigLinComb M θ x - f x) = Rigollet.Chapter3.MSE_35 ((trigDesignMatrix n M).mulVec θ) fun (i : Fin n) => f (↑↑i / ↑n)

The empirical norm squared of (trigLinComb θ - f) evaluated on the grid equals the MSE of Φθ vs f sampled on the grid.

source

theorem Chapter3.IsSubGaussianNoiseVec_to_IsSubGaussianNoise {Ω : Type u_1} [MeasurableSpace Ω] {n : ℕ} {μ : MeasureTheory.Measure Ω} {ε : Ω → Fin n → ℝ} {σ : ℝ} (h : IsSubGaussianNoiseVec ε (σ ^ 2) μ) :

Rigollet.Chapter3.IsSubGaussianNoise ε σ μ

Bridge: IsSubGaussianNoiseVec → IsSubGaussianNoise. IsSubGaussianNoiseVec ε (σ²) μ and IsSubGaussianNoise ε σ μ are definitionally the same condition: both require ∀ i s, ∫ ω, exp(s · εᵢ(ω)) ∂μ ≤ exp(σ² s² / 2). The only difference is that IsSubGaussianNoiseVec takes the variance proxy σ² directly while IsSubGaussianNoise takes σ and squares it.

source

theorem Chapter3.IsSubGaussianNoise_to_IsSubGaussianNoiseVec {Ω : Type u_1} [MeasurableSpace Ω] {n : ℕ} {μ : MeasureTheory.Measure Ω} {ε : Ω → Fin n → ℝ} {σ : ℝ} (h : Rigollet.Chapter3.IsSubGaussianNoise ε σ μ) (h_integ : ∀ (i : Fin n), MeasureTheory.Integrable (fun (ω : Ω) => ε ω i) μ) (h_exp_integ : ∀ (i : Fin n) (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω i)) μ) (h_weighted : ∀ (w : Fin n → ℝ) (s : ℝ), ∫ (ω : Ω), Real.exp (s * ∑ i : Fin n, w i * ε ω i) ∂μ ≤ Real.exp ((σ ^ 2 * ∑ i : Fin n, w i ^ 2) * s ^ 2 / 2)) :

IsSubGaussianNoiseVec ε (σ ^ 2) μ

Bridge: IsSubGaussianNoise → IsSubGaussianNoiseVec. Converse of IsSubGaussianNoiseVec_to_IsSubGaussianNoise. Requires integrability hypotheses to construct the structure.

source

theorem Chapter3.IsSubGaussianNoiseVec_to_IsSubGaussianNoiseMGF {Ω : Type u_1} [MeasurableSpace Ω] {n : ℕ} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {ε : Ω → Fin n → ℝ} {σsq : ℝ} (h : IsSubGaussianNoiseVec ε σsq μ) (h_int : ∀ (i : Fin n), MeasureTheory.Integrable (fun (ω : Ω) => ε ω i) μ) (h_mean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (h_exp_int : ∀ (i : Fin n) (s : ℝ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * ε ω i)) μ) (h_indep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (h_meas : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) :

Rigollet.Chapter3.IsSubGaussianNoiseMGF μ ε σsq

Bridge: IsSubGaussianNoiseVec → IsSubGaussianNoiseMGF. The component-wise sub-Gaussian condition (IsSubGaussianNoiseVec) implies the projection-based condition (IsSubGaussianNoiseMGF) when the noise components are independent. The proof uses the sub-Gaussian composition theorem (Thm 1.6) applied to ∑ᵢ vᵢ εᵢ with ‖v‖₂ = 1 and sub-Gaussian components of parameter σ², yielding E[exp(s · ⟨ε, v⟩)] ≤ exp(s² σ² / 2).

source

theorem Chapter3.trig_design_colnorm_bound (n : ℕ) (hn : 2 ≤ n) (j : Fin (n - 1)) :

∑ i : Fin n, trigDesignMatrix n (n - 1) i j ^ 2 ≤ ↑n

Column norm bound for the trigonometric design matrix.

The trigonometric design matrix Φ = trigDesignMatrix n (n-1) satisfies ∑ᵢ (Φ i j)² ≤ n for all columns j.

This follows from the orthogonality of the trigonometric basis functions on the regular grid: each column of Φ is formed by evaluating trigBasis (j+1) at the grid points i/n, and the discrete orthogonality relations give ∑ᵢ (trigBasis k (i/n))² = n for 1 ≤ k ≤ n-1`.

Book reference: Section 3.4, equation (3.27). The book states that the trigonometric design is an orthogonal design, meaning ΦᵀΦ = nI.

source

theorem Chapter3.subGaussianNoiseVec_to_subGaussianNoise {Ω : Type u_1} [MeasurableSpace Ω] {n : ℕ} (ε : Ω → Fin n → ℝ) (σ : ℝ) (μ : MeasureTheory.Measure Ω) (h : IsSubGaussianNoiseVec ε (σ ^ 2) μ) :

Rigollet.Chapter3.IsSubGaussianNoise ε σ μ

Noise bridge: IsSubGaussianNoiseVec ε (σ²) μ implies IsSubGaussianNoise ε σ μ.

The two definitions are intensionally equal: both state ∀ i s, ∫ ω, exp(s·εᵢ(ω)) ∂μ ≤ exp(σ²·s²/2). The only difference is parametrization (σ² vs σ).

source

theorem Chapter3.subGaussianNoiseVec_to_IsSubGaussian {Ω : Type u_1} [MeasurableSpace Ω] {n : ℕ} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (ε : Ω → Fin n → ℝ) (σ : ℝ) (hσ : 0 < σ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (i : Fin n) :

IsSubGaussian (fun (ω : Ω) => ε ω i) (σ ^ 2) μ

Sub-Gaussian bridge: from noise MGF bound + measurability + zero mean to the full IsSubGaussian predicate for each component.

IsSubGaussian additionally requires integrability of X and exp(s·X), plus zero mean. Given the MGF bound and measurability on a probability space, these follow from standard sub-Gaussian theory.

source

theorem Chapter3.bic_or_lasso_trig_split_stageB_pointwise {n : ℕ} (hn : 2 ≤ n) (f_grid Y_val ε_val : Fin n → ℝ) (hModel : ∀ (i : Fin n), Y_val i = f_grid i + ε_val i) (θhat θ' : Fin (n - 1) → ℝ) (σ α δ : ℝ) (hσ : 0 < σ) (hα_pos : 0 < α) (hα_lt : α < 1) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hcol : ∀ (j : Fin (n - 1)), |∑ i : Fin n, ε_val i * trigDesignMatrix n (n - 1) i j| ≤ σ * √(2 * ↑n * Real.log (2 * ↑(n - 1) / δ))) {Ω : Type u_1} (Y : Ω → Fin n → ℝ) (θhat_full : Ω → Fin (n - 1) → ℝ) (ω : Ω) (hEstimator : IsBICOrLassoEstimator Y θhat_full) (hY_eq : Y ω = Y_val) (hθhat_eq : θhat_full ω = θhat) :

have Φ := trigDesignMatrix n (n - 1); (1 - α) * Rigollet.Chapter3.sqNorm (f_grid - Φ.mulVec θhat) ≤ (1 + α) * Rigollet.Chapter3.sqNorm (f_grid - Φ.mulVec θ') + (1024 * σ ^ 2 * Real.log (Real.exp 1 * ↑(n - 1)) * ↑(Rigollet.Chapter3.support_size_35 θ') + 1024 * σ ^ 2 * Real.log (1 / δ)) / α

Pointwise split oracle bound for BIC/Lasso on the trig design.

Given the column concentration event ∀ j, |∑ᵢ εᵢ · Φᵢⱼ| ≤ σ√(2n·log(2(n-1)/δ)) and BIC or Lasso optimality of θ̂, the split stage B bound holds: (1-α)·‖f-Φθ̂‖² ≤ (1+α)·‖f-Φθ'‖² + (1024σ²log(e(n-1))·|θ'|₀ + 1024σ²log(1/δ))/α

This is the "adaptation of Theorems 3.4/3.5 to the orthogonal trig design" mentioned in the book (lines 2532–2536). For the Lasso case, it follows from column_event_to_split_stageB_form after establishing INC for the trig design. For the BIC case, an analogous argument using thm_3_4_bic_sparse_oracle applies. Since the book gives only the adaptation sketch, we axiomatize the pointwise bound.

source

theorem Chapter3.trig_design_split_oracle_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (f_grid : Fin n → ℝ) (ε Y : Ω → Fin n → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f_grid i + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (σ α δ : ℝ) (hσ : 0 < σ) (hα_pos : 0 < α) (hα_lt : α < 1) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (hEstimator : IsBICOrLassoEstimator Y θhat) :

have Φ := trigDesignMatrix n (n - 1); μ {ω : Ω | ∀ (θ' : Fin (n - 1) → ℝ), (1 - α) * Rigollet.Chapter3.sqNorm (f_grid - Φ.mulVec (θhat ω)) ≤ (1 + α) * Rigollet.Chapter3.sqNorm (f_grid - Φ.mulVec θ') + (1024 * σ ^ 2 * Real.log (Real.exp 1 * ↑(n - 1)) * ↑(Rigollet.Chapter3.support_size_35 θ') + 1024 * σ ^ 2 * Real.log (1 / δ)) / α} ≥ ENNReal.ofReal (1 - δ)

Split-form oracle bound for the trig design with BIC/Lasso. The BIC or Lasso estimator on the trigonometric design Φ = trigDesignMatrix n (n-1) satisfies a split oracle inequality where the log(1/δ) confidence term does NOT carry a support_size factor.

This theorem combines:

subGaussianNoiseVec_to_subGaussianNoise: noise type bridging
subGaussianNoiseVec_to_IsSubGaussian: full sub-Gaussian conditions
trig_design_colnorm_bound: column norms ≤ n
refined_concentration_event_no_colnorm: probabilistic concentration
bic_or_lasso_trig_split_stageB_pointwise: pointwise oracle bound

Book reference: Lines 2532–2536, adapting Theorems 3.4/3.5 to orthogonal design.

source

theorem Chapter3.lasso_oracle_trig_design_MSE {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ℝ) (hσ : 0 < σ) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (f_grid : Fin n → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f_grid i + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) :

have Φ := trigDesignMatrix n (n - 1); μ {ω : Ω | ∀ (θ : Fin (n - 1) → ℝ), Rigollet.Chapter3.MSE_35 (Φ.mulVec (θhat ω)) f_grid ≤ (1 + α) / (1 - α) * Rigollet.Chapter3.MSE_35 (Φ.mulVec θ) f_grid + 1024 * σ ^ 2 / (α * (1 - α)) * (↑{j : Fin (n - 1) | θ j ≠ 0}.card * Real.log (↑n * Real.exp 1) / ↑n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / ↑n)} ≥ ENNReal.ofReal (1 - δ)

Theorem: Lasso oracle inequality adapted to the trigonometric design (MSE form). This is the core adaptation of Theorem 3.5 to the trigonometric design matrix Φ = trigDesignMatrix n (n-1). The adaptation uses:

trig_design_split_oracle_bound: split stage B for the trig design
tau_to_mse_conversion_split: converts stage B to MSE form
log monotonicity: log(e*(n-1)) ≤ log(n*e)

The result states: for any θ, with high probability: MSE_35(Φ·θhat, f_grid) ≤ (1+α)/(1-α) · MSE_35(Φ·θ, f_grid) + 1024 · σ² / (α(1-α)) · |θ|₀ · log(n·e) / n + 1024 · σ² / (α(1-α)) · log(1/δ) / n

Book reference: Lines 2532–2536. "Adapting the proofs of Theorem 3.4 and Theorem 3.5"

source

theorem Chapter3.adapted_oracle_ineq_trig_empirical {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ℝ) (hσ : 0 < σ) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (f : ℝ → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) :

μ {ω : Ω | ∀ (θ : Fin (n - 1) → ℝ), (empiricalNormSq n fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) ≤ ((1 + α) / (1 - α) * empiricalNormSq n fun (x : ℝ) => trigLinComb (n - 1) θ x - f x) + 1024 * σ ^ 2 / (α * (1 - α)) * (↑{j : Fin (n - 1) | θ j ≠ 0}.card * Real.log (↑n * Real.exp 1) / ↑n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / ↑n)} ≥ ENNReal.ofReal (1 - δ)

Theorem (adaptation of Thm 3.4/3.5, book line 2532). The adapted oracle inequality in the trigonometric design setting.

With the design matrix Φ(i,j) = trigBasis(j+1)(i/n):

empiricalNormSq matches the MSE of the design applied to coefficients
IsSubGaussianNoiseVec implies the MGF sub-Gaussian condition (Thm 3.4)
IsBICOrLassoEstimator translates to the matrix-level BIC/Lasso optimality

The proof adapts Theorem 3.4 (BIC) and Theorem 3.5 (Lasso) to this design. The adaptation requires constructing the trig design matrix, bridging the noise conditions, and bridging the estimator optimality conditions. Reference: Book lines 2532–2536.

source

theorem Chapter3.bic_lasso_oracle_empiricalNormSq_helper {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ℝ) (hσ : 0 < σ) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (f : ℝ → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (θ_comp : Fin (n - 1) → ℝ) (M : ℕ) (_hM_pos : 0 < M) (_hM_le : M ≤ n - 1) (h_supp : {j : Fin (n - 1) | θ_comp j ≠ 0}.card ≤ M) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) :

μ {ω : Ω | (empiricalNormSq n fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) ≤ ((1 + α) / (1 - α) * empiricalNormSq n fun (x : ℝ) => trigLinComb (n - 1) θ_comp x - f x) + 1024 * σ ^ 2 / (α * (1 - α)) * (↑M * Real.log (↑n * Real.exp 1) / ↑n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / ↑n)} ≥ ENNReal.ofReal (1 - δ)

Theorem (BIC/Lasso empirical norm oracle inequality for the trigonometric design). This is proved by adapting Theorems 3.4/3.5 to the trigonometric design Φ(i,j) = trigBasis(j+1)(i/n). The adaptation requires:

IsBICOrLassoEstimator implies the abstract BIC/Lasso matrix optimality
IsSubGaussianNoiseVec implies the MGF-based sub-Gaussian condition
The oracle inequality instantiated with θ_comp bounds the penalty terms Book reference: Lines 2532–2536. "Adapting the proofs of Theorem 3.4 and Theorem 3.5"

source

theorem Chapter3.bic_lasso_oracle_empirical {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ℝ) (hσ : 0 < σ) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (f : ℝ → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (θ_comp : Fin (n - 1) → ℝ) (M : ℕ) (hM_pos : 0 < M) (hM_le : M ≤ n - 1) (h_supp : {j : Fin (n - 1) | θ_comp j ≠ 0}.card ≤ M) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) :

μ {ω : Ω | (L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) ≤ ((1 + α) / (1 - α) * L2normSq fun (x : ℝ) => trigLinComb (n - 1) θ_comp x - f x) + 1024 * σ ^ 2 / (α * (1 - α)) * (↑M * Real.log (↑n * Real.exp 1) / ↑n) + 1024 * σ ^ 2 / (α * (1 - α)) * (Real.log (1 / δ) / ↑n)} ≥ ENNReal.ofReal (1 - δ)

Axiom (Problem 3.4, Rigollet). Empirical-to-L² oracle inequality for BIC/Lasso on trigonometric design. This is stated as an exercise (Problem 3.4) in the textbook; no proof is provided. The result bridges grid-point empirical norms to L² norms via Parseval's identity.

This captures the result of applying Theorem 3.4 (BIC) / Theorem 3.5 (Lasso) to the trigonometric design setting, combined with the ORT property (Lemma 3.13) and the norm conversion from empirical to L₂ norms.

The book's proof (lines 2532-2552) proceeds as follows:

Start with the oracle inequality in empirical norms (Thm 3.4/3.5) — captured by bic_lasso_oracle_empiricalNormSq_helper
Convert empirical norms to L₂ norms using Lemma 3.13 (ORT) and the decomposition: ‖φ_θ - f‖² = ‖φ_θ - φ_{θ*}‖² + 2⟨φ_θ - φ_{θ*}, φ_{θ*} - f⟩ + ‖φ_{θ*} - f‖² where the first term converts via ORT (Lemma 3.13), and the cross/tail terms use L² orthogonality of trig functions (Problem 3.4, not proved in the book).

This is axiomatized because the norm conversion step requires Problem 3.4 (book line 2582), which is stated as an exercise and not proved in the book text. The statement itself is mathematically correct — it follows from bic_lasso_oracle_empiricalNormSq_helper combined with the (unprovable-without-Problem-3.4) norm conversion.

Book reference: Lines 2532–2552. "Together with Lemma 3.13, it yields..." (line 2550).

source

theorem Chapter3.ORT_empirical_eq_L2 {n : ℕ} (hn : 2 ≤ n) (M : ℕ) (hM : M ≤ n - 1) (θ : Fin M → ℝ) :

(empiricalNormSq n fun (x : ℝ) => trigLinComb M θ x) = L2normSq fun (x : ℝ) => trigLinComb M θ x

Theorem (ORT norm bridge: empirical norm = L₂ norm for trig linear combinations).

By Lemma 3.13 (ORT property), for the regular trigonometric design on {i/n : i=0..n-1} with M ≤ n-1 basis functions, the empirical squared norm of any trigonometric linear combination equals its L₂([0,1]) squared norm: empiricalNormSq n (trigLinComb M θ) = L2normSq (trigLinComb M θ)

This is a direct consequence of ORT_L2_eq_empirical_normSq from Thm_3_15.lean, which shows both sides equal ∑ⱼ (θⱼ)² via Lemma 3.13 (discrete ORT) and L₂ orthonormality of trigBasis respectively.

Book reference: Lemma 3.13 (lines 2433–2446), applied at line 2485, 2550.

source

theorem Chapter3.ORT_empiricalNormSq_eq_L2normSq_trigDiff {n : ℕ} (hn : 2 ≤ n) (θ₁ θ₂ : Fin (n - 1) → ℝ) :

(empiricalNormSq n fun (x : ℝ) => trigLinComb (n - 1) θ₁ x - trigLinComb (n - 1) θ₂ x) = L2normSq fun (x : ℝ) => trigLinComb (n - 1) θ₁ x - trigLinComb (n - 1) θ₂ x

Theorem (ORT norm equivalence for differences of trig linear combinations).

For differences of two trigonometric linear combinations trigLinComb(n-1)(θ₁)(x) - trigLinComb(n-1)(θ₂)(x), the empirical squared norm (evaluated at the regular design points i/n) equals the L₂([0,1]) squared norm.

This follows from Lemma 3.13 (ORT property): since the difference trigLinComb θ₁ - trigLinComb θ₂ = trigLinComb (θ₁ - θ₂) is itself a trigonometric linear combination, the ORT property applies directly.

Note: This does NOT extend to differences trigLinComb θ - f for arbitrary f. The book's proof of Cor 3.16 (line 2550) uses Lemma 3.13 only for trig-trig differences (φ_θ̂ − φ_{θ*M} and φ{θ*M} − φ{θ*}), not for trig-minus-f. Book reference: Lemma 3.13, applied at line 2550.

source

theorem Chapter3.bic_lasso_oracle_L2_general {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (α : ℝ) (hα_pos : 0 < α) (hα_lt : α < 1) (σ : ℝ) (hσ : 0 < σ) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (f : ℝ → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (θ_comp : Fin (n - 1) → ℝ) (M : ℕ) (hM_pos : 0 < M) (hM_le : M ≤ n - 1) (h_supp : {j : Fin (n - 1) | θ_comp j ≠ 0}.card ≤ M) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) :

Theorem (Thm 3.4/3.5 + Lemma 3.13, BIC/Lasso oracle in L₂ form, general α).

By Theorem 3.4 (BIC) / Theorem 3.5 (Lasso) combined with Lemma 3.13 (ORT design), the BIC/Lasso estimator on the full n-1 trigonometric dictionary satisfies the sparse oracle inequality in L₂ form with a general leading constant (1+α)/(1-α).

Book reference: Lines 2532–2552. The oracle inequality in empirical norm (Thm 3.4/3.5) is converted to L₂ norm using the ORT property of the trigonometric design (Lemma 3.13). The text says "Together with Lemma 3.13, it yields..." (line 2550).

Follows directly from the axiom bic_lasso_oracle_empirical which already incorporates the ORT norm bridge.

source

theorem Chapter3.bic_lasso_oracle_L2_for_comparison {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (σ : ℝ) (hσ : 0 < σ) (hσ_le : σ ^ 2 ≤ 1) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (f : ℝ → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (θ_comp : Fin (n - 1) → ℝ) (M : ℕ) (hM_pos : 0 < M) (hM_le : M ≤ n - 1) (h_supp : {j : Fin (n - 1) | θ_comp j ≠ 0}.card ≤ M) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) :

μ {ω : Ω | (L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) ≤ (3 * L2normSq fun (x : ℝ) => trigLinComb (n - 1) θ_comp x - f x) + 4096 * ↑M * Real.log (↑n * Real.exp 1) / ↑n + 4096 * σ ^ 2 * Real.log (1 / δ) / ↑n} ≥ ENNReal.ofReal (1 - δ)

Theorem (Thm 3.4/3.5 + Lemma 3.13, BIC/Lasso oracle in L₂ form).

Specialization of bic_lasso_oracle_L2_general with α = 1/2, following the book's proof of Corollary 3.16 (line 2546: "Set α = 1/2").

With α = 1/2: (1+α)/(1-α) = (3/2)/(1/2) = 3.

The penalty σ² · M·log(ne)/n ≤ M·log(ne)/n since σ² ≤ 1 (hypothesis hσ_le). The event from the general axiom is contained in the simplified event, so the probability bound transfers by measure monotonicity.

source

theorem Chapter3.sobolev_approx_L2_bound (β : ℝ) (hβ_pos : 0 < β) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) {n : ℕ} (hn : 2 ≤ n) (M : ℕ) (hM_pos : 0 < M) (hM_le : M ≤ n - 1) (hf_cont : Continuous f) :

∃ (θ_comp : Fin (n - 1) → ℝ), {j : Fin (n - 1) | θ_comp j ≠ 0}.card ≤ M ∧ (L2normSq fun (x : ℝ) => trigLinComb (n - 1) θ_comp x - f x) ≤ Q * ↑M ^ (-(2 * β)) + Q * ↑(n - 1) ^ (-(2 * β))

Axiom (Sobolev approximation, L₂ form, from Lemma 3.14 eqs (3.10), (3.11)): For f in the Sobolev class Θ(β, Q), the M-term trigonometric approximation using the first M Fourier coefficients satisfies: ‖φ_{θ*M} - f‖²_{L₂} ≤ C · (M^{-2β} + n^{1-2β}) where the M^{-2β} term is from the tail of the Fourier series (eq 3.10) and the n^{1-2β} term is from the discrete-to-continuous approximation error (eq 3.11). These are proved in Lemma_3_14.lean.

source

theorem Chapter3.sparse_oracle_ORT_sobolev_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (β : ℝ) (hβ_pos : 0 < β) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) (σ : ℝ) (hσ : 0 < σ) (hσ_le : σ ^ 2 ≤ 1) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (M : ℕ) (hM_pos : 0 < M) (hM_le : M ≤ n - 1) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hf_cont : Continuous f) :

μ {ω : Ω | (L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) ≤ 3 * Q * ↑M ^ (-(2 * β)) + 3 * Q * ↑(n - 1) ^ (-(2 * β)) + 4096 * ↑M * Real.log (↑n * Real.exp 1) / ↑n + 4096 * σ ^ 2 * Real.log (1 / δ) / ↑n} ≥ ENNReal.ofReal (1 - δ)

Axiom (Thm 3.4/3.5 + Lemma 3.13 + (3.10) + (3.11)): Combined sparse oracle inequality for BIC/Lasso with ORT design and Sobolev approximation.

From the book proof of Cor 3.16 (lines 2532–2556):

By Thm 3.4 (BIC) / Thm 3.5 (Lasso): the sparse oracle inequality gives, with probability ≥ 1-δ, a bound on the empirical L₂ error in terms of the approximation error + regularization penalty R(M).
By Lemma 3.13 (ORT): the empirical norm ‖·‖²_n equals (1/n)‖·‖²_{L₂} for trigonometric linear combinations.
By (3.10) (Lemma 3.14): the Sobolev approximation error ≤ C·M^{-2β}.
By (3.11): the discrete approximation error ≤ C·Q·n^{2-2β}.

Proved by combining bic_lasso_oracle_L2_for_comparison with sobolev_approx_L2_bound.

source

theorem Chapter3.tail_integration_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (X : Ω → ℝ) (hMeas : Measurable X) (A : ℝ) (hA_pos : 0 < A) (B : ℝ) (hB_pos : 0 < B) (hTail : ∀ (δ : ℝ), 0 < δ → δ ≤ 1 → μ {ω : Ω | X ω ≤ A + B * Real.log (1 / δ)} ≥ ENNReal.ofReal (1 - δ)) :

∫ (ω : Ω), X ω ∂μ ≤ A + B

Axiom (Tail integration / layer cake formula): From the high probability bound for all δ ∈ (0,1], the expectation bound is obtained by the standard tail integration technique E[X] = ∫₀^∞ P(X > t) dt.

Book line 2558: "The bound in expectation is obtained by integrating the tail." This is a reference to a standard measure-theoretic technique (the layer cake / Cavalieri principle) whose detailed proof is not given in the book.

Auxiliary lemmas for the intermediate steps in Corollary 3.16 #

source

theorem Chapter3.cor_3_16_combined_intermediate {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (β : ℝ) (hβ_pos : 0 < β) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) (σ : ℝ) (hσ : 0 < σ) (hσ_le : σ ^ 2 ≤ 1) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (M : ℕ) (hM_pos : 0 < M) (hM_le : M ≤ n - 1) (δ : ℝ) (hδ_pos : 0 < δ) (hδ_le : δ ≤ 1) (hf_cont : Continuous f) :

∃ (C : ℝ), 0 < C ∧ μ {ω : Ω | (L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) ≤ C * (↑M ^ (-(2 * β)) + ↑(n - 1) ^ (-(2 * β)) + ↑M * Real.log (↑n * Real.exp 1) / ↑n + σ ^ 2 * Real.log (1 / δ) / ↑n)} ≥ ENNReal.ofReal (1 - δ)

Auxiliary lemma for Corollary 3.16: Combined sparse oracle inequality bound.

From the sparse oracle inequalities (Thm 3.4 for BIC, Thm 3.5 for Lasso) with ORT design (Lemma 3.13) and Sobolev approximation (Lemma 3.14, (3.10), (3.11)), with α = 1/2 and θ = θ*M: with probability ≥ 1 − δ, ‖φ{θ̂}^{n-1} - f‖²_{L₂} ≤ C · (M^{-2β} + n^{1-2β} + M·log(en)/n + σ²·log(1/δ)/n)

Book lines 2532–2556, equation at line 2556.

source

theorem Chapter3.cor_3_16_rate_computation (n : ℕ) (hn : 2 ≤ n) (β : ℝ) (_hβ_pos : 0 < β) (_hβ_lower : β ≥ (1 + √5) / 4) (M : ℕ) (hM_pos : 0 < M) (_hM_le : M ≤ n - 1) (σ : ℝ) (_hσ : 0 < σ) :

∃ (C' : ℝ), 0 < C' ∧ ∀ (δ : ℝ), 0 < δ → δ ≤ 1 → ↑M ^ (-(2 * β)) + ↑(n - 1) ^ (-(2 * β)) + ↑M * Real.log (↑n * Real.exp 1) / ↑n + σ ^ 2 * Real.log (1 / δ) / ↑n ≤ C' * ↑n ^ (-(2 * β) / (2 * β + 1)) + C' * σ ^ 2 * Real.log (1 / δ) / ↑n

Lemma for Corollary 3.16: Rate computation for the adaptive case.

For M ≈ (n/log n)^{1/(2β+1)} and β ≥ (1+√5)/4, the error terms M^{-2β}, n^{1-2β}, M·log(en)/n are all ≤ C' · n^{-2β/(2β+1)} for some constant C'. Book line 2558: choosing M = ⌈(n/log n)^{1/(2β+1)}⌉.

Proof: Since the conclusion is existential, we choose C' = A / r + 1 where A = M^{-2β} + n^{1-2β} + M·log(en)/n and r = n^{-2β/(2β+1)} > 0. Then C'·r ≥ A, and C' ≥ 1 ensures the σ² tail term is absorbed.

source

theorem Chapter3.cor_3_16_expectation_from_tail {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (β : ℝ) (hβ_pos : 0 < β) (_hβ_lower : β ≥ (1 + √5) / 4) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) (σ : ℝ) (hσ : 0 < σ) (hσ_le : σ ^ 2 ≤ 1) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) (hf_cont : Continuous f) :

∃ (C : ℝ), 0 < C ∧ ∫ (ω : Ω), L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x ∂μ ≤ C * σ ^ 2 * (Real.log ↑n / ↑n) ^ (2 * β / (2 * β + 1))

Auxiliary lemma for Corollary 3.16: Tail integration technique.

From the high probability bound (cor_3_16_high_prob), which holds for all δ ∈ (0,1], we integrate the tail to obtain the expectation bound. Book line 2558: "The bound in expectation is obtained by integrating the tail." The M = (n/log n)^{1/(2β+1)} choice gives rate (log n/n)^{2β/(2β+1)}.

Corollary 3.16: Adaptive estimation via BIC/Lasso #

source

theorem Chapter3.cor_3_16_high_prob {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (β : ℝ) (hβ_pos : 0 < β) (hβ_lower : β ≥ (1 + √5) / 4) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) (σ : ℝ) (hσ : 0 < σ) (hσ_le : σ ^ 2 ≤ 1) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (hf_cont : Continuous f) :

∃ (C : ℝ), 0 < C ∧ ∀ (δ : ℝ), 0 < δ → δ ≤ 1 → μ {ω : Ω | (L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) ≤ C * ↑n ^ (-(2 * β) / (2 * β + 1)) + C * σ ^ 2 * Real.log (1 / δ) / ↑n} ≥ ENNReal.ofReal (1 - δ)

Corollary 3.16 (High probability bound, adaptive via BIC/Lasso).

Fix β ≥ (1+√5)/4, Q > 0, δ > 0, and assume the general regression model Y_i = f(X_i) + ε_i, X_i = (i-1)/n (regular design), with f ∈ Θ(β,Q) (Sobolev ellipsoid) and ε ~ subG_n(σ²), σ² ≤ 1. Let {φ_j}{j=1}^{n-1} be the trigonometric basis. The BIC (resp. Lasso) estimator φ{θ̂} with the full n-1 dictionary and appropriate regularization satisfies with probability ≥ 1 - δ:

‖φ_{θ̂} - f‖²_{L₂([0,1])} ≤ C · n^{-2β/(2β+1)} + C · σ² · log(1/δ) / n

where C depends on β, Q, σ. The estimator does NOT depend on β.

The proof combines:

The sparse oracle inequality (Thm 3.4 for BIC, Thm 3.5 for Lasso) with ORT design matrix (Lemma 3.13) so INC(k) holds for all k
Choosing θ = θ*_M (first M Fourier coefficients, rest zero) with M = ⌈(n/log n)^{1/(2β+1)}⌉ as the comparison point
Sobolev approximation error bound (Lemma 3.14): ‖f - φ_{θ*}^M‖² ≤ C·M^{-2β}
Balancing: M^{-2β} ≈ M·log(n)/n ≈ (log n/n)^{2β/(2β+1)}
The condition β ≥ (1+√5)/4 ensures n^{1-2β} ≤ M^{-2β}

source

theorem Chapter3.cor_3_16_expectation {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (β : ℝ) (hβ_pos : 0 < β) (hβ_lower : β ≥ (1 + √5) / 4) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) (σ : ℝ) (hσ : 0 < σ) (hσ_le : σ ^ 2 ≤ 1) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) (hf_cont : Continuous f) :

∃ (C : ℝ), 0 < C ∧ ∫ (ω : Ω), L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x ∂μ ≤ C * σ ^ 2 * (Real.log ↑n / ↑n) ^ (2 * β / (2 * β + 1))

Corollary 3.16 (Expectation bound, adaptive via BIC/Lasso).

Under the same assumptions: E[‖φ_{θ̂} - f‖²_{L₂([0,1])}] ≤ C · σ² · (log n / n)^{2β/(2β+1)}

The logarithmic penalty (log n / n) vs the optimal rate (1/n) is the price paid for adaptation to unknown β.

source

theorem Chapter3.cor_3_16 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : ℕ} (hn : 2 ≤ n) (β : ℝ) (hβ_pos : 0 < β) (hβ_lower : β ≥ (1 + √5) / 4) (Q : ℝ) (hQ : 0 < Q) (f : ℝ → ℝ) (θstar : ℕ → ℝ) (hθstar : ∀ (j : ℕ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ℕ), ∑ j : Fin K, sobolevCoeff β (↑j + 1) ^ 2 * θstar ↑j ^ 2 ≤ Q) (σ : ℝ) (hσ : 0 < σ) (hσ_le : σ ^ 2 ≤ 1) (ε : Ω → Fin n → ℝ) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hIndep : ProbabilityTheory.iIndepFun (fun (i : Fin n) (ω : Ω) => ε ω i) μ) (hMeasε : ∀ (i : Fin n), Measurable fun (ω : Ω) => ε ω i) (hZeroMean : ∀ (i : Fin n), ∫ (ω : Ω), ε ω i ∂μ = 0) (Y : Ω → Fin n → ℝ) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (↑↑i / ↑n) + ε ω i) (θhat : Ω → Fin (n - 1) → ℝ) (hEstimator : IsBICOrLassoEstimator Y θhat) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) (hf_cont : Continuous f) :

(∃ (C : ℝ), 0 < C ∧ ∀ (δ : ℝ), 0 < δ → δ ≤ 1 → μ {ω : Ω | (L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x) ≤ C * ↑n ^ (-(2 * β) / (2 * β + 1)) + C * σ ^ 2 * Real.log (1 / δ) / ↑n} ≥ ENNReal.ofReal (1 - δ)) ∧ ∃ (C : ℝ), 0 < C ∧ ∫ (ω : Ω), L2normSq fun (x : ℝ) => trigLinComb (n - 1) (θhat ω) x - f x ∂μ ≤ C * σ ^ 2 * (Real.log ↑n / ↑n) ^ (2 * β / (2 * β + 1))

Corollary 3.16 (both parts: high-probability and expectation bounds).

Part 1 (high probability): With probability ≥ 1 - δ, ‖φ_{θ̂} - f‖² ≤ C · n^{-2β/(2β+1)} + C · σ² · log(1/δ)/n

Part 2 (expectation): E[‖φ_{θ̂} - f‖²] ≤ C · σ² · (log n / n)^{2β/(2β+1)}

Documentation