Documentation

Atlas.HighDimensionalStatistics.code.Chapter3.Thm_3_15

L₂ norm on [0,1] #

noncomputable def Chapter3.L2normSq (g : ) :

The L₂([0,1]) squared norm: ‖g‖²_{L₂} = ∫₀¹ g(x)² dx

Instances For
    noncomputable def Chapter3.empiricalNormSq (n : ) (g : ) :

    The empirical squared norm: ‖g‖²_n = (1/n) Σᵢ g(Xᵢ)² with Xᵢ = (i-1)/n

    Instances For

      Sub-Gaussian noise predicate #

      structure Chapter3.IsSubGaussianNoiseVec {Ω : Type u_1} [MeasurableSpace Ω] {n : } (ε : ΩFin n) (σsq : ) (μ : MeasureTheory.Measure Ω) :

      Sub-Gaussian noise: each component εᵢ satisfies the MGF bound E[exp(s·εᵢ)] ≤ exp(σ²s²/2), and the associated integrability conditions. This is ε ~ subG_n(σ²) from the book.

      Instances For
        @[implicit_reducible]
        instance Chapter3.instCoeFunIsSubGaussianNoiseVecForallForallLeRealIntegralExpHMulHDivHPowOfNat {Ω : Type u_1} [MeasurableSpace Ω] {n : } {ε : ΩFin n} {σsq : } {μ : MeasureTheory.Measure Ω} :
        CoeFun (IsSubGaussianNoiseVec ε σsq μ) fun (x : IsSubGaussianNoiseVec ε σsq μ) => ∀ (i : Fin n) (s : ), (ω : Ω), Real.exp (s * ε ω i) μ Real.exp (σsq * s ^ 2 / 2)

        Coercion so that hsubG i s still works as shorthand for hsubG.bound i s.

        Regression model and estimator #

        noncomputable def Chapter3.trigLinComb (M : ) (θ : Fin M) (x : ) :

        The trigonometric linear combination: φ_θ(x) = Σⱼ θⱼ φⱼ(x)

        Instances For
          def Chapter3.IsLSEstimator {Ω : Type u_1} {n M : } (Y : ΩFin n) (θhat : ΩFin M) :

          The LS estimator: minimizes (1/n) Σᵢ (Yᵢ - φ_θ(Xᵢ))² over θ ∈ ℝᴹ, with Xᵢ = (i-1)/n (regular design).

          Instances For

            Key algebraic lemma: the exponent computation #

            theorem Chapter3.rpow_exponent_combine (n β : ) (hn : 0 < n) :
            (n ^ (1 / (2 * β + 1))) ^ (-(2 * β)) = n ^ (-(2 * β) / (2 * β + 1))

            Key computation: (n^{1/(2β+1)})^{-2β} = n^{-2β/(2β+1)}

            theorem Chapter3.rpow_M_div_n (n β : ) (hn : 0 < n) ( : 0 < β) :
            n ^ (1 / (2 * β + 1)) / n = n ^ (-(2 * β) / (2 * β + 1))

            Key computation: n^{1/(2β+1)} / n = n^{-2β/(2β+1)}

            Oracle inequality in L₂ form (helper for Theorem 3.15) #

            The oracle inequality for the LS estimator (Theorem 3.3) combined with the orthogonality of the trigonometric basis (Lemma 3.13, ORT) gives:

            With probability ≥ 1 − δ: ‖φ_{θ̂^LS} − φ_{θ*}^M‖²_{L₂} ≤ C₁ · (n^{1−2β} + σ²(M + log(1/δ))/n)

            This is obtained from (3.12) by:

            Book proof lines 2496–2498. The proof of this helper combines:

            ORT norm equivalence for trigonometric linear combinations #

            For the trigonometric basis on the regular grid X_i = i/n with M ≤ n-1, the L₂ norm and empirical norm coincide for any trigonometric linear combination. Specifically: ∫₀¹ (Σⱼ θⱼ φⱼ(x))² dx = (1/n) Σᵢ (Σⱼ θⱼ φⱼ(i/n))².

            This is a direct consequence of Lemma 3.13 (ORT: ΦᵀΦ = nI), which implies that the Gram matrix of the design matrix is n times the identity. For any coefficient vector θ, the empirical squared norm (1/n)|Φθ|² equals ‖θ‖²_{ℓ₂}, and by orthonormality of the trigonometric basis in L₂([0,1]), ∫₀¹ (Σⱼ θⱼ φⱼ(x))² dx also equals ‖θ‖²_{ℓ₂}.

            The proof requires bridging between Matrix/dotProduct types (from Lemma_3_13.lean) and continuous L₂/integral types (used here). This type-bridging infrastructure is axiomatized since the mathematical content follows directly from Lemma 3.13 but the formalization uses incompatible type representations.

            Book line 2485: "Lemma 3.13 implies |φ − φ|₂² = n ‖φ − φ‖²_{L₂([0,1])}".

            theorem Chapter3.trigDesignMatrix_eq_trigBasis (n M : ) (hn : 0 < n) (i : Fin n) (j : Fin M) :
            trigDesignMatrix n M i j = trigBasis (j + 1) (i / n)

            Helper: the connection between trigBasis and trigDesignMatrix entries. trigDesignMatrix n M i j = trigBasis (j.val + 1) (i / n)

            theorem Chapter3.empirical_norm_eq_sum_sq (n M : ) (hn : 0 < n) (hM : M n - 1) (θ : Fin M) :
            1 / n * i : Fin n, (∑ j : Fin M, θ j * trigBasis (j + 1) (i / n)) ^ 2 = j : Fin M, θ j ^ 2

            Helper: the empirical norm of a trig linear combination equals the ℓ² norm of θ. (1/n) ∑ᵢ (∑ⱼ θⱼ φⱼ(xᵢ))² = ∑ⱼ (θⱼ)² This follows from Lemma 3.13: ΦᵀΦ = nI.

            theorem Chapter3.trigBasis_L2_orthonormal (j j' : ) (hj : 0 < j) (hj' : 0 < j') :
            (x : ) in Set.Icc 0 1, trigBasis j x * trigBasis j' x = if j = j' then 1 else 0

            Helper: L₂ orthonormality of trigBasis. The trigonometric basis {φ₁, φ₂, ...} is orthonormal in L₂([0,1]): ∫₀¹ φⱼ(x) φⱼ'(x) dx = δⱼⱼ' for j, j' ≥ 1

            This follows from standard properties of trigonometric functions:

            • ∫₀¹ 1² dx = 1
            • ∫₀¹ 2·cos²(2πkx) dx = 1
            • ∫₀¹ 2·sin²(2πkx) dx = 1
            • ∫₀¹ cos(2πjx)·sin(2πkx) dx = 0
            • ∫₀¹ cos(2πjx)·cos(2πkx) dx = 0 for j ≠ k
            • ∫₀¹ sin(2πjx)·sin(2πkx) dx = 0 for j ≠ k

            The book references this as a standard fact (Example 3.8). Proving all these integration results from scratch is extremely tedious. We axiomatize L₂ orthonormality of trigBasis since the book does not prove it explicitly — it is cited as a well-known result from Fourier analysis.

            theorem Chapter3.L2_norm_eq_sum_sq (M : ) (θ : Fin M) :
            (L2normSq fun (x : ) => j : Fin M, θ j * trigBasis (j + 1) x) = j : Fin M, θ j ^ 2

            The L₂ norm of a trig linear combination equals the ℓ² norm of θ. ∫₀¹ (∑ⱼ θⱼ φⱼ₊₁(x))² dx = ∑ⱼ (θⱼ)² This follows from L₂ orthonormality of trigBasis.

            theorem Chapter3.ORT_L2_eq_empirical_normSq (n M : ) (hn : 0 < n) (hM : M n - 1) (θ : Fin M) :
            (L2normSq fun (x : ) => j : Fin M, θ j * trigBasis (j + 1) x) = 1 / n * i : Fin n, (∑ j : Fin M, θ j * trigBasis (j + 1) (i / n)) ^ 2
            theorem Chapter3.trigLinComb_eq_mulVec (n M : ) (hn : 0 < n) (θ : Fin M) (i : Fin n) :
            trigLinComb M θ (i / n) = (trigDesignMatrix n M).mulVec θ i

            Bridge: trigLinComb at grid point equals matrix-vector product. trigLinComb M θ (i/n) = (trigDesignMatrix n M).mulVec θ i

            theorem Chapter3.finset_range_eq_fin_sum (M : ) (θstar : ) (x : ) :
            jFinset.range M, θstar j * trigBasis (j + 1) x = j : Fin M, θstar j * trigBasis (j + 1) x

            Bridge: Finset.range M sum with θstar equals Fin M sum. Converts ∑ j ∈ Finset.range M, θstar j * trigBasis (j+1) x to ∑ j : Fin M, θstar j.val * trigBasis (j.val+1) x.

            theorem Chapter3.empirical_diff_eq_coeff_sq (n M : ) (hn : 0 < n) (hM : M n - 1) (θ₁ θ₂ : Fin M) :
            1 / n * i : Fin n, (j : Fin M, θ₁ j * trigBasis (j + 1) (i / n) - j : Fin M, θ₂ j * trigBasis (j + 1) (i / n)) ^ 2 = j : Fin M, (θ₁ j - θ₂ j) ^ 2

            The empirical difference norm as a sum of coefficient differences. Shows that the empirical norm of the difference between two trig linear combinations equals the ℓ² norm of their coefficient differences. Key bridge: by empirical_norm_eq_sum_sq (from Lemma 3.13).

            theorem Chapter3.trigBasis_col_sq_sum (n M : ) (hn : 0 < n) (hM : M n - 1) (j : Fin M) :
            i : Fin n, trigBasis (j + 1) (i / n) ^ 2 = n

            Sum of squares of trigBasis column values equals n. From Lemma 3.13 (ΦᵀΦ = nI), the (j,j) diagonal entry gives ∑ᵢ trigBasis(j+1)(i/n)² = n.

            theorem Chapter3.ls_noise_coord_subG_tail {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (M : ) (_hM_pos : 0 < M) (hM_le : M n - 1) (σ : ) ( : 0 < σ) (ε : ΩFin n) (_hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (hmgf : ∀ (w : Fin n) (s : ), (ω : Ω), Real.exp (s * i : Fin n, ε ω i * w i) μ Real.exp ((σ ^ 2 * i : Fin n, w i ^ 2) * s ^ 2 / 2)) (hintegrable : ∀ (w : Fin n) (s : ), MeasureTheory.Integrable (fun (ω : Ω) => Real.exp (s * i : Fin n, ε ω i * w i)) μ) (j : Fin M) (t : ) (ht : 0 < t) :
            μ {ω : Ω | |i : Fin n, ε ω i * trigBasis (j + 1) (i / n)| / n t} ENNReal.ofReal (2 * Real.exp (-n * t ^ 2 / (2 * σ ^ 2)))

            Per-coordinate sub-Gaussian tail: for each j, the noise contribution (1/n) Σᵢ εᵢ φⱼ(xᵢ) satisfies a sub-Gaussian tail bound.

            theorem Chapter3.subG_chi2_complement_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (M : ) (hM_pos : 0 < M) (Z : Fin MΩ) (v : ) (hv : 0 < v) (hZ_subG : ∀ (j : Fin M) (s : ), (ω : Ω), Real.exp (s * Z j ω) μ Real.exp (v * s ^ 2 / 2)) (δ : ) ( : 0 < δ) (hδ1 : δ 1) :
            μ {ω : Ω | 4 * M * v * (M + Real.log (1 / δ)) < j : Fin M, Z j ω ^ 2} ENNReal.ofReal δ

            Sub-Gaussian complement bound (helper from Theorem 1.19 proof).

            For M sub-Gaussian random variables with variance proxy v and threshold T = 4Mv(M + log(1/δ)), the complement {\u03c9 | ∑ Zⱼ² > T} has measure ≤ δ. This is the core probabilistic estimate from the proof of Theorem 1.19 (book lines 902–922), using per-coordinate Chernoff bounds, pigeonhole, and a union bound. The book proves this in Chapter 2 via the sub-Gaussian max inequality following equation (2.5).

            theorem Chapter3.subG_chi2_concentration_ch2 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] (M : ) (hM_pos : 0 < M) (Z : Fin MΩ) (v : ) (hv : 0 < v) (hZ_subG : ∀ (j : Fin M) (s : ), (ω : Ω), Real.exp (s * Z j ω) μ Real.exp (v * s ^ 2 / 2)) :
            ∃ (C_chi2 : ), 0 < C_chi2 ∀ (δ : ), 0 < δδ 1μ {ω : Ω | j : Fin M, Z j ω ^ 2 C_chi2 * v * (M + Real.log (1 / δ))} ENNReal.ofReal (1 - δ)

            Sub-Gaussian chi-squared concentration (Theorem 1.19 / equation 2.5).

            For M sub-Gaussian random variables Z_j with variance proxy v, P(∑ⱼ Zⱼ² ≤ C · v · (M + log(1/δ))) ≥ 1 - δ

            Proved from the complement bound subG_chi2_complement_bound (which encodes the per-coordinate Chernoff + pigeonhole + union bound argument from Theorem 1.19, book lines 902–922) combined with standard probability complement arithmetic.

            Book reference: equation (2.5) and Theorem 1.19, cited in the proof of Theorem 3.3.

            theorem Chapter3.ls_coeff_subG_mgf_ch2 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (σ : ) ( : 0 < σ) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (f : ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (θbar : Fin M) (hθbar : ∀ (j : Fin M), θbar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) :
            ∃ (C₀ : ), 0 < C₀ ∀ (j : Fin M) (s : ), (ω : Ω), Real.exp (s * (θhat ω j - θbar j)) μ Real.exp (C₀ * (σ ^ 2 / n) * s ^ 2 / 2)

            LS coefficient decomposition (from Chapter 2 / Lemma 3.13).

            On the orthogonal trigonometric design (Lemma 3.13, ΦᵀΦ = nI), the LS estimator θ̂ decomposes as: θ̂ⱼ - θ̄ⱼ = (1/n) ∑ᵢ εᵢ φⱼ(xᵢ) when θ̄ⱼ = (1/n) ∑ᵢ f(xᵢ) φⱼ(xᵢ) (the empirical Fourier coefficient).

            The sub-Gaussian MGF bound for (1/n) ∑ᵢ εᵢ φⱼ(xᵢ) with variance proxy σ²/n follows from the componentwise sub-Gaussian property and the orthogonality relation ∑ᵢ φⱼ(xᵢ)² = n.

            The proof requires deriving the closed-form LS solution on orthogonal designs (first-order optimality → normal equations → θ̂ⱼ = (1/n) ∑ᵢ Yᵢ φⱼ(xᵢ)), then applying the weighted_sum_bound field of IsSubGaussianNoiseVec. The book states this as following from "the same steps as those following equation (2.5)" — a cross-chapter reference to Chapter 2 techniques.

            theorem Chapter3.subG_trig_concentration {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (f : ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (θbar : Fin M) (hθbar : ∀ (j : Fin M), θbar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) :
            ∃ (C_conc : ), 0 < C_conc ∀ (δ : ), 0 < δδ 1μ {ω : Ω | j : Fin M, (θhat ω j - θbar j) ^ 2 C_conc * σ ^ 2 * (M + Real.log (1 / δ)) / n} ENNReal.ofReal (1 - δ)

            Sub-Gaussian concentration for trig LS estimator.

            Under sub-Gaussian noise and the orthogonal trigonometric design (Lemma 3.13), the LS estimator θ̂ satisfies: with probability ≥ 1-δ, ∑ⱼ (θ̂ⱼ - θ*ⱼ)² ≤ C · σ² · (M + log(1/δ)) / n

            This follows from the sub-Gaussian max inequality ("the same steps as those following equation (2.5)" in the book — a cross-chapter reference to Chapter 2). The orthogonal design ΦᵀΦ = nI (Lemma 3.13) simplifies the bound. The proof combines two axiomatized helpers from Chapter 2:

            Note: This is stated in ℓ² coefficient space rather than in matrix form, since by empirical_norm_eq_sum_sq, ∑ⱼ (θ̂ⱼ - θ*ⱼ)² equals the empirical squared norm (1/n) ∑ᵢ (φ_θ̂(xᵢ) - φ_θ*(xᵢ))².

            Book reference: Theorem 3.3 proof, citing equation (2.5) from Chapter 2.

            theorem Chapter3.oracle_ineq_empirical_form {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (β : ) (hβ_pos : 0 < β) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (hθstar_emp : ∀ (j : Fin M), θstar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) :
            ∃ (C₁ : ), 0 < C₁ ∀ (δ : ), 0 < δδ 1μ {ω : Ω | 1 / n * i : Fin n, (trigLinComb M (θhat ω) (i / n) - jFinset.range M, θstar j * trigBasis (j + 1) (i / n)) ^ 2 C₁ * (n ^ (1 - 2 * β) + σ ^ 2 * (M + Real.log (1 / δ)) / n)} ENNReal.ofReal (1 - δ)

            Oracle inequality in empirical norm form (equation 3.12).

            Proved from subG_trig_concentration and sobolev_bias_bound. For the LS estimator on the trigonometric basis with sub-Gaussian noise and Sobolev class regression function, the empirical estimation error satisfies with probability ≥ 1-δ: (1/n) Σᵢ (φ_{θ̂}(xᵢ) - φ_{θ*}(xᵢ))² ≤ C₁ · (n^{1-2β} + σ²(M + log(1/δ))/n)

            Book reference: equation (3.12), lines 2478-2483.

            theorem Chapter3.oracle_inequality_L2_form {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (β : ) (hβ_pos : 0 < β) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (hθstar_emp : ∀ (j : Fin M), θstar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) :
            ∃ (C₁ : ), 0 < C₁ ∀ (δ : ), 0 < δδ 1μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb M (θhat ω) x - jFinset.range M, θstar j * trigBasis (j + 1) x) C₁ * (n ^ (1 - 2 * β) + σ ^ 2 * (M + Real.log (1 / δ)) / n)} ENNReal.ofReal (1 - δ)
            theorem Chapter3.fourier_L2_completeness_aux (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (β Q : ) ( : 0 < β) (hQ : 0 < Q) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (hSqSum : Summable fun (j : ) => θstar j ^ 2) :
            ∀ᵐ (x : ) MeasureTheory.volume.restrict (Set.Icc 0 1), f x = ∑' (j : ), θstar j * trigBasis (j + 1) x

            Fourier L² completeness helper (axiom).

            This axiom encapsulates the deep result from harmonic analysis that a function whose Fourier coefficients are square-summable equals its Fourier series a.e. on [0,1]. The book does not prove this; it is stated as a foundational fact at lines 2293–2299 (the trigonometric system forms a complete orthonormal basis of L²([0,1])).

            theorem Chapter3.fourier_L2_completeness (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (β Q : ) ( : 0 < β) (hQ : 0 < Q) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) :
            ∀ᵐ (x : ) MeasureTheory.volume.restrict (Set.Icc 0 1), f x = ∑' (j : ), θstar j * trigBasis (j + 1) x

            Fourier L² completeness.

            For a function f with Fourier coefficients θstar satisfying the Sobolev regularity condition, the function f equals its Fourier series representation ∑' j, θstar j · φ_{j+1} a.e. on [0,1].

            This is a deep result from Fourier analysis (L² convergence of Fourier series for L² functions). The book does not prove this; it is a foundational fact from harmonic analysis that the trigonometric system {φ_j}_{j≥1} forms an orthonormal basis of L²([0,1]), and hence the Fourier series converges in L² to the function, which implies a.e. equality.

            Note: The pointwise a.e. equality is slightly stronger than the L² convergence statement ‖f − ∑' θ_j φ_j‖_{L²} = 0, but follows from it for L² functions by standard measure theory (∫ g² = 0 and g integrable ⟹ g = 0 a.e.). We state the a.e. form directly since it is more convenient for the proof of approx_error_L2_form.

            theorem Chapter3.fourier_tail_L2_bound (M : ) (hM : 0 < M) (β Q : ) ( : 1 / 2 < β) (hQ : 0 < Q) (θstar : ) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) :
            (L2normSq fun (x : ) => ∑' (j : ), θstar j * trigBasis (j + 1) x - jFinset.range M, θstar j * trigBasis (j + 1) x) Q * M ^ (-(2 * β))

            Fourier tail L² bound under Sobolev regularity (axiom).

            For Fourier coefficients θstar satisfying the Sobolev regularity condition ∀ K, ∑{j<K} (sobolevCoeff β (j+1))² · θstar(j)² ≤ Q, the L² norm of the Fourier series tail from index M onwards is bounded: ∫₀¹ (∑' j, θstar(j) · φ{j+1}(x) − ∑{j<M} θstar(j) · φ{j+1}(x))² dx ≤ Q · M^{−2β}

            This combines two sub-results:

            1. Parseval's identity: the L² norm of the tail equals ∑_{j≥M} θstar(j)²
            2. The Cauchy–Schwarz/Sobolev bound: ∑_{j≥M} θstar(j)² ≤ Q · M^{−2β}

            The proof proceeds by:

            1. Establishing θstar ∈ SobolevEllipsoidInf β Q from bounded partial sums
            2. Proving summability of θstar(j)² (from Sobolev weights ≥ 1 for j ≥ 1)
            3. Applying Parseval's identity (parseval_truncation_L2_error)
            4. Bounding the tail sum using the Sobolev condition and weight monotonicity
            theorem Chapter3.approx_error_L2_form (M : ) (hM : 0 < M) (β Q : ) ( : 1 / 2 < β) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) :
            (L2normSq fun (x : ) => jFinset.range M, θstar j * trigBasis (j + 1) x - f x) Q * M ^ (-(2 * β))

            Approximation error bound (Lemma 3.14(i)).

            For f in the Sobolev class Θ(β, Q) (with Fourier coefficients θstar), the M-term truncation satisfies: ‖f − φ_{θ*}^M‖²_{L₂} ≤ Q · M^{−2β}

            This is the deterministic approximation error from Lemma 3.14, equation (3.10). The proof proceeds in two steps:

            1. By Fourier L² completeness (fourier_L2_completeness), f equals its Fourier series in L²([0,1]), so ‖partial_sum − f‖² = ‖partial_sum − full_series‖² in L².
            2. By the Fourier tail bound (fourier_tail_L2_bound), the tail of the Fourier series satisfies ‖full_series − partial_sum‖² ≤ Q · M^{−2β}.

            More precisely, using the L² triangle inequality and the fact that L2normSq(partial_sum − f) ≤ 2 · L2normSq(partial_sum − series) + 2 · L2normSq(series − f), with the second term being 0 by Fourier completeness.

            theorem Chapter3.approx_error_integrableOn (M : ) (_hM : 0 < M) (β Q : ) (_hβ : 0 < β) (_hQ : 0 < Q) (f : ) (θstar : ) (_hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (_hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (hf_meas : Measurable f) (hf_sq_int : MeasureTheory.IntegrableOn (fun (x : ) => f x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) :
            MeasureTheory.IntegrableOn (fun (x : ) => (jFinset.range M, θstar j * trigBasis (j + 1) x - f x) ^ 2) (Set.Icc 0 1) MeasureTheory.volume

            Square-integrability of the truncation error on [0,1].

            For f in the Sobolev class Θ(β, Q), the truncation error h(x) = (Σ θ*ⱼ φⱼ)(x) - f(x) is square-integrable on [0,1]. This follows from the Parseval identity: the Sobolev class membership implies f ∈ L²([0,1]), and the finite trig sum is continuous hence square-integrable. Same type-bridging gap as oracle_inequality_L2_form.

            theorem Chapter3.L2normSq_add_le (g h : ) (hg : MeasureTheory.IntegrableOn (fun (x : ) => g x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) (hh : MeasureTheory.IntegrableOn (fun (x : ) => h x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) :
            (L2normSq fun (x : ) => g x + h x) 2 * (L2normSq g + L2normSq h)

            Triangle inequality for L₂ squared norm. ∫₀¹ (g(x) + h(x))² dx ≤ 2 · (∫₀¹ g(x)² dx + ∫₀¹ h(x)² dx) Follows from the pointwise (a + b)² ≤ 2(a² + b²) and monotonicity of integration. Requires square-integrability of g and h on [0,1].

            Theorem 3.15: High probability bound #

            theorem Chapter3.thm_3_15_combined_intermediate {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (β : ) (hβ_pos : 1 / 2 < β) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (hθstar_emp : ∀ (j : Fin M), θstar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) (hf_meas : Measurable f) (hf_sq_int : MeasureTheory.IntegrableOn (fun (x : ) => f x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) :
            ∃ (C : ), 0 < C ∀ (δ : ), 0 < δδ 1μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x) C * (M ^ (-(2 * β)) + n ^ (1 - 2 * β) + (M + σ ^ 2 * Real.log (1 / δ)) / n)} ENNReal.ofReal (1 - δ)

            Auxiliary lemma for Theorem 3.15: Combined oracle inequality bound.

            From (3.12) (Theorem 3.3 oracle inequality) and Lemma 3.13 (ORT for trigonometric basis), with α = 1/2: with probability ≥ 1 − δ, ‖φ_{θ̂^LS} − f‖²_{L₂} ≤ C₁ · (M^{−2β} + n^{1−2β} + (M + σ²·log(1/δ))/n)

            This bundles the oracle inequality (Thm 3.3), orthogonality (Lemma 3.13), and approximation error (Lemma 3.14) into a single probabilistic bound on the full L₂ error. Book lines 2496–2502.

            theorem Chapter3.thm_3_15_rate_computation (n : ) (hn : 0 < n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (M : ) (_hM_pos : 0 < M) (hM_choice : n ^ (1 / (2 * β + 1)) M M n ^ (1 / (2 * β + 1)) + 1) (σ : ) (_hσ : 0 < σ) :
            ∃ (C' : ), 0 < C' ∀ (δ : ), 0 < δδ 1M ^ (-(2 * β)) + n ^ (1 - 2 * β) + (M + σ ^ 2 * Real.log (1 / δ)) / n C' * n ^ (-(2 * β) / (2 * β + 1)) + C' * σ ^ 2 * Real.log (1 / δ) / n

            Lemma for Theorem 3.15: Rate computation.

            For M ≈ n^{1/(2β+1)} and β ≥ (1+√5)/4, the three error terms M^{-2β}, n^{1-2β}, M/n are all ≤ C' · n^{-2β/(2β+1)} for some constant C'. Book lines 2504–2508: plugging in M = ⌈n^{1/(2β+1)}⌉ and using n^{1-2β} ≤ n^{-2β/(2β+1)} for the prescribed β.

            theorem Chapter3.thm_3_15_high_prob {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (hM_choice : n ^ (1 / (2 * β + 1)) M M n ^ (1 / (2 * β + 1)) + 1) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (hθstar_emp : ∀ (j : Fin M), θstar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hf_meas : Measurable f) (hf_sq_int : MeasureTheory.IntegrableOn (fun (x : ) => f x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) :
            ∃ (C : ), 0 < C μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x) C * n ^ (-(2 * β) / (2 * β + 1)) + C * σ ^ 2 * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)

            Theorem 3.15 (High probability bound for LS on Sobolev class).

            Fix β ≥ (1+√5)/4, Q > 0, δ > 0 and assume the general regression model Y_i = f(X_i) + ε_i, X_i = (i-1)/n (regular design), with f ∈ Θ(β,Q) (Sobolev ellipsoid) and ε ~ subG_n(σ²), σ² ≤ 1. Let M = ⌈n^{1/(2β+1)}⌉ and the LS estimator θ̂^LS minimize the sum of squares over the first M trigonometric basis functions.

            Then with probability ≥ 1 - δ, for n large enough: ‖φ_{θ̂^LS} - f‖²_{L₂([0,1])} ≤ C · n^{-2β/(2β+1)} + C · σ² · log(1/δ) / n

            where C depends on β, Q, σ.

            The proof combines:

            1. The oracle inequality for LS (Theorem 3.3 + Lemma 3.13 orthogonality)
            2. The approximation error bound (Lemma 3.14): ‖f - φ_{θ*}‖² ≤ C·M^{-2β}
            3. The choice M = ⌈n^{1/(2β+1)}⌉ which balances approximation and estimation
            4. The condition β ≥ (1+√5)/4 which ensures n^{1-2β} ≤ n^{-2β/(2β+1)}
            theorem Chapter3.expectation_from_exponential_tail_bound {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {X : Ω} (hX_meas : Measurable X) (_hX_nn : ∀ (ω : Ω), 0 X ω) {A B : } (_hA : 0 A) (hB : 0 < B) (htail : ∀ (δ : ), 0 < δδ 1μ {ω : Ω | X ω A + B * Real.log (1 / δ)} ENNReal.ofReal (1 - δ)) :
            (ω : Ω), X ω μ A + B

            Standard tail integration lemma (proved via layer cake / Cavalieri).

            For a nonneg measurable random variable X on a probability space, if for all δ ∈ (0,1] we have P(X ≤ A + B·log(1/δ)) ≥ 1-δ, then E[X] ≤ A + B.

            Proof technique: E[X] = ∫₀^∞ P(X > t) dt (layer cake / Fubini) For t > A: set δ = exp(-(t-A)/B), then δ ∈ (0,1] and P(X > t) = P(X > A + B·log(1/δ)) ≤ 1 - (1-δ) = δ = exp(-(t-A)/B) So E[X] ≤ A + ∫_A^∞ exp(-(t-A)/B) dt = A + B.

            This is a standard measure theory result (cf. Billingsley, Probability and Measure). The book (line 2508) says: "The bound in expectation can be obtained by integrating the tail bound" without giving the full proof.

            theorem Chapter3.thm_3_15_uniform_high_prob {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (hM_choice : n ^ (1 / (2 * β + 1)) M M n ^ (1 / (2 * β + 1)) + 1) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (hθstar_emp : ∀ (j : Fin M), θstar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) (hf_meas : Measurable f) (hf_sq_int : MeasureTheory.IntegrableOn (fun (x : ) => f x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) :
            ∃ (C : ), 0 < C ∀ (δ : ), 0 < δδ 1μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x) C * n ^ (-(2 * β) / (2 * β + 1)) + C * σ ^ 2 * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)

            Uniform high probability bound for the LS estimator.

            This restates the high probability bound from thm_3_15_high_prob in a form where the constant C is extracted uniformly for all δ ∈ (0,1]. The mathematical content is the same as oracle_inequality_L2_form + approx_error_L2_form + thm_3_15_rate_computation, but the constant is stated to work for all δ simultaneously (which is true in the mathematical proof since the oracle inequality constant doesn't depend on δ).

            Book lines 2496–2508: the proof of Theorem 3.15 establishes bounds with constants depending only on β, Q, σ (not on δ).

            theorem Chapter3.thm_3_15_expectation_from_tail {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (hM_choice : n ^ (1 / (2 * β + 1)) M M n ^ (1 / (2 * β + 1)) + 1) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (hθstar_emp : ∀ (j : Fin M), θstar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x) (hf_meas : Measurable f) (hf_sq_int : MeasureTheory.IntegrableOn (fun (x : ) => f x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) :
            ∃ (C : ), 0 < C (ω : Ω), L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x μ C * n ^ (-(2 * β) / (2 * β + 1))

            Auxiliary lemma for Theorem 3.15: Tail integration technique.

            From the high probability bound (thm_3_15_uniform_high_prob), which holds for all δ ∈ (0,1] with a uniform constant C, we can integrate the tail bound to obtain an expectation bound. The standard technique uses E[X] = ∫₀^∞ P(X > t) dt and a change of variables δ = exp(-n(t - C·rate)/(C·σ²)), yielding E[X] ≤ C'·n^{-2β/(2β+1)}. Book line 2508: "The bound in expectation can be obtained by integrating the tail bound."

            This proof combines:

            1. The uniform high probability bound thm_3_15_uniform_high_prob
            2. The standard tail integration technique expectation_from_exponential_tail_bound
            3. The fact that σ²/n ≤ n^{-2β/(2β+1)} (since σ² ≤ 1 and 1/n ≤ rate)
            theorem Chapter3.thm_3_15_expectation {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (hM_choice : n ^ (1 / (2 * β + 1)) M M n ^ (1 / (2 * β + 1)) + 1) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (hθstar_emp : ∀ (j : Fin M), θstar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x) (hf_meas : Measurable f) (hf_sq_int : MeasureTheory.IntegrableOn (fun (x : ) => f x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) :
            ∃ (C : ), 0 < C (ω : Ω), L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x μ C * n ^ (-(2 * β) / (2 * β + 1))

            Theorem 3.15 (Expectation bound for LS on Sobolev class).

            Under the same assumptions as the high probability bound: E[‖φ_{θ̂^LS} - f‖²_{L₂([0,1])}] ≤ C · n^{-2β/(2β+1)}

            Obtained by integrating the tail bound from the high probability result.

            theorem Chapter3.thm_3_15 {Ω : Type u_1} [MeasurableSpace Ω] {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {n : } (hn : 0 < n) (β : ) (hβ_pos : 0 < β) (hβ_lower : β (1 + 5) / 4) (Q : ) (hQ : 0 < Q) (f : ) (θstar : ) (hθstar : ∀ (j : ), θstar j = fourierCoeff f j) (hSobolev : ∀ (K : ), j : Fin K, sobolevCoeff β (j + 1) ^ 2 * θstar j ^ 2 Q) (σ : ) ( : 0 < σ) (hσ_le : σ ^ 2 1) (ε : ΩFin n) (hsubG : IsSubGaussianNoiseVec ε (σ ^ 2) μ) (Y : ΩFin n) (hModel : ∀ (ω : Ω) (i : Fin n), Y ω i = f (i / n) + ε ω i) (M : ) (hM_pos : 0 < M) (hM_le : M n - 1) (hM_choice : n ^ (1 / (2 * β + 1)) M M n ^ (1 / (2 * β + 1)) + 1) (θhat : ΩFin M) (hLS : IsLSEstimator Y θhat) (hθstar_emp : ∀ (j : Fin M), θstar j = 1 / n * i : Fin n, f (i / n) * trigBasis (j + 1) (i / n)) (δ : ) (hδ_pos : 0 < δ) (hδ_le : δ 1) (hMeas : Measurable fun (ω : Ω) => L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x) (hf_meas : Measurable f) (hf_sq_int : MeasureTheory.IntegrableOn (fun (x : ) => f x ^ 2) (Set.Icc 0 1) MeasureTheory.volume) :
            (∃ (C : ), 0 < C μ {ω : Ω | (L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x) C * n ^ (-(2 * β) / (2 * β + 1)) + C * σ ^ 2 * Real.log (1 / δ) / n} ENNReal.ofReal (1 - δ)) ∃ (C : ), 0 < C (ω : Ω), L2normSq fun (x : ) => trigLinComb M (θhat ω) x - f x μ C * n ^ (-(2 * β) / (2 * β + 1))

            Theorem 3.15 (Combined statement: High probability + Expectation bounds for LS on Sobolev class).

            Book lines 2479–2486. With probability at least 1 - δ: ‖φ_{θ̂^LS} - f‖²_{L₂([0,1])} ≤ C · n^{-2β/(2β+1)} + C · σ² · log(1/δ) / n

            Moreover: E[‖φ_{θ̂^LS} - f‖²_{L₂([0,1])}] ≤ C · n^{-2β/(2β+1)}

            This bundles both parts of the theorem as stated in the book.