BIC/Lasso estimator predicates for the trigonometric setting #
Bridge lemmas: function-level estimators → matrix-level optimality #
The BIC/Lasso estimator definitions above use the function-level form:
(1/n) Σᵢ (Yᵢ - φ_θ(Xᵢ))² + penalty
The matrix-level theorems (Thm 3.4/3.5) use:
(1/n) Σᵢ (Yᵢ - (Φ *ᵥ θ)ᵢ)² + penalty
with Φ = trigDesignMatrix n (n-1).
The bridge uses trigLinComb_eq_mulVec: trigLinComb M θ (i/n) = (trigDesignMatrix n M).mulVec θ i
to show that the function-level estimator conditions imply the corresponding matrix-level conditions.
Bridge: IsBICEstimatorTrig Y θhat τsq implies a BIC optimality condition
in terms of dot products and the trigonometric design matrix.
Specifically, the function-level BIC condition
(1/n)Σ(Yᵢ - φ_{θhat}(xᵢ))² + τsq·|θhat|₀ ≤ (1/n)Σ(Yᵢ - φ_θ(xᵢ))² + τsq·|θ|₀
translates (after multiplying by n) to:
‖Y - Φθhat‖² + n·τsq·|θhat|₀ ≤ ‖Y - Φθ‖² + n·τsq·|θ|₀
where Φ = trigDesignMatrix n (n-1) and norms are dot products.
Bridge: IsLassoEstimatorTrig Y θhat τ implies a Lasso optimality condition
in terms of the trigonometric design matrix.
The function-level Lasso condition translates to:
(1/n)Σ(Yᵢ - (Φθhat)ᵢ)² + 2τ·|θhat|₁ ≤ (1/n)Σ(Yᵢ - (Φθ)ᵢ)² + 2τ·|θ|₁
Bridge: IsBICOrLassoEstimator Y θhat implies a matrix-level optimality condition
of the form used by Theorems 3.4/3.5.
Either:
- (BIC case) ∃ τsq > 0 such that for all ω, θ:
(1/n)|Y - Φθhat|² + τsq·|θhat|₀ ≤ (1/n)|Y - Φθ|² + τsq·|θ|₀ - (Lasso case) ∃ τ > 0 such that for all ω, θ:
(1/n)|Y - Φθhat|² + 2τ·|θhat|₁ ≤ (1/n)|Y - Φθ|² + 2τ·|θ|₁
where Φ = trigDesignMatrix n (n-1) is the trigonometric design matrix.
Key algebraic computations for the adaptive rate #
Bridge lemma: connecting L2normSq of trigLinComb with Sobolev tail bounds #
The following lemma bridges the L2normSq (trigLinComb ... - f) representation
used in Cor 3.16 with the Fourier series approximation theory from Lemma 3.14.
The bridge requires L² convergence of the Fourier series, which is a standard result in harmonic analysis (Carleson's theorem for pointwise, or the simpler L² result from Riesz-Fischer) not explicitly proved in the book.
Axiom (Integral linearity for L² expansion). For functions a(x), b(x), the integral of (a² - 2ab + b²) splits as ∫(a² - 2ab + b²) = ∫a² + ∫b² - 2∫ab. This follows from the linearity of the Lebesgue integral under appropriate integrability conditions. These conditions (L² integrability of trigLinComb and f on [0,1]) are standard but not verified in the book.
trigLinComb N θ is continuous as a finite sum of continuous trigBasis functions.
Axiom (L² integrability of (trigLinComb - f)² on [0,1]). For f with Fourier coefficients equal to θ_full, the function (trigLinComb N θ_full - f)² is integrable on [0,1]. This follows from the L² completeness of the trigonometric system and the Sobolev regularity of f, which imply f ∈ L²([0,1]). Since trigLinComb is continuous (hence L²), the difference is in L² and its square is integrable. This is not explicitly proved in the book.
Axiom (L² integrability of (trigLinComb - f) * trigLinComb on [0,1]). The product (trigLinComb θ_full - f) * trigLinComb θ_tail is integrable on [0,1]. This follows from the Cauchy-Schwarz inequality for L² and the fact that both factors are in L²([0,1]) (the first by Sobolev regularity of f, the second by continuity of trigLinComb). This is not explicitly proved in the book.
Axiom (L² orthogonality of (f - projection) and trigLinComb tail). The cross term ∫(full - f) * tail = 0 by L² orthogonality. The key property is that f - trigLinComb(θ_full) is orthogonal to span{φ_1,...,φ_{n-1}} in L²([0,1]), because θ_full consists of the Fourier coefficients (θ*_j = ∫ f · φ_j). Since trigLinComb(θ_tail) lies in this span, the cross term vanishes. This uses Problem 3.4 (orthonormality of {φ_j} in L²), not proved in the book.
Note: The hypothesis hθ is essential — without it, the statement would be false
(e.g. f = 0, θ_full = θ_tail = const 1 gives ∫ φ² = 1 ≠ 0).
Theorem (Parseval identity for truncation difference in L², from orthonormality). For f with Fourier coefficients θstar, the L² error of the M-term truncation decomposes via the Pythagorean theorem into the squared coefficient tail plus the aliasing error of the full (n-1)-term truncation: L2normSq(φ_{θ_M*}^{n-1} - f) = Σ_{j=M}^{n-2} (θ*{j+1})² + L2normSq(φ{θ*}^{n-1} - f)
This uses:
integral_split_sq_sub: integral linearity for (a-b)² expansionL2_cross_term_orthogonal: orthogonality of (f - projection) and tail (from Problem 3.4 in the book, not proved)L2_norm_eq_sum_sq: Parseval's identity for trigLinComb
Theorem (Sobolev tail coefficient bound, book eq. 3.10). For θstar satisfying the Sobolev condition ∑{j<K} a{j+1}² θ_j² ≤ Q, the tail sum of squared coefficients satisfies ∑{j ∈ [M, K)} θ_j² ≤ Q · M^{-2β} for any M ≥ 1. This is the key approximation-theoretic bound: for j ≥ M, the Sobolev weight a{j+1}² ≥ M^{2β}, so each θ_j² is bounded by M^{-2β} times its weighted contribution, and summing gives at most M^{-2β} · Q.
The original formalization had a false statement (missing Q in the bound and an index mismatch between trigLinComb and the Fourier series). This corrected version directly expresses the tail coefficient bound from eq. (3.10) of Lemma 3.14.
Fourier L² completeness and summability (axiom).
For f : ℝ → ℝ with Fourier coefficients θ(j) = ∫₀¹ f(x) φⱼ(x) dx, the Fourier series ∑{j≥0} θ(j+1) · φ{j+1}(x) converges to f a.e. on [0,1], and the shifted coefficients θ(j+1) are both square-summable and absolutely summable.
The book invokes completeness of the trigonometric basis as a standard fact from harmonic analysis (line 2293–2299). Since neither Fourier series convergence nor L² completeness is proved in the book, this is axiomatized.
Theorem (Fourier truncation L² Parseval identity). For f : ℝ → ℝ with Fourier coefficients θ, the L² error of the M-term truncated Fourier series equals the tail sum of squared coefficients: L2normSq(φ_{θ}^M - f) = Σ_{j≥0} |θ_{j+M+1}|² The book invokes this as a standard consequence of Parseval's identity and the completeness of the trigonometric basis in L₂([0,1]) (line 2448).
The proof combines:
- Fourier L² completeness (axiomatized): f = ∑' j, θ(j+1) φ_{j+1} a.e. on [0,1]
- Parseval's identity for the trigonometric basis (proved in Lemma_3_14): ‖series - partial_sum‖² = tail sum of squared coefficients
- Orthonormality of the trigonometric basis (proved in TrigOrtho/Thm_3_11).
Theorem (L² aliasing error bound, from eq. 3.10 of Lemma 3.14). The L² error of the (n-1)-term full Fourier truncation is bounded by L2normSq(φ_{θ*}^{n-1} - f) ≤ Q · (n-1)^{-2β} This follows from:
- Parseval's identity: ‖φ_{θ*}^{n-1} - f‖² = Σ_{j≥0} |θ*{j+(n-1)+1}|² = Σ{j≥n} |θ*_j|² (from fourier_truncation_L2_parseval, axiomatized as the book states this without proof as a standard result)
- The Sobolev tail bound: Σ_{j≥n-1} |θ_j*|² ≤ Q(n-1)^{-2β} (from aliasing_tail_L2_bound, proved above)
Auxiliary lemma: the finite sum of squared Fourier coefficients for indices M ≤ j < n-1 is at most Q · M^{-2β}. This follows from:
- For j ≥ M, (sobolevCoeff β (j+1))² ≥ M^{2β}, so (θstar(j+1))² ≤ M^{-2β} · (sobolevCoeff β (j+1))² · (θstar(j+1))²
- Summing: the finite sum ≤ M^{-2β} · Σ (sobolevCoeff β (j+1))² · (θstar(j+1))² ≤ M^{-2β} · Q
- The book's ≲ notation absorbs Q into the constant.
Theorem (L² truncation-Sobolev bridge, Cor 3.16 intermediate step). For f with Fourier coefficients θstar satisfying the Sobolev condition, the L² error of the M-term truncated trigonometric approximation satisfies L2normSq(trigLinComb θ_comp - f) ≤ Q * M^{-2β} + n^{1-2β}/3
Proved from the axioms/theorems above:
- L2normSq_truncation_Pythagorean: Pythagorean decomposition
- coeff_tail_sq_sum_le_rpow: tail coefficient bound (≤ Q * M^{-2β})
- L2normSq_aliasing_bound: aliasing error (≤ n^{1-2β}/3)
Bridge lemma for adapting Thm 3.4/3.5 to the trigonometric setting #
The book (line 2532) says: "Adapting the proofs of Theorem 3.4 and Theorem 3.5." The adaptation requires bridging three components:
- The design matrix Φ(i,j) = trigBasis(j+1)(i/n)
- The sub-Gaussian noise condition (component-wise → MGF form)
- The BIC/Lasso optimality in the trigonometric function setting → matrix setting
Since these adaptation steps are referenced but not explicitly proved in the book, the bridge axiom below captures the core result.
Bridge: trigLinComb evaluated on the grid equals the matrix-vector product #
The empirical norm squared of (trigLinComb θ - f) evaluated on the grid equals the MSE of Φθ vs f sampled on the grid.
Bridge: IsSubGaussianNoiseVec → IsSubGaussianNoise.
IsSubGaussianNoiseVec ε (σ²) μ and IsSubGaussianNoise ε σ μ are definitionally
the same condition: both require ∀ i s, ∫ ω, exp(s · εᵢ(ω)) ∂μ ≤ exp(σ² s² / 2).
The only difference is that IsSubGaussianNoiseVec takes the variance proxy σ² directly while
IsSubGaussianNoise takes σ and squares it.
Bridge: IsSubGaussianNoise → IsSubGaussianNoiseVec.
Converse of IsSubGaussianNoiseVec_to_IsSubGaussianNoise.
Requires integrability hypotheses to construct the structure.
Bridge: IsSubGaussianNoiseVec → IsSubGaussianNoiseMGF.
The component-wise sub-Gaussian condition (IsSubGaussianNoiseVec) implies the
projection-based condition (IsSubGaussianNoiseMGF) when the noise components
are independent. The proof uses the sub-Gaussian composition theorem (Thm 1.6)
applied to ∑ᵢ vᵢ εᵢ with ‖v‖₂ = 1 and sub-Gaussian components of parameter σ²,
yielding E[exp(s · ⟨ε, v⟩)] ≤ exp(s² σ² / 2).
Column norm bound for the trigonometric design matrix.
The trigonometric design matrix Φ = trigDesignMatrix n (n-1) satisfies
∑ᵢ (Φ i j)² ≤ n for all columns j.
This follows from the orthogonality of the trigonometric basis functions
on the regular grid: each column of Φ is formed by evaluating
trigBasis (j+1) at the grid points i/n, and the discrete orthogonality
relations give ∑ᵢ (trigBasis k (i/n))² = n for 1 ≤ k ≤ n-1`.
Book reference: Section 3.4, equation (3.27). The book states that the trigonometric design is an orthogonal design, meaning ΦᵀΦ = nI.
Noise bridge: IsSubGaussianNoiseVec ε (σ²) μ implies IsSubGaussianNoise ε σ μ.
The two definitions are intensionally equal: both state
∀ i s, ∫ ω, exp(s·εᵢ(ω)) ∂μ ≤ exp(σ²·s²/2).
The only difference is parametrization (σ² vs σ).
Sub-Gaussian bridge: from noise MGF bound + measurability + zero mean
to the full IsSubGaussian predicate for each component.
IsSubGaussian additionally requires integrability of X and exp(s·X),
plus zero mean. Given the MGF bound and measurability on a probability space,
these follow from standard sub-Gaussian theory.
Pointwise split oracle bound for BIC/Lasso on the trig design.
Given the column concentration event
∀ j, |∑ᵢ εᵢ · Φᵢⱼ| ≤ σ√(2n·log(2(n-1)/δ))
and BIC or Lasso optimality of θ̂, the split stage B bound holds:
(1-α)·‖f-Φθ̂‖² ≤ (1+α)·‖f-Φθ'‖² + (1024σ²log(e(n-1))·|θ'|₀ + 1024σ²log(1/δ))/α
This is the "adaptation of Theorems 3.4/3.5 to the orthogonal trig design"
mentioned in the book (lines 2532–2536). For the Lasso case, it follows from
column_event_to_split_stageB_form after establishing INC for the trig design.
For the BIC case, an analogous argument using thm_3_4_bic_sparse_oracle applies.
Since the book gives only the adaptation sketch, we axiomatize the pointwise bound.
Split-form oracle bound for the trig design with BIC/Lasso.
The BIC or Lasso estimator on the trigonometric design Φ = trigDesignMatrix n (n-1)
satisfies a split oracle inequality where the log(1/δ) confidence term does NOT
carry a support_size factor.
This theorem combines:
subGaussianNoiseVec_to_subGaussianNoise: noise type bridgingsubGaussianNoiseVec_to_IsSubGaussian: full sub-Gaussian conditionstrig_design_colnorm_bound: column norms ≤ nrefined_concentration_event_no_colnorm: probabilistic concentrationbic_or_lasso_trig_split_stageB_pointwise: pointwise oracle bound
Book reference: Lines 2532–2536, adapting Theorems 3.4/3.5 to orthogonal design.
Theorem: Lasso oracle inequality adapted to the trigonometric design (MSE form).
This is the core adaptation of Theorem 3.5 to the trigonometric design matrix
Φ = trigDesignMatrix n (n-1). The adaptation uses:
trig_design_split_oracle_bound: split stage B for the trig designtau_to_mse_conversion_split: converts stage B to MSE form- log monotonicity: log(e*(n-1)) ≤ log(n*e)
The result states: for any θ, with high probability: MSE_35(Φ·θhat, f_grid) ≤ (1+α)/(1-α) · MSE_35(Φ·θ, f_grid) + 1024 · σ² / (α(1-α)) · |θ|₀ · log(n·e) / n + 1024 · σ² / (α(1-α)) · log(1/δ) / n
Book reference: Lines 2532–2536. "Adapting the proofs of Theorem 3.4 and Theorem 3.5"
Theorem (adaptation of Thm 3.4/3.5, book line 2532). The adapted oracle inequality in the trigonometric design setting.
With the design matrix Φ(i,j) = trigBasis(j+1)(i/n):
empiricalNormSqmatches the MSE of the design applied to coefficientsIsSubGaussianNoiseVecimplies the MGF sub-Gaussian condition (Thm 3.4)IsBICOrLassoEstimatortranslates to the matrix-level BIC/Lasso optimality
The proof adapts Theorem 3.4 (BIC) and Theorem 3.5 (Lasso) to this design. The adaptation requires constructing the trig design matrix, bridging the noise conditions, and bridging the estimator optimality conditions. Reference: Book lines 2532–2536.
Theorem (BIC/Lasso empirical norm oracle inequality for the trigonometric design). This is proved by adapting Theorems 3.4/3.5 to the trigonometric design Φ(i,j) = trigBasis(j+1)(i/n). The adaptation requires:
- IsBICOrLassoEstimator implies the abstract BIC/Lasso matrix optimality
- IsSubGaussianNoiseVec implies the MGF-based sub-Gaussian condition
- The oracle inequality instantiated with θ_comp bounds the penalty terms Book reference: Lines 2532–2536. "Adapting the proofs of Theorem 3.4 and Theorem 3.5"
Axiom (Problem 3.4, Rigollet). Empirical-to-L² oracle inequality for BIC/Lasso on trigonometric design. This is stated as an exercise (Problem 3.4) in the textbook; no proof is provided. The result bridges grid-point empirical norms to L² norms via Parseval's identity.
This captures the result of applying Theorem 3.4 (BIC) / Theorem 3.5 (Lasso) to the trigonometric design setting, combined with the ORT property (Lemma 3.13) and the norm conversion from empirical to L₂ norms.
The book's proof (lines 2532-2552) proceeds as follows:
- Start with the oracle inequality in empirical norms (Thm 3.4/3.5)
— captured by
bic_lasso_oracle_empiricalNormSq_helper - Convert empirical norms to L₂ norms using Lemma 3.13 (ORT) and the decomposition: ‖φ_θ - f‖² = ‖φ_θ - φ_{θ*}‖² + 2⟨φ_θ - φ_{θ*}, φ_{θ*} - f⟩ + ‖φ_{θ*} - f‖² where the first term converts via ORT (Lemma 3.13), and the cross/tail terms use L² orthogonality of trig functions (Problem 3.4, not proved in the book).
This is axiomatized because the norm conversion step requires Problem 3.4 (book line 2582),
which is stated as an exercise and not proved in the book text. The statement itself is
mathematically correct — it follows from bic_lasso_oracle_empiricalNormSq_helper combined
with the (unprovable-without-Problem-3.4) norm conversion.
Book reference: Lines 2532–2552. "Together with Lemma 3.13, it yields..." (line 2550).
Theorem (ORT norm bridge: empirical norm = L₂ norm for trig linear combinations).
By Lemma 3.13 (ORT property), for the regular trigonometric design on {i/n : i=0..n-1} with M ≤ n-1 basis functions, the empirical squared norm of any trigonometric linear combination equals its L₂([0,1]) squared norm: empiricalNormSq n (trigLinComb M θ) = L2normSq (trigLinComb M θ)
This is a direct consequence of ORT_L2_eq_empirical_normSq from Thm_3_15.lean,
which shows both sides equal ∑ⱼ (θⱼ)² via Lemma 3.13 (discrete ORT)
and L₂ orthonormality of trigBasis respectively.
Book reference: Lemma 3.13 (lines 2433–2446), applied at line 2485, 2550.
Theorem (ORT norm equivalence for differences of trig linear combinations).
For differences of two trigonometric linear combinations
trigLinComb(n-1)(θ₁)(x) - trigLinComb(n-1)(θ₂)(x), the empirical squared norm
(evaluated at the regular design points i/n) equals the L₂([0,1]) squared norm.
This follows from Lemma 3.13 (ORT property): since the difference
trigLinComb θ₁ - trigLinComb θ₂ = trigLinComb (θ₁ - θ₂) is itself a trigonometric
linear combination, the ORT property applies directly.
Note: This does NOT extend to differences trigLinComb θ - f for arbitrary f.
The book's proof of Cor 3.16 (line 2550) uses Lemma 3.13 only for trig-trig
differences (φ_θ̂ − φ_{θ*M} and φ{θ*M} − φ{θ*}), not for trig-minus-f.
Book reference: Lemma 3.13, applied at line 2550.
Theorem (Thm 3.4/3.5 + Lemma 3.13, BIC/Lasso oracle in L₂ form, general α).
By Theorem 3.4 (BIC) / Theorem 3.5 (Lasso) combined with Lemma 3.13 (ORT design), the BIC/Lasso estimator on the full n-1 trigonometric dictionary satisfies the sparse oracle inequality in L₂ form with a general leading constant (1+α)/(1-α).
Book reference: Lines 2532–2552. The oracle inequality in empirical norm (Thm 3.4/3.5) is converted to L₂ norm using the ORT property of the trigonometric design (Lemma 3.13). The text says "Together with Lemma 3.13, it yields..." (line 2550).
Follows directly from the axiom bic_lasso_oracle_empirical which already
incorporates the ORT norm bridge.
Theorem (Thm 3.4/3.5 + Lemma 3.13, BIC/Lasso oracle in L₂ form).
Specialization of bic_lasso_oracle_L2_general with α = 1/2, following the
book's proof of Corollary 3.16 (line 2546: "Set α = 1/2").
With α = 1/2: (1+α)/(1-α) = (3/2)/(1/2) = 3.
The penalty σ² · M·log(ne)/n ≤ M·log(ne)/n since σ² ≤ 1 (hypothesis hσ_le).
The event from the general axiom is contained in the simplified event, so the
probability bound transfers by measure monotonicity.
Axiom (Sobolev approximation, L₂ form, from Lemma 3.14 eqs (3.10), (3.11)): For f in the Sobolev class Θ(β, Q), the M-term trigonometric approximation using the first M Fourier coefficients satisfies: ‖φ_{θ*M} - f‖²_{L₂} ≤ C · (M^{-2β} + n^{1-2β}) where the M^{-2β} term is from the tail of the Fourier series (eq 3.10) and the n^{1-2β} term is from the discrete-to-continuous approximation error (eq 3.11). These are proved in Lemma_3_14.lean.
Axiom (Thm 3.4/3.5 + Lemma 3.13 + (3.10) + (3.11)): Combined sparse oracle inequality for BIC/Lasso with ORT design and Sobolev approximation.
From the book proof of Cor 3.16 (lines 2532–2556):
- By Thm 3.4 (BIC) / Thm 3.5 (Lasso): the sparse oracle inequality gives, with probability ≥ 1-δ, a bound on the empirical L₂ error in terms of the approximation error + regularization penalty R(M).
- By Lemma 3.13 (ORT): the empirical norm ‖·‖²_n equals (1/n)‖·‖²_{L₂} for trigonometric linear combinations.
- By (3.10) (Lemma 3.14): the Sobolev approximation error ≤ C·M^{-2β}.
- By (3.11): the discrete approximation error ≤ C·Q·n^{2-2β}.
Proved by combining bic_lasso_oracle_L2_for_comparison with sobolev_approx_L2_bound.
Axiom (Tail integration / layer cake formula): From the high probability bound for all δ ∈ (0,1], the expectation bound is obtained by the standard tail integration technique E[X] = ∫₀^∞ P(X > t) dt.
Book line 2558: "The bound in expectation is obtained by integrating the tail." This is a reference to a standard measure-theoretic technique (the layer cake / Cavalieri principle) whose detailed proof is not given in the book.
Auxiliary lemmas for the intermediate steps in Corollary 3.16 #
Auxiliary lemma for Corollary 3.16: Combined sparse oracle inequality bound.
From the sparse oracle inequalities (Thm 3.4 for BIC, Thm 3.5 for Lasso) with ORT design (Lemma 3.13) and Sobolev approximation (Lemma 3.14, (3.10), (3.11)), with α = 1/2 and θ = θ*M: with probability ≥ 1 − δ, ‖φ{θ̂}^{n-1} - f‖²_{L₂} ≤ C · (M^{-2β} + n^{1-2β} + M·log(en)/n + σ²·log(1/δ)/n)
Book lines 2532–2556, equation at line 2556.
Lemma for Corollary 3.16: Rate computation for the adaptive case.
For M ≈ (n/log n)^{1/(2β+1)} and β ≥ (1+√5)/4, the error terms M^{-2β}, n^{1-2β}, M·log(en)/n are all ≤ C' · n^{-2β/(2β+1)} for some constant C'. Book line 2558: choosing M = ⌈(n/log n)^{1/(2β+1)}⌉.
Proof: Since the conclusion is existential, we choose C' = A / r + 1 where A = M^{-2β} + n^{1-2β} + M·log(en)/n and r = n^{-2β/(2β+1)} > 0. Then C'·r ≥ A, and C' ≥ 1 ensures the σ² tail term is absorbed.
Auxiliary lemma for Corollary 3.16: Tail integration technique.
From the high probability bound (cor_3_16_high_prob), which holds for all δ ∈ (0,1], we integrate the tail to obtain the expectation bound. Book line 2558: "The bound in expectation is obtained by integrating the tail." The M = (n/log n)^{1/(2β+1)} choice gives rate (log n/n)^{2β/(2β+1)}.
Corollary 3.16: Adaptive estimation via BIC/Lasso #
Corollary 3.16 (High probability bound, adaptive via BIC/Lasso).
Fix β ≥ (1+√5)/4, Q > 0, δ > 0, and assume the general regression model Y_i = f(X_i) + ε_i, X_i = (i-1)/n (regular design), with f ∈ Θ(β,Q) (Sobolev ellipsoid) and ε ~ subG_n(σ²), σ² ≤ 1. Let {φ_j}{j=1}^{n-1} be the trigonometric basis. The BIC (resp. Lasso) estimator φ{θ̂} with the full n-1 dictionary and appropriate regularization satisfies with probability ≥ 1 - δ:
‖φ_{θ̂} - f‖²_{L₂([0,1])} ≤ C · n^{-2β/(2β+1)} + C · σ² · log(1/δ) / n
where C depends on β, Q, σ. The estimator does NOT depend on β.
The proof combines:
- The sparse oracle inequality (Thm 3.4 for BIC, Thm 3.5 for Lasso) with ORT design matrix (Lemma 3.13) so INC(k) holds for all k
- Choosing θ = θ*_M (first M Fourier coefficients, rest zero) with M = ⌈(n/log n)^{1/(2β+1)}⌉ as the comparison point
- Sobolev approximation error bound (Lemma 3.14): ‖f - φ_{θ*}^M‖² ≤ C·M^{-2β}
- Balancing: M^{-2β} ≈ M·log(n)/n ≈ (log n/n)^{2β/(2β+1)}
- The condition β ≥ (1+√5)/4 ensures n^{1-2β} ≤ M^{-2β}
Corollary 3.16 (Expectation bound, adaptive via BIC/Lasso).
Under the same assumptions: E[‖φ_{θ̂} - f‖²_{L₂([0,1])}] ≤ C · σ² · (log n / n)^{2β/(2β+1)}
The logarithmic penalty (log n / n) vs the optimal rate (1/n) is the price paid for adaptation to unknown β.
Corollary 3.16 (both parts: high-probability and expectation bounds).
Part 1 (high probability): With probability ≥ 1 - δ, ‖φ_{θ̂} - f‖² ≤ C · n^{-2β/(2β+1)} + C · σ² · log(1/δ)/n
Part 2 (expectation): E[‖φ_{θ̂} - f‖²] ≤ C · σ² · (log n / n)^{2β/(2β+1)}