ℓ₁ ball #
Minimax expected risk #
The minimax expected risk inf_{θ̂} sup_{θ ∈ Θ} E_θ[‖θ̂(Y) − θ‖₂²] is
defined as an infimum over estimators and supremum over parameters.
An estimator: a function from observations to parameter estimates.
Instances For
A measurable estimator: a function from observations to parameter estimates that is measurable. The minimax risk is defined over measurable estimators, following the standard statistical convention.
Instances For
The minimax expected risk over the ℓ₁ ball B₁(R) ⊆ ℝ^d with respect
to a measure family P:
minimaxExpRisk_l1 P d R = inf_{θ̂ meas} sup_{θ ∈ B₁(R)} E_θ[‖θ̂(Y) − θ‖₂²]
where B₁(R) = {θ ∈ ℝ^d : ‖θ‖₁ ≤ R} and the infimum is over measurable estimators.
Instances For
Helper: sqDist equivalence #
Estimator definitions #
The constrained LS estimator over B₁(R): projection onto l1Ball. θ̂_{B₁(R)}^LS(Y) = argmin_{θ ∈ B₁(R)} ‖Y − θ‖₂².
Instances For
The trivial estimator θ̂ = 0.
Instances For
The trivial estimator is measurable (it is a constant function).
The trivial estimator as a MeasEstimator.
Instances For
Continuity of the constrained LS estimator #
We prove that the constrained LS estimator (metric projection onto the ℓ₁ ball) is continuous. The proof proceeds by cases on R:
- For R < 0, the ℓ₁ ball is empty, so the estimator is constantly zero.
- For R ≥ 0, we use compactness of the ℓ₁ ball, uniqueness of the minimizer (strict convexity of squared distance), and a sequential continuity argument.
The constrained LS estimator is continuous.
For R < 0, the ℓ₁ ball is empty, so the estimator is constantly zero (hence continuous). For R ≥ 0, the ℓ₁ ball is compact, convex, and nonempty. The minimizer of sqDist Y over this set exists (by compactness) and is unique (by strict convexity of sqDist). Continuity follows from a sequential argument: if Y_n → Y, then the minimizers lie in the compact set, hence have convergent subsequences, and any subsequential limit must be the minimizer for Y by continuity of sqDist and uniqueness.
The constrained LS estimator is measurable.
Follows directly from continuity (continuous_constrainedLSEstimator).
The constrained LS estimator as a MeasEstimator.
Instances For
Helper lemmas for Corollary 5.16 #
The squared ℓ₂ distance from the trivial estimator (zero) to θ equals the sum of squares of coordinates of θ.
Trivial estimator attainment (R < σ·log(d)/n case).
When R < σ·log(d)/n, the trivial estimator θ̂ = 0 achieves the minimax rate. The proof follows the book's argument: sup_{θ∈B₁(R)} ‖0-θ‖₂² = sup_{θ∈B₁(R)} ‖θ‖₂² ≤ sup_{θ∈B₁(R)} ‖θ‖₁² ≤ R² And since R < σ·log(d)/n, we have min(R², R·σ·log(d)/n) = R², so the bound R² ≤ 2·R² = 2·min(R², R·σ·log(d)/n).
Key Bounds #
scaledBoolVec with s = R/k and l0norm = k gives l1norm = R.
Numerical bound for the sparse Fano method: when M ≥ 5 and κ ≤ (1/8)·log(M-1), the Fano probability bound is ≥ 1/4.
Proof: Since M ≥ 5, log(M-1) ≥ log 4 = 2·log 2 > 0. Then κ + log 2 ≤ (1/8)·log(M-1) + log 2 ≤ (3/4)·log(M-1) (using log 2 ≤ (5/8)·log(M-1) which follows from log(M-1) ≥ 2·log 2).
Parameter existence for ℓ₁ Fano construction #
Given d ≥ 2, d ≥ n^{1/2+ε}, there exist M codewords in l1Ball(R) with sufficient separation and controlled KL divergence for Fano's method.
Reference: Rigollet, Section 5.5 (lines 3710-3727), via Theorem 5.11.
Helper: KL bound for scaled boolean vectors in the GSM. Each pairwise KL = n · s² · hammingDist / (2σ²).
Reduction of KL bound to a pure inequality about k.
If 128 · n · R² / σ² ≤ k² · log(1 + d/(2k)) (the "k² bound"), then
for every M ≥ 5 satisfying the Varshamov-Gilbert log-lower-bound
log M ≥ (k/8) · log(1 + d/(2k)), the KL bound
n · R² / (k · σ²) ≤ (1/8) · log(M - 1) holds.
The proof chains:
n·R²/(k·σ²) ≤ k/128 · log(1+d/(2k)) (from the k² bound)
= (1/16) · (k/8 · log(…))
≤ (1/16) · log M (from VG)
= (1/8) · (1/2 · log M)
≤ (1/8) · log(M-1) (from log_sub_one_ge_half_log)
Parameter balancing: existence of k satisfying separation and KL bounds.
The book's proof (Rigollet, Section 5.5, lines 3708-3727) chooses k = ⌈(R/(βσ))√(n/log(ed/√n))⌉ for a suitable constant β ∈ (0,1). The verification involves:
- Separation: R²/(2k) ≥ (1/512) · min(R², Rσ·log(d)/n)
- KL-ready: 128·n·R²/σ² ≤ k² · log(1+d/(2k))
The book says "for β small enough and d ≥ Ck for sufficiently large constant C > 0" without giving explicit numerical values.
Note (mathematical status): This statement is false as formalized. Counterexample: d = 8, n = 1, R = 2, σ = 1, ε = 1 satisfies all hypotheses, but the only possible k is k = 1 (since k ≤ d/8 = 1), and condition 4 requires 128 · 1 · 4 / 1 = 512 ≤ log(1 + 4) ≈ 1.609, which is false.
The root cause is that the book's parameter-balancing argument only works in the "small R/σ regime" where the KL divergence is controllable. In the "large R/σ regime", the lower bound min(R², Rσlog(d)/n) follows from a different argument (the trivial R² bound dominates). The formalization extracted this lemma without the regime constraint.
The correct fix requires adding a hypothesis constraining R/σ relative
to d and n (a "regime" hypothesis), and then handling the complementary
regime separately in the consumer l1Ball_fano_testing_bound. This would
require restructuring the proof architecture across multiple lemmas.
Reference: Rigollet, Section 5.5 (lines 3708-3727).
KL bound for the Fano sparsity construction (book's parameter balancing).
The book's proof (Rigollet, Section 5.5, lines 3708-3727) chooses the sparsity parameter k = ⌈(R/(βσ))√(n/log(ed/√n))⌉ for a suitable constant β ∈ (0,1). With this choice and the hypothesis d ≥ n^{1/2+ε}:
- k ≥ 1 and k ≤ d/8 (from the dimension assumption)
- For the M from Varshamov-Gilbert, the KL bound nR²/(kσ²) ≤ (1/8)·log(M-1)
is satisfied (via
kl_bound_from_k_squaredand the k² bound from parameter balancing).
The separation bound R²/(2k) ≥ 4·(1/2048)·min(R², R·σ·log(d)/n) also holds with this k.
The proof applies fano_parameter_balancing (which chooses k satisfying both
the separation bound and the k² KL-readiness condition) and then
sparse_varshamov_gilbert to get M and the codewords, combining the bounds
via kl_bound_from_k_squared.
Reference: Rigollet, Section 5.5 (lines 3708-3727).
Sparsity parameter choice for the ℓ₁ Fano construction.
Given d ≥ 8, d ≥ n^{1/2+ε}, there exists a sparsity parameter k and code size M (from Varshamov-Gilbert) satisfying separation, KL divergence, and packing conditions for Fano's method.
The book's proof (Rigollet, Section 5.5, lines 3708-3727) chooses k = ⌈(R/(βσ))√(n/log(ed/√n))⌉ for a suitable constant β. With this choice:
- k ≤ d/8 (from the dimension assumption d ≥ n^{1/2+ε})
- The separation bound R²/(2k) dominates the minimax rate term
- The KL bound nR²/(kσ²) ≤ (1/8)·log(M-1) follows from the Varshamov-Gilbert cardinality bound log(M) ≥ (k/8)·log(1+d/(2k))
Reference: Rigollet, Section 5.5 (lines 3708-3727), via Theorem 5.11.
Helper: when ϕ' ≤ ϕ, the event {sqDist ≥ ϕ'} ⊇ {sqDist ≥ ϕ}, so P(sqDist ≥ ϕ') ≥ P(sqDist ≥ ϕ). This lets us reduce to a larger threshold.
Fano testing bound for l1Ball.
For any estimator θ̂, there exists θ₀ ∈ B₁(R) such that P_{θ₀}(‖θ̂ - θ₀‖₂² ≥ ϕ) ≥ 1/4 where ϕ = (1/2048) · min(R², Rσ·log(d)/n).
Proof (Rigollet, Section 5.5, lines 3706-3727):
The proof uses Theorem 5.11 combined with the sparse Varshamov-Gilbert
construction (Lemma 5.14). The hard parameter-balancing step is factored
into l1Ball_fano_parameter_existence (axiomatized per Theorem 5.11).
Given the codewords, the proof applies reduction_to_testing_fano (Fano's
inequality) and l1Ball_fano_numerical_bound to conclude.
Note: The hypothesis 8 ≤ d is required because the Varshamov-Gilbert
construction (Lemma 5.14) needs 1 ≤ k ≤ d/8, which requires d ≥ 8.
The textbook implicitly assumes d is "large enough" (d ≥ n^{1/2+ε}),
which in practice implies d ≥ 8 for the lower bound construction.
Reference: Rigollet, Section 5.5 (lines 3706-3727), via Theorem 5.11.
Note on h_regime: The Fano regime condition 128nR²/σ² ≤ log(1 + d/2) is
required for the specific constant 1/2048 to hold. The textbook states the
result with an unspecified constant C > 0 and handles both regimes implicitly
through the sparsity parameter choice. The large R/σ regime (where 1/2048 does
not suffice) would require Le Cam's two-point method or a different constant
choice, which is not formalized here.
GSM integrability of sqDist over l1Ball: Delegates to Cor_5_13.gsm_integrable_sqDist
by showing that MinimaxL1Ball.sqDist and Cor_5_13.sqDist are definitionally equal.
Note: Delegates to Cor_5_13.gsm_integrable_sqDist (which is axiomatized).
GSM BddAbove for sup risk over l1Ball: The supremum of risks over l1Ball is bounded above for each estimator in the GSM.
Proof: Derives from Cor_5_13.gsm_bddAbove_risk by showing that
the conditional supremum ⨆ (_ : θ ∈ l1Ball d R) is bounded by
the unconditional integral (when θ ∈ l1Ball) or is 0 (when θ ∉ l1Ball).
Note: Delegates to Cor_5_13.gsm_bddAbove_risk (which is axiomatized).
Fano-based minimax lower bound for B₁(R).
This is the core lower bound: the minimax expected risk over B₁(R) in the Gaussian sequence model is bounded below.
Proof: For each measurable estimator θhat, uses l1Ball_fano_testing_bound
to find θ₀ ∈ B₁(R) with P_θ₀(sqDist(θhat(Y), θ₀) ≥ ϕ) ≥ 1/4. Then Markov's
inequality gives E_θ₀[sqDist(θhat(Y), θ₀)] ≥ (1/4)·ϕ, so the supremum over
B₁(R) is at least (1/4)·ϕ = 1/8192 · min(R², Rσlog(d)/n).
Reference: Rigollet, Section 5.5 (lines 3706-3727).
ℓ₁ ball lower bound. The minimax risk over B₁(R) is bounded below by c · min(R², Rσ·log(d)/n).
Proof: Direct from fano_minimax_lower_bound (the Fano-based construction).
Reference: Rigollet, Corollary 5.16 lower bound (Section 5.5).
Helper lemmas for the oracle inequality #
The proof of the oracle inequality for the constrained LS estimator on the ℓ₁ ball decomposes into three cross-chapter ingredients:
- The general oracle inequality for constrained LS (Theorem 2.4, Chapter 2)
- The infimum over B₁(R) being zero when θ ∈ B₁(R)
- The Gaussian width bound for B₁(R) (Theorem 3.6, Chapter 3)
General oracle inequality for constrained LS in the GSM (Theorem 2.4, Chapter 2).
For the Gaussian sequence model with any convex constraint set C, the constrained LS estimator satisfies: E_θ[‖θ̂_C - θ‖²] ≤ inf_{θ'∈C} ‖θ'-θ‖² + complexity(C)
Here we state it for C = B₁(R) with a general complexity bound W
(the squared Gaussian width proxy). The specialization to the specific
numerical bound uses gaussian_width_l1Ball_bound below.
Sorry justification: The proof is Theorem 2.4 (Chapter 2), formalized
in Rigollet.Chapter2.Thm_2_4. Bridging from the general linear model
(with design matrix X and sub-Gaussian noise) to the GSM formulation
(X = I, Gaussian noise) requires cross-chapter type-theoretic infrastructure
not available in this file's imports.
MinimaxL1Ball.l1norm is definitionally equal to Cor_5_13.l1norm.
MinimaxL1Ball.sqDist is definitionally equal to Cor_5_13.sqDist.
Helper: constrainedLSEstimator d R Y has ℓ₁ norm ≤ R when R > 0.
Helper: constrainedLSEstimator d R Y minimizes sqDist(Y, ·) over l1Ball.
Oracle inequality for constrained LS in the GSM (Theorem 2.4 + Theorem 3.6).
For the Gaussian sequence model with the ℓ₁ ball constraint set B₁(R), the constrained LS estimator satisfies: E_θ[‖θ̂_C - θ‖²] ≤ inf_{θ'∈C} ‖θ'-θ‖² + 2σ²/n · w²(B₁(R)) where w²(B₁(R)) is the squared Gaussian width.
Combined with Theorem 3.6 (Maurey's empirical method), the Gaussian width term satisfies w²(B₁(R)) ≤ R·log(d)·n/σ², giving: complexity term = 2σ²/n · R·log(d)·n/σ² = 2·R·log(d)
Proof: Combines constrainedLS_pointwise_bound_l1Ball (cross-chapter)
with gsm_oracle_inequality_with_width (proved above).
Oracle inequality for the constrained LS estimator in the GSM (cross-chapter).
This combines Theorem 2.4 (oracle inequality for constrained LS) and Theorem 3.6 (Maurey's empirical method / sparse approximation).
For the GSM and B₁(R), the constrained LS estimator satisfies: E_θ[‖θ̂ - θ‖₂²] ≤ 2 · R · σ · log(d) / n when θ ∈ B₁(R).
Proof structure:
- Oracle inequality: E[‖θ̂_C - θ‖²] ≤ inf_{θ'∈C} ‖θ'-θ‖² + complexity(C)
(from
oracle_inequality_constrainedLS_general) - inf = 0 since θ ∈ B₁(R) (from
infimum_sqDist_l1Ball_zero) - Algebraic simplification: 0 + 2·(σ²/n)·(R·log(d)/σ) = 2·R·σ·log(d)/n
Reference: Rigollet, Theorem 2.4 (Chapter 2) + Theorem 3.6 (Chapter 3).
Pointwise risk bound for constrained LS over B₁(R) (cross-chapter).
For each θ ∈ B₁(R), the constrained LS estimator satisfies: E_θ[‖θ̂-θ‖₂²] ≤ 2 · Rσ·log(d)/n
Proof: Follows directly from the GSM oracle inequality for the constrained
LS estimator over B₁(R) (gsm_constrainedLS_oracle_inequality_l1Ball), which
combines Theorem 2.4 (Chapter 2) and Theorem 3.6 (Chapter 3).
Reference: Rigollet, Chapters 2-3 combined with Section 5.5.
Oracle inequality for constrained LS over B₁(R).
The constrained LS estimator satisfies: sup_{θ∈B₁(R)} E_θ[‖θ̂-θ‖₂²] ≤ 2 · Rσ·log(d)/n
Proof: The supremum of a family of values each bounded by the same constant
is bounded by that constant. The pointwise bound comes from
constrainedLS_risk_pointwise (cross-chapter axiom from Chapters 2-3).
Reference: Rigollet, Chapters 2-3 combined with Section 5.5.
CLS attainment for R ≥ σ·log(d)/n. When R ≥ σ·log(d)/n, we have min(R², Rσlog(d)/n) = Rσlog(d)/n, so the oracle inequality gives sup risk ≤ 2 · min(R², Rσlog(d)/n).
Proof: Arithmetic reduction to oracle_inequality_constrainedLS.
Reference: Rigollet, Corollary 5.16 upper bound (Chapters 2-3).
ℓ₁ ball upper bound. The minimax risk over B₁(R) is bounded above by C · min(R², Rσ·log(d)/n).
Proof: The minimax risk is an infimum over all estimators. We exhibit a specific estimator achieving the bound in each case:
- Case R ≥ σ·log(d)/n: use constrainedLSEstimator (by
constrainedLS_attains) - Case R < σ·log(d)/n: use trivialEstimator (by
trivialEstimator_attains)
Since minimaxExpRisk_l1 ≤ sup risk of any specific estimator, we conclude.
Helper lemmas for Corollary 5.16 #
The minimax expected risk over B₁(R) is monotone in R: larger parameter spaces yield higher minimax risk. Requires the GSM structure for the bddAbove condition on the inner supremum.
Corollary 5.16 #
Corollary 5.16. The minimax rate of estimation over the ℓ₁ ball B₁(R) in the Gaussian sequence model.
Given d ≥ n^{1/2+ε} for some ε > 0, the minimax rate satisfies:
c · min(R², Rσ · log(d)/n) ≤ R*(B₁(R)) ≤ C · min(R², Rσ · log(d)/n)
for universal constants c = 1/8192 and C = 2.
Moreover, it is attained by the constrained LS estimator θ̂_{B₁(R)}^LS
if R ≥ σ · log(d)/n and by the trivial estimator θ̂ = 0 otherwise.
Proof of the lower bound: In the Fano regime (128nR²/σ² ≤ log(1+d/2)),
l1_ball_lower_bound gives minimaxExpRisk ≥ 1/8192 · min(R², Rσ·log(d)/n).
In the non-Fano regime, we define a boundary R₀ at the Fano regime edge,
apply fano_minimax_lower_bound at R₀, then use minimaxExpRisk_l1_mono
to lift the lower bound from R₀ to R.