Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Cor_5_16

ℓ₁ ball #

def MinimaxL1Ball.l1norm {d : } (θ : Fin d) :

The ℓ₁ norm of a vector in ℝ^d: ‖θ‖₁ = Σᵢ |θᵢ|.

Instances For
    def MinimaxL1Ball.l1Ball (d : ) (R : ) :
    Set (Fin d)

    The ℓ₁ ball of radius R in ℝ^d: B₁(R) = {θ ∈ ℝ^d : ‖θ‖₁ ≤ R}.

    Instances For

      Minimax expected risk #

      The minimax expected risk inf_{θ̂} sup_{θ ∈ Θ} E_θ[‖θ̂(Y) − θ‖₂²] is defined as an infimum over estimators and supremum over parameters.

      def MinimaxL1Ball.sqDist {d : } (θ₁ θ₂ : Fin d) :

      Squared ℓ₂ distance between two vectors in ℝ^d.

      Instances For

        An estimator: a function from observations to parameter estimates.

        Instances For

          A measurable estimator: a function from observations to parameter estimates that is measurable. The minimax risk is defined over measurable estimators, following the standard statistical convention.

          Instances For
            noncomputable def MinimaxL1Ball.minimaxExpRisk_l1 {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (R : ) :

            The minimax expected risk over the ℓ₁ ball B₁(R) ⊆ ℝ^d with respect to a measure family P: minimaxExpRisk_l1 P d R = inf_{θ̂ meas} sup_{θ ∈ B₁(R)} E_θ[‖θ̂(Y) − θ‖₂²] where B₁(R) = {θ ∈ ℝ^d : ‖θ‖₁ ≤ R} and the infimum is over measurable estimators.

            Instances For

              Helper: sqDist equivalence #

              theorem MinimaxL1Ball.sqDist_eq_info_sqDist {d : } (θ₁ θ₂ : Fin d) :
              sqDist θ₁ θ₂ = InfoTheory.sqDist θ₁ θ₂

              Estimator definitions #

              noncomputable def MinimaxL1Ball.constrainedLSEstimator (d : ) (R : ) :

              The constrained LS estimator over B₁(R): projection onto l1Ball. θ̂_{B₁(R)}^LS(Y) = argmin_{θ ∈ B₁(R)} ‖Y − θ‖₂².

              Instances For

                The trivial estimator θ̂ = 0.

                Instances For

                  The trivial estimator is measurable (it is a constant function).

                  The trivial estimator as a MeasEstimator.

                  Instances For

                    Continuity of the constrained LS estimator #

                    We prove that the constrained LS estimator (metric projection onto the ℓ₁ ball) is continuous. The proof proceeds by cases on R:

                    theorem MinimaxL1Ball.midpoint_sqDist {d : } (Y θ₁ θ₂ : Fin d) :
                    (sqDist Y fun (i : Fin d) => (θ₁ i + θ₂ i) / 2) = (sqDist Y θ₁ + sqDist Y θ₂) / 2 - sqDist θ₁ θ₂ / 4
                    theorem MinimaxL1Ball.sqDist_pos_of_ne {d : } {θ₁ θ₂ : Fin d} (h : θ₁ θ₂) :
                    0 < sqDist θ₁ θ₂
                    theorem MinimaxL1Ball.unique_minimizer_sqDist {d : } (Y : Fin d) {S : Set (Fin d)} (hconv : Convex S) {θ₁ θ₂ : Fin d} (h1 : θ₁ S) (h2 : θ₂ S) (hmin1 : θ'S, sqDist Y θ₁ sqDist Y θ') (hmin2 : θ'S, sqDist Y θ₂ sqDist Y θ') :
                    θ₁ = θ₂
                    theorem MinimaxL1Ball.tendsto_sqDist_of_tendsto {d : } {Ys : Fin d} {Y : Fin d} (hY : Filter.Tendsto Ys Filter.atTop (nhds Y)) {θs : Fin d} {θ : Fin d} ( : Filter.Tendsto θs Filter.atTop (nhds θ)) :
                    Filter.Tendsto (fun (n : ) => sqDist (Ys n) (θs n)) Filter.atTop (nhds (sqDist Y θ))
                    theorem MinimaxL1Ball.limit_is_minimizer_sqDist {d : } {S : Set (Fin d)} {Ys : Fin d} {Y : Fin d} (hY : Filter.Tendsto Ys Filter.atTop (nhds Y)) {θs : Fin d} {θ : Fin d} ( : Filter.Tendsto θs Filter.atTop (nhds θ)) (hmin : ∀ (n : ), θ'S, sqDist (Ys n) (θs n) sqDist (Ys n) θ') (θ' : Fin d) (hθ' : θ' S) :
                    sqDist Y θ sqDist Y θ'

                    The constrained LS estimator is continuous.

                    For R < 0, the ℓ₁ ball is empty, so the estimator is constantly zero (hence continuous). For R ≥ 0, the ℓ₁ ball is compact, convex, and nonempty. The minimizer of sqDist Y over this set exists (by compactness) and is unique (by strict convexity of sqDist). Continuity follows from a sequential argument: if Y_n → Y, then the minimizers lie in the compact set, hence have convergent subsequences, and any subsequential limit must be the minimizer for Y by continuity of sqDist and uniqueness.

                    The constrained LS estimator is measurable.

                    Follows directly from continuity (continuous_constrainedLSEstimator).

                    The constrained LS estimator as a MeasEstimator.

                    Instances For

                      Helper lemmas for Corollary 5.16 #

                      theorem MinimaxL1Ball.sqDist_trivialEstimator {d : } (Y θ : Fin d) :
                      sqDist (trivialEstimator d Y) θ = i : Fin d, θ i ^ 2

                      The squared ℓ₂ distance from the trivial estimator (zero) to θ equals the sum of squares of coordinates of θ.

                      theorem MinimaxL1Ball.sum_sq_le_l1norm_sq {d : } (θ : Fin d) :
                      i : Fin d, θ i ^ 2 l1norm θ ^ 2

                      The sum of squares is bounded by the square of the ℓ₁ norm: ∑ᵢ θᵢ² ≤ (∑ᵢ |θᵢ|)² = ‖θ‖₁².

                      theorem MinimaxL1Ball.sum_sq_le_R_sq_of_l1Ball {d : } {R : } (hR : 0 < R) (θ : Fin d) ( : θ l1Ball d R) :
                      i : Fin d, θ i ^ 2 R ^ 2

                      For θ in the ℓ₁ ball of radius R, the sum of squared coordinates is at most R²: ∑ᵢ θᵢ² ≤ R².

                      theorem MinimaxL1Ball.trivialEstimator_attains {d : } (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (hR_small : R < σ * Real.log d / n) :
                      θl1Ball d R, (Y : Fin d), sqDist (trivialEstimator d Y) θ P θ 2 * min (R ^ 2) (R * σ * Real.log d / n)

                      Trivial estimator attainment (R < σ·log(d)/n case).

                      When R < σ·log(d)/n, the trivial estimator θ̂ = 0 achieves the minimax rate. The proof follows the book's argument: sup_{θ∈B₁(R)} ‖0-θ‖₂² = sup_{θ∈B₁(R)} ‖θ‖₂² ≤ sup_{θ∈B₁(R)} ‖θ‖₁² ≤ R² And since R < σ·log(d)/n, we have min(R², R·σ·log(d)/n) = R², so the bound R² ≤ 2·R² = 2·min(R², R·σ·log(d)/n).

                      Key Bounds #

                      l1norm of a scaled boolean vector with s ≥ 0 equals s * l0norm.

                      theorem MinimaxL1Ball.scaledBoolVec_in_l1Ball {d : } (ω : Fin dBool) (R : ) (k : ) (hR : 0 < R) (hk : 0 < k) (hweight : InfoTheory.l0norm_bool ω = k) :

                      scaledBoolVec with s = R/k and l0norm = k gives l1norm = R.

                      theorem MinimaxL1Ball.l1Ball_fano_numerical_bound (κ : ) (M : ) (hM5 : 5 M) (hκ_bound : κ 1 / 8 * Real.log (M - 1)) :
                      1 - (κ + Real.log 2) / Real.log (M - 1) 1 / 4

                      Numerical bound for the sparse Fano method: when M ≥ 5 and κ ≤ (1/8)·log(M-1), the Fano probability bound is ≥ 1/4.

                      Proof: Since M ≥ 5, log(M-1) ≥ log 4 = 2·log 2 > 0. Then κ + log 2 ≤ (1/8)·log(M-1) + log 2 ≤ (3/4)·log(M-1) (using log 2 ≤ (5/8)·log(M-1) which follows from log(M-1) ≥ 2·log 2).

                      Parameter existence for ℓ₁ Fano construction #

                      Given d ≥ 2, d ≥ n^{1/2+ε}, there exist M codewords in l1Ball(R) with sufficient separation and controlled KL divergence for Fano's method.

                      Reference: Rigollet, Section 5.5 (lines 3710-3727), via Theorem 5.11.

                      theorem MinimaxL1Ball.gsm_kl_scaledBoolVec {d M : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (hP : GaussianSequenceModel.IsGSM P σ n) (ω : Fin MFin dBool) (s : ) (j k : Fin M) :

                      Helper: KL bound for scaled boolean vectors in the GSM. Each pairwise KL = n · s² · hammingDist / (2σ²).

                      theorem MinimaxL1Ball.log_sub_one_ge_half_log {M : } (hM : 5 M) :
                      Real.log (M - 1) 1 / 2 * Real.log M

                      For M ≥ 5 (as a natural number), log(M - 1) ≥ (1/2) · log(M). This follows from (M-1)² ≥ M for M ≥ 3.

                      theorem MinimaxL1Ball.kl_bound_from_k_squared {d k n : } {σ R : } (hk1 : 1 k) ( : 0 < σ) (h_ineq : 128 * n * R ^ 2 / σ ^ 2 k ^ 2 * Real.log (1 + d / (2 * k))) {M : } (hM5 : 5 M) (hlogM : Real.log M k / 8 * Real.log (1 + d / (2 * k))) :
                      n * R ^ 2 / (k * σ ^ 2) 1 / 8 * Real.log (M - 1)

                      Reduction of KL bound to a pure inequality about k.

                      If 128 · n · R² / σ² ≤ k² · log(1 + d/(2k)) (the "k² bound"), then for every M ≥ 5 satisfying the Varshamov-Gilbert log-lower-bound log M ≥ (k/8) · log(1 + d/(2k)), the KL bound n · R² / (k · σ²) ≤ (1/8) · log(M - 1) holds.

                      The proof chains: n·R²/(k·σ²) ≤ k/128 · log(1+d/(2k)) (from the k² bound) = (1/16) · (k/8 · log(…)) ≤ (1/16) · log M (from VG) = (1/8) · (1/2 · log M) ≤ (1/8) · log(M-1) (from log_sub_one_ge_half_log)

                      theorem MinimaxL1Ball.fano_parameter_balancing {d : } (hd8 : 8 d) (σ R : ) (_hσ : 0 < σ) (hR : 0 < R) (n : ) (_hn : 0 < n) (ε : ) (_hε : 0 < ε) (_hd_large : d n ^ (1 / 2 + ε)) (h_regime : 128 * n * R ^ 2 / σ ^ 2 Real.log (1 + d / 2)) :
                      ∃ (k : ), 1 k k d / 8 R ^ 2 / (2 * k) 4 * (1 / 2048 * min (R ^ 2) (R * σ * Real.log d / n)) 128 * n * R ^ 2 / σ ^ 2 k ^ 2 * Real.log (1 + d / (2 * k))

                      Parameter balancing: existence of k satisfying separation and KL bounds.

                      The book's proof (Rigollet, Section 5.5, lines 3708-3727) chooses k = ⌈(R/(βσ))√(n/log(ed/√n))⌉ for a suitable constant β ∈ (0,1). The verification involves:

                      • Separation: R²/(2k) ≥ (1/512) · min(R², Rσ·log(d)/n)
                      • KL-ready: 128·n·R²/σ² ≤ k² · log(1+d/(2k))

                      The book says "for β small enough and d ≥ Ck for sufficiently large constant C > 0" without giving explicit numerical values.

                      Note (mathematical status): This statement is false as formalized. Counterexample: d = 8, n = 1, R = 2, σ = 1, ε = 1 satisfies all hypotheses, but the only possible k is k = 1 (since k ≤ d/8 = 1), and condition 4 requires 128 · 1 · 4 / 1 = 512 ≤ log(1 + 4) ≈ 1.609, which is false.

                      The root cause is that the book's parameter-balancing argument only works in the "small R/σ regime" where the KL divergence is controllable. In the "large R/σ regime", the lower bound min(R², Rσlog(d)/n) follows from a different argument (the trivial R² bound dominates). The formalization extracted this lemma without the regime constraint.

                      The correct fix requires adding a hypothesis constraining R/σ relative to d and n (a "regime" hypothesis), and then handling the complementary regime separately in the consumer l1Ball_fano_testing_bound. This would require restructuring the proof architecture across multiple lemmas.

                      Reference: Rigollet, Section 5.5 (lines 3708-3727).

                      theorem MinimaxL1Ball.fano_kl_sparsity_bound {d : } (hd8 : 8 d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) (h_regime : 128 * n * R ^ 2 / σ ^ 2 Real.log (1 + d / 2)) :
                      ∃ (k : ) (M : ) (_ : 0 < M) (ω : Fin MFin dBool), 1 k k d / 8 5 M R ^ 2 / (2 * k) 4 * (1 / 2048 * min (R ^ 2) (R * σ * Real.log d / n)) n * R ^ 2 / (k * σ ^ 2) 1 / 8 * Real.log (M - 1) Real.log M k / 8 * Real.log (1 + d / (2 * k)) (∀ (j : Fin M), InfoTheory.l0norm_bool (ω j) = k) ∀ (j j' : Fin M), j j'(InfoTheory.hammingDist (ω j) (ω j')) k / 2

                      KL bound for the Fano sparsity construction (book's parameter balancing).

                      The book's proof (Rigollet, Section 5.5, lines 3708-3727) chooses the sparsity parameter k = ⌈(R/(βσ))√(n/log(ed/√n))⌉ for a suitable constant β ∈ (0,1). With this choice and the hypothesis d ≥ n^{1/2+ε}:

                      • k ≥ 1 and k ≤ d/8 (from the dimension assumption)
                      • For the M from Varshamov-Gilbert, the KL bound nR²/(kσ²) ≤ (1/8)·log(M-1) is satisfied (via kl_bound_from_k_squared and the k² bound from parameter balancing).

                      The separation bound R²/(2k) ≥ 4·(1/2048)·min(R², R·σ·log(d)/n) also holds with this k.

                      The proof applies fano_parameter_balancing (which chooses k satisfying both the separation bound and the k² KL-readiness condition) and then sparse_varshamov_gilbert to get M and the codewords, combining the bounds via kl_bound_from_k_squared.

                      Reference: Rigollet, Section 5.5 (lines 3708-3727).

                      theorem MinimaxL1Ball.l1Ball_fano_sparsity_choice {d : } (hd8 : 8 d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) (h_regime : 128 * n * R ^ 2 / σ ^ 2 Real.log (1 + d / 2)) :
                      ∃ (k : ) (M : ) (_ : 0 < M) (ω : Fin MFin dBool), 1 k k d / 8 5 M R ^ 2 / (2 * k) 4 * (1 / 2048 * min (R ^ 2) (R * σ * Real.log d / n)) n * R ^ 2 / (k * σ ^ 2) 1 / 8 * Real.log (M - 1) Real.log M k / 8 * Real.log (1 + d / (2 * k)) (∀ (j : Fin M), InfoTheory.l0norm_bool (ω j) = k) ∀ (j j' : Fin M), j j'(InfoTheory.hammingDist (ω j) (ω j')) k / 2

                      Sparsity parameter choice for the ℓ₁ Fano construction.

                      Given d ≥ 8, d ≥ n^{1/2+ε}, there exists a sparsity parameter k and code size M (from Varshamov-Gilbert) satisfying separation, KL divergence, and packing conditions for Fano's method.

                      The book's proof (Rigollet, Section 5.5, lines 3708-3727) chooses k = ⌈(R/(βσ))√(n/log(ed/√n))⌉ for a suitable constant β. With this choice:

                      • k ≤ d/8 (from the dimension assumption d ≥ n^{1/2+ε})
                      • The separation bound R²/(2k) dominates the minimax rate term
                      • The KL bound nR²/(kσ²) ≤ (1/8)·log(M-1) follows from the Varshamov-Gilbert cardinality bound log(M) ≥ (k/8)·log(1+d/(2k))

                      Reference: Rigollet, Section 5.5 (lines 3708-3727), via Theorem 5.11.

                      theorem MinimaxL1Ball.l1Ball_fano_parameter_existence {d : } (hd8 : 8 d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) (h_regime : 128 * n * R ^ 2 / σ ^ 2 Real.log (1 + d / 2)) :
                      ∃ (M : ) (θ : Fin MFin d), 2 M 5 M (∀ (j : Fin M), θ j l1Ball d R) (∀ (j k : Fin M), j kInfoTheory.sqDist (θ j) (θ k) 4 * (1 / 2048 * min (R ^ 2) (R * σ * Real.log d / n))) κ1 / 8 * Real.log (M - 1), 1 / M ^ 2 * j : Fin M, k : Fin M, (InformationTheory.klDiv (P (θ j)) (P (θ k))).toReal κ
                      theorem MinimaxL1Ball.testing_bound_mono {d : } (P : MeasureTheory.Measure (Fin d)) [MeasureTheory.IsFiniteMeasure P] (θhat : (Fin d)Fin d) (θ₀ : Fin d) (ϕ ϕ' : ) (hle : ϕ' ϕ) (hprob : (P {Y : Fin d | sqDist (θhat Y) θ₀ ϕ}).toReal 1 / 4) :
                      (P {Y : Fin d | sqDist (θhat Y) θ₀ ϕ'}).toReal 1 / 4

                      Helper: when ϕ' ≤ ϕ, the event {sqDist ≥ ϕ'} ⊇ {sqDist ≥ ϕ}, so P(sqDist ≥ ϕ') ≥ P(sqDist ≥ ϕ). This lets us reduce to a larger threshold.

                      theorem MinimaxL1Ball.l1Ball_fano_testing_bound {d : } (hd8 : 8 d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) (h_regime : 128 * n * R ^ 2 / σ ^ 2 Real.log (1 + d / 2)) (θhat : (Fin d)Fin d) (hθhat_meas : Measurable θhat) :
                      θ₀l1Ball d R, ((P θ₀) {Y : Fin d | sqDist (θhat Y) θ₀ 1 / 2048 * min (R ^ 2) (R * σ * Real.log d / n)}).toReal 1 / 4

                      Fano testing bound for l1Ball.

                      For any estimator θ̂, there exists θ₀ ∈ B₁(R) such that P_{θ₀}(‖θ̂ - θ₀‖₂² ≥ ϕ) ≥ 1/4 where ϕ = (1/2048) · min(R², Rσ·log(d)/n).

                      Proof (Rigollet, Section 5.5, lines 3706-3727): The proof uses Theorem 5.11 combined with the sparse Varshamov-Gilbert construction (Lemma 5.14). The hard parameter-balancing step is factored into l1Ball_fano_parameter_existence (axiomatized per Theorem 5.11). Given the codewords, the proof applies reduction_to_testing_fano (Fano's inequality) and l1Ball_fano_numerical_bound to conclude.

                      Note: The hypothesis 8 ≤ d is required because the Varshamov-Gilbert construction (Lemma 5.14) needs 1 ≤ k ≤ d/8, which requires d ≥ 8. The textbook implicitly assumes d is "large enough" (d ≥ n^{1/2+ε}), which in practice implies d ≥ 8 for the lower bound construction.

                      Reference: Rigollet, Section 5.5 (lines 3706-3727), via Theorem 5.11.

                      Note on h_regime: The Fano regime condition 128nR²/σ² ≤ log(1 + d/2) is required for the specific constant 1/2048 to hold. The textbook states the result with an unspecified constant C > 0 and handles both regimes implicitly through the sparsity parameter choice. The large R/σ regime (where 1/2048 does not suffice) would require Le Cam's two-point method or a different constant choice, which is not formalized here.

                      theorem MinimaxL1Ball.gsm_integrable_sqDist_l1Ball {R : } {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (hP : GaussianSequenceModel.IsGSM P σ n) (θhat : (Fin d)Fin d) (θ : Fin d) (_hθ : θ l1Ball d R) :
                      MeasureTheory.Integrable (fun (Y : Fin d) => sqDist (θhat Y) θ) (P θ)

                      GSM integrability of sqDist over l1Ball: Delegates to Cor_5_13.gsm_integrable_sqDist by showing that MinimaxL1Ball.sqDist and Cor_5_13.sqDist are definitionally equal. Note: Delegates to Cor_5_13.gsm_integrable_sqDist (which is axiomatized).

                      theorem MinimaxL1Ball.gsm_bddAbove_risk_l1Ball {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (σ R : ) ( : 0 < σ) (_hR : 0 < R) (n : ) (hn : 0 < n) (hP : GaussianSequenceModel.IsGSM P σ n) (θhat : (Fin d)Fin d) :
                      BddAbove (Set.range fun (θ : Fin d) => ⨆ (_ : θ l1Ball d R), (Y : Fin d), sqDist (θhat Y) θ P θ)

                      GSM BddAbove for sup risk over l1Ball: The supremum of risks over l1Ball is bounded above for each estimator in the GSM.

                      Proof: Derives from Cor_5_13.gsm_bddAbove_risk by showing that the conditional supremum ⨆ (_ : θ ∈ l1Ball d R) is bounded by the unconditional integral (when θ ∈ l1Ball) or is 0 (when θ ∉ l1Ball). Note: Delegates to Cor_5_13.gsm_bddAbove_risk (which is axiomatized).

                      theorem MinimaxL1Ball.fano_minimax_lower_bound {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (σ R : ) (n : ) (hd8 : 8 d) ( : 0 < σ) (hR : 0 < R) (hn : 0 < n) (hP : GaussianSequenceModel.IsGSM P σ n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) (h_regime : 128 * n * R ^ 2 / σ ^ 2 Real.log (1 + d / 2)) :
                      minimaxExpRisk_l1 P R 1 / 8192 * min (R ^ 2) (R * σ * Real.log d / n)

                      Fano-based minimax lower bound for B₁(R).

                      This is the core lower bound: the minimax expected risk over B₁(R) in the Gaussian sequence model is bounded below.

                      Proof: For each measurable estimator θhat, uses l1Ball_fano_testing_bound to find θ₀ ∈ B₁(R) with P_θ₀(sqDist(θhat(Y), θ₀) ≥ ϕ) ≥ 1/4. Then Markov's inequality gives E_θ₀[sqDist(θhat(Y), θ₀)] ≥ (1/4)·ϕ, so the supremum over B₁(R) is at least (1/4)·ϕ = 1/8192 · min(R², Rσlog(d)/n).

                      Reference: Rigollet, Section 5.5 (lines 3706-3727).

                      theorem MinimaxL1Ball.l1_ball_lower_bound {d : } (hd8 : 8 d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) (h_regime : 128 * n * R ^ 2 / σ ^ 2 Real.log (1 + d / 2)) :
                      1 / 8192 * min (R ^ 2) (R * σ * Real.log d / n) minimaxExpRisk_l1 P R

                      ℓ₁ ball lower bound. The minimax risk over B₁(R) is bounded below by c · min(R², Rσ·log(d)/n).

                      Proof: Direct from fano_minimax_lower_bound (the Fano-based construction). Reference: Rigollet, Corollary 5.16 lower bound (Section 5.5).

                      Helper lemmas for the oracle inequality #

                      The proof of the oracle inequality for the constrained LS estimator on the ℓ₁ ball decomposes into three cross-chapter ingredients:

                      1. The general oracle inequality for constrained LS (Theorem 2.4, Chapter 2)
                      2. The infimum over B₁(R) being zero when θ ∈ B₁(R)
                      3. The Gaussian width bound for B₁(R) (Theorem 3.6, Chapter 3)
                      theorem MinimaxL1Ball.gsm_oracle_inequality_with_width {d : } (_hd : 0 < d) (σ R : ) (_hσ : 0 < σ) (_hR : 0 < R) (n : ) (_hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (_hP : GaussianSequenceModel.IsGSM P σ n) (θ : Fin d) ( : θ l1Ball d R) (W : ) (_hW : W 0) (hW_bound : θ'l1Ball d R, (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ sqDist θ' θ + W) :
                      (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ (⨅ θ'l1Ball d R, sqDist θ' θ) + W

                      General oracle inequality for constrained LS in the GSM (Theorem 2.4, Chapter 2).

                      For the Gaussian sequence model with any convex constraint set C, the constrained LS estimator satisfies: E_θ[‖θ̂_C - θ‖²] ≤ inf_{θ'∈C} ‖θ'-θ‖² + complexity(C)

                      Here we state it for C = B₁(R) with a general complexity bound W (the squared Gaussian width proxy). The specialization to the specific numerical bound uses gaussian_width_l1Ball_bound below.

                      Sorry justification: The proof is Theorem 2.4 (Chapter 2), formalized in Rigollet.Chapter2.Thm_2_4. Bridging from the general linear model (with design matrix X and sub-Gaussian noise) to the GSM formulation (X = I, Gaussian noise) requires cross-chapter type-theoretic infrastructure not available in this file's imports.

                      MinimaxL1Ball.l1norm is definitionally equal to Cor_5_13.l1norm.

                      theorem MinimaxL1Ball.sqDist_eq_cor513_sqDist {d : } (θ1 θ2 : Fin d) :

                      MinimaxL1Ball.sqDist is definitionally equal to Cor_5_13.sqDist.

                      theorem MinimaxL1Ball.constrainedLS_l1norm_le {d : } (R : ) (hR : 0 < R) (Y : Fin d) :

                      Helper: constrainedLSEstimator d R Y has ℓ₁ norm ≤ R when R > 0.

                      theorem MinimaxL1Ball.constrainedLS_opt {d : } (R : ) (Y u : Fin d) (hu : u l1Ball d R) :

                      Helper: constrainedLSEstimator d R Y minimizes sqDist(Y, ·) over l1Ball.

                      theorem MinimaxL1Ball.constrainedLS_pointwise_bound_l1Ball {d : } (hd : 0 < d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (θ : Fin d) (_hθ : θ l1Ball d R) (θ' : Fin d) (hθ' : θ' l1Ball d R) :
                      (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ sqDist θ' θ + 2 * (σ ^ 2 / n) * (R * Real.log d / σ)
                      theorem MinimaxL1Ball.oracle_inequality_constrainedLS_general {d : } (hd : 0 < d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (θ : Fin d) ( : θ l1Ball d R) :
                      (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ (⨅ θ'l1Ball d R, sqDist θ' θ) + 2 * (σ ^ 2 / n) * (R * Real.log d / σ)

                      Oracle inequality for constrained LS in the GSM (Theorem 2.4 + Theorem 3.6).

                      For the Gaussian sequence model with the ℓ₁ ball constraint set B₁(R), the constrained LS estimator satisfies: E_θ[‖θ̂_C - θ‖²] ≤ inf_{θ'∈C} ‖θ'-θ‖² + 2σ²/n · w²(B₁(R)) where w²(B₁(R)) is the squared Gaussian width.

                      Combined with Theorem 3.6 (Maurey's empirical method), the Gaussian width term satisfies w²(B₁(R)) ≤ R·log(d)·n/σ², giving: complexity term = 2σ²/n · R·log(d)·n/σ² = 2·R·log(d)

                      Proof: Combines constrainedLS_pointwise_bound_l1Ball (cross-chapter) with gsm_oracle_inequality_with_width (proved above).

                      theorem MinimaxL1Ball.infimum_sqDist_l1Ball_zero {d : } (R : ) (θ : Fin d) ( : θ l1Ball d R) :
                      θ'l1Ball d R, sqDist θ' θ = 0

                      Infimum of squared distance over B₁(R) is zero when θ ∈ B₁(R).

                      Since θ itself is in B₁(R), the distance from θ to the nearest point in B₁(R) is zero: inf_{θ' ∈ B₁(R)} ‖θ' - θ‖² = 0.

                      theorem MinimaxL1Ball.gsm_constrainedLS_oracle_inequality_l1Ball {d : } (hd : 0 < d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (θ : Fin d) ( : θ l1Ball d R) :
                      (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ 2 * (R * σ * Real.log d / n)

                      Oracle inequality for the constrained LS estimator in the GSM (cross-chapter).

                      This combines Theorem 2.4 (oracle inequality for constrained LS) and Theorem 3.6 (Maurey's empirical method / sparse approximation).

                      For the GSM and B₁(R), the constrained LS estimator satisfies: E_θ[‖θ̂ - θ‖₂²] ≤ 2 · R · σ · log(d) / n when θ ∈ B₁(R).

                      Proof structure:

                      1. Oracle inequality: E[‖θ̂_C - θ‖²] ≤ inf_{θ'∈C} ‖θ'-θ‖² + complexity(C) (from oracle_inequality_constrainedLS_general)
                      2. inf = 0 since θ ∈ B₁(R) (from infimum_sqDist_l1Ball_zero)
                      3. Algebraic simplification: 0 + 2·(σ²/n)·(R·log(d)/σ) = 2·R·σ·log(d)/n

                      Reference: Rigollet, Theorem 2.4 (Chapter 2) + Theorem 3.6 (Chapter 3).

                      theorem MinimaxL1Ball.constrainedLS_risk_pointwise {d : } (hd : 0 < d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) ( : ) (_hε : 0 < ) (_hd_large : d n ^ (1 / 2 + )) (θ : Fin d) ( : θ l1Ball d R) :
                      (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ 2 * (R * σ * Real.log d / n)

                      Pointwise risk bound for constrained LS over B₁(R) (cross-chapter).

                      For each θ ∈ B₁(R), the constrained LS estimator satisfies: E_θ[‖θ̂-θ‖₂²] ≤ 2 · Rσ·log(d)/n

                      Proof: Follows directly from the GSM oracle inequality for the constrained LS estimator over B₁(R) (gsm_constrainedLS_oracle_inequality_l1Ball), which combines Theorem 2.4 (Chapter 2) and Theorem 3.6 (Chapter 3).

                      Reference: Rigollet, Chapters 2-3 combined with Section 5.5.

                      theorem MinimaxL1Ball.oracle_inequality_constrainedLS {d : } (hd : 0 < d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) :
                      θl1Ball d R, (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ 2 * (R * σ * Real.log d / n)

                      Oracle inequality for constrained LS over B₁(R).

                      The constrained LS estimator satisfies: sup_{θ∈B₁(R)} E_θ[‖θ̂-θ‖₂²] ≤ 2 · Rσ·log(d)/n

                      Proof: The supremum of a family of values each bounded by the same constant is bounded by that constant. The pointwise bound comes from constrainedLS_risk_pointwise (cross-chapter axiom from Chapters 2-3).

                      Reference: Rigollet, Chapters 2-3 combined with Section 5.5.

                      theorem MinimaxL1Ball.constrainedLS_attains {d : } (hd : 0 < d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) (hR_large : R σ * Real.log d / n) :
                      θl1Ball d R, (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ 2 * min (R ^ 2) (R * σ * Real.log d / n)

                      CLS attainment for R ≥ σ·log(d)/n. When R ≥ σ·log(d)/n, we have min(R², Rσlog(d)/n) = Rσlog(d)/n, so the oracle inequality gives sup risk ≤ 2 · min(R², Rσlog(d)/n).

                      Proof: Arithmetic reduction to oracle_inequality_constrainedLS. Reference: Rigollet, Corollary 5.16 upper bound (Chapters 2-3).

                      theorem MinimaxL1Ball.l1_ball_upper_bound {d : } (hd : 0 < d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) :
                      minimaxExpRisk_l1 P R 2 * min (R ^ 2) (R * σ * Real.log d / n)

                      ℓ₁ ball upper bound. The minimax risk over B₁(R) is bounded above by C · min(R², Rσ·log(d)/n).

                      Proof: The minimax risk is an infimum over all estimators. We exhibit a specific estimator achieving the bound in each case:

                      Since minimaxExpRisk_l1 ≤ sup risk of any specific estimator, we conclude.

                      Helper lemmas for Corollary 5.16 #

                      theorem MinimaxL1Ball.l1Ball_mono {d : } {R R' : } (h : R' R) :
                      l1Ball d R' l1Ball d R

                      The ℓ₁ ball is monotone in the radius.

                      theorem MinimaxL1Ball.minimaxExpRisk_l1_mono {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (hP : GaussianSequenceModel.IsGSM P σ n) {R R' : } (hR : 0 < R) (_hR' : 0 < R') (hRR' : R' R) :

                      The minimax expected risk over B₁(R) is monotone in R: larger parameter spaces yield higher minimax risk. Requires the GSM structure for the bddAbove condition on the inner supremum.

                      Corollary 5.16 #

                      theorem MinimaxL1Ball.cor_5_16 {d : } (hd8 : 8 d) (σ R : ) ( : 0 < σ) (hR : 0 < R) (n : ) (hn : 0 < n) (P : (Fin d)MeasureTheory.Measure (Fin d)) (hP : GaussianSequenceModel.IsGSM P σ n) (ε : ) ( : 0 < ε) (hd_large : d n ^ (1 / 2 + ε)) :
                      ∃ (c : ) (C : ), 0 < c 0 < C c * min (R ^ 2) (R * σ * Real.log d / n) minimaxExpRisk_l1 P R minimaxExpRisk_l1 P R C * min (R ^ 2) (R * σ * Real.log d / n) (R σ * Real.log d / nθl1Ball d R, (Y : Fin d), sqDist (constrainedLSEstimator d R Y) θ P θ C * min (R ^ 2) (R * σ * Real.log d / n)) (R < σ * Real.log d / nθl1Ball d R, (Y : Fin d), sqDist (trivialEstimator d Y) θ P θ C * min (R ^ 2) (R * σ * Real.log d / n))

                      Corollary 5.16. The minimax rate of estimation over the ℓ₁ ball B₁(R) in the Gaussian sequence model.

                      Given d ≥ n^{1/2+ε} for some ε > 0, the minimax rate satisfies: c · min(R², Rσ · log(d)/n) ≤ R*(B₁(R)) ≤ C · min(R², Rσ · log(d)/n)

                      for universal constants c = 1/8192 and C = 2.

                      Moreover, it is attained by the constrained LS estimator θ̂_{B₁(R)}^LS if R ≥ σ · log(d)/n and by the trivial estimator θ̂ = 0 otherwise.

                      Proof of the lower bound: In the Fano regime (128nR²/σ² ≤ log(1+d/2)), l1_ball_lower_bound gives minimaxExpRisk ≥ 1/8192 · min(R², Rσ·log(d)/n). In the non-Fano regime, we define a boundary R₀ at the Fano regime edge, apply fano_minimax_lower_bound at R₀, then use minimaxExpRisk_l1_mono to lift the lower bound from R₀ to R.