Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Thm_5_11

Minimax risk: probability that squared error exceeds a threshold #

The minimax risk inf_{θ̂} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ ϕ) is defined as an infimum over estimators and supremum over parameters.

def MinimaxLowerBound.sqDist {d : } (θ₁ θ₂ : Fin d) :

The squared Euclidean distance between two vectors in ℝᵈ.

Instances For

    An estimator: a measurable function from observations to parameter estimates.

    Instances For
      noncomputable def MinimaxLowerBound.minimaxRisk {d : } (P : (Fin d)MeasureTheory.Measure (Fin d)) (Θ : Set (Fin d)) (ϕ : ) :

      Minimax testing risk over a parameter set Θ ⊆ ℝᵈ with respect to a measure family P: minimaxRisk P Θ ϕ = inf_{θ̂ measurable} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ ϕ).

      The outer infimum is over all measurable estimators θ̂ : ℝᵈ → ℝᵈ and the inner supremum is over all parameters θ ∈ Θ. The value P_θ(|θ̂ − θ|₂² ≥ ϕ) is the probability (under P_θ) that the squared error exceeds the threshold ϕ.

      Instances For
        theorem MinimaxLowerBound.sqDist_eq_infoTheory {d : } (θ₁ θ₂ : Fin d) :
        sqDist θ₁ θ₂ = InfoTheory.sqDist θ₁ θ₂

        The local sqDist equals InfoTheory.sqDist (definitionally equal).

        theorem MinimaxLowerBound.fano_algebraic_bound {M : } (hM : 5 M) {α : } (hα_pos : 0 < α) :
        (α * Real.log M + Real.log 2) / Real.log (M - 1) 2 * α + 1 / 2

        Key algebraic lemma: for M ≥ 5 and α > 0, (α · log M + log 2) / log (M - 1) ≤ 2α + 1/2.

        This follows from:

        • log M ≤ 2 · log(M-1) for M ≥ 5 (since M ≤ (M-1)²)
        • log 2 ≤ (1/2) · log(M-1) for M ≥ 5 (since M-1 ≥ 4 = 2²)
        theorem MinimaxLowerBound.theorem_5_11 {d M : } (hM : 5 M) {Θ : Set (Fin d)} (P : (Fin d)MeasureTheory.Measure (Fin d)) (θ : Fin MFin d) (hθ_mem : ∀ (j : Fin M), θ j Θ) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (α : ) (hα_pos : 0 < α) (_hα_lt : α < 1 / 4) (ϕ : ) ( : 0 < ϕ) (hsep : ∀ (j k : Fin M), j ksqDist (θ j) (θ k) 4 * ϕ) (hkl : ∀ (j k : Fin M), j ksqDist (θ j) (θ k) 2 * α * σ ^ 2 / n * Real.log M) [hprob : ∀ (j : Fin M), MeasureTheory.IsProbabilityMeasure (P (θ j))] (hP_prob : θ'Θ, MeasureTheory.IsProbabilityMeasure (P θ')) (hac : ∀ (j k : Fin M), (P (θ j)).AbsolutelyContinuous (P (θ k))) (hfin_kl : ∀ (j k : Fin M), InformationTheory.klDiv (P (θ j)) (P (θ k)) ) (hGSM : ∀ (j k : Fin M), (InformationTheory.klDiv (P (θ j)) (P (θ k))).toReal = n * InfoTheory.sqDist (θ j) (θ k) / (2 * σ ^ 2)) :
        minimaxRisk P Θ ϕ 1 / 2 - 2 * α

        Theorem 5.11 (Many-hypothesis minimax lower bound via Fano).

        Assume that Θ contains M ≥ 5 hypotheses θ₁, …, θ_M such that for some constant 0 < α < 1/4: (i) sqDist(θⱼ, θₖ) ≥ 4ϕ for all j ≠ k (separation condition) (ii) sqDist(θⱼ, θₖ) ≤ (2ασ²/n)·log(M) for all j ≠ k (KL control)

        Then: inf_{θ̂} sup_{θ ∈ Θ} P_θ(|θ̂ − θ|₂² ≥ ϕ) ≥ 1/2 − 2α

        The proof applies Fano's inequality (Theorem 5.10) after bounding the average pairwise KL divergence using condition (ii) and the Gaussian KL formula KL(P_j ∥ P_k) = n·|θⱼ − θₖ|₂²/(2σ²).

        Note: The textbook implicitly assumes the Gaussian sequence model (GSM), where each P_θ is a probability measure, measures are mutually absolutely continuous, and KL(P_θ₁ ∥ P_θ₂) = n|θ₁−θ₂|²/(2σ²). These standard statistical assumptions are made explicit as hypotheses hprob, hac, and hGSM (following the axiom structure in InfoTheory.lean).

        Application: Sparse Estimation Lower Bound #

        Using the Sparse Varshamov-Gilbert Lemma (Lemma 5.14) together with Theorem 5.11, we obtain the minimax lower bound for estimating k-sparse vectors in the Gaussian sequence model.

        noncomputable def MinimaxLowerBound.l0norm {d : } (θ : Fin d) :

        The ℓ₀ "norm" (number of nonzero entries) of a vector in ℝᵈ.

        Instances For

          The set of k-sparse vectors in ℝᵈ: B₀(k) = {θ ∈ ℝᵈ : |θ|₀ ≤ k}.

          Instances For
            theorem MinimaxLowerBound.hammingDist_le_two_weight {d k : } (ω₁ ω₂ : Fin dBool) (hw₁ : InfoTheory.l0norm_bool ω₁ = k) (hw₂ : InfoTheory.l0norm_bool ω₂ = k) :
            InfoTheory.hammingDist ω₁ ω₂ 2 * k
            theorem MinimaxLowerBound.sqDist_scaled_indicator {d M : } (ω : Fin MFin dBool) (scale : ) (j k' : Fin M) :
            (sqDist (fun (i : Fin d) => if ω j i = true then scale else 0) fun (i : Fin d) => if ω k' i = true then scale else 0) = scale ^ 2 * (InfoTheory.hammingDist (ω j) (ω k'))
            theorem MinimaxLowerBound.l0norm_scaled_indicator {d : } (ω : Fin dBool) (scale : ) (hscale : scale 0) :
            (l0norm fun (i : Fin d) => if ω i = true then scale else 0) = InfoTheory.l0norm_bool ω
            theorem MinimaxLowerBound.sparse_estimation_lower_bound {d k : } (hk_pos : 1 k) (hk_le : 8 * k d) (P : (Fin d)MeasureTheory.Measure (Fin d)) (σ : ) ( : 0 < σ) (n : ) (hn : 0 < n) (α : ) (hα_pos : 0 < α) (hα_lt : α < 1 / 4) [hP_prob : ∀ (θ' : Fin d), MeasureTheory.IsProbabilityMeasure (P θ')] (hac : ∀ (θ₁ θ₂ : Fin d), (P θ₁).AbsolutelyContinuous (P θ₂)) (hGSM_kl : ∀ (θ₁ θ₂ : Fin d), (InformationTheory.klDiv (P θ₁) (P θ₂)).toReal = n * InfoTheory.sqDist θ₁ θ₂ / (2 * σ ^ 2)) (hfin_kl : ∀ (θ₁ θ₂ : Fin d), InformationTheory.klDiv (P θ₁) (P θ₂) ) :
            ∃ (C : ), 0 < C minimaxRisk P (sparseSet d k) (C * σ ^ 2 * k * Real.log (1 + d / (2 * k)) / n) 1 / 2 - 2 * α

            Sparse Estimation Lower Bound (Application of Theorem 5.11 + Sparse Varshamov-Gilbert).

            For k-sparse vectors in ℝᵈ with 1 ≤ k ≤ d/8, there exist positive constants such that: inf_{θ̂} sup_{θ : |θ|₀ ≤ k} P_θ(|θ̂ − θ|₂² ≥ C·σ²k·log(1 + d/(2k))/n) ≥ 1/2 − 2α

            Note: The textbook implicitly works in the GSM. The hypotheses hP_prob, hac, and hGSM_kl make these assumptions explicit, matching the approach in theorem_5_11.