Documentation

Atlas.HighDimensionalStatistics.code.Chapter5.Def_5_1_5_2

Squared Euclidean Distance #

We define the squared ℓ₂ distance between vectors in Fin d → ℝ, which serves as the loss function throughout Chapter 5.

def Minimax.sqDist {d : } (θ₁ θ₂ : Fin d) :

Squared ℓ₂ distance between two vectors in ℝ^d: ‖θ₁ - θ₂‖₂² = Σᵢ (θ₁ i - θ₂ i)².

Instances For

    Gaussian Sequence Model (GSM, equation 5.1) #

    The Gaussian Sequence Model (5.1): Y_i = θ*_i + ε_i, i = 1, …, d where ε = (ε₁, …, ε_d)ᵀ ~ N_d(0, (σ²/n) I_d), θ* ∈ Θ ⊂ ℝ^d.

    The observation space is Fin d → ℝ (i.e., ℝ^d). For each true parameter θ* ∈ Θ, we denote by P_θ the probability distribution of Y, and by E_θ the corresponding expectation.

    The Gaussian Sequence Model (5.1).

    Parameters:

    • d : dimension of the parameter/observation space
    • n : sample size (appears in variance σ²/n)
    • σ : noise standard deviation (σ > 0)
    • Θ : the parameter set, a subset of ℝ^d
    • P : the family of probability measures {P_θ}_{θ ∈ ℝ^d} on the observation space ℝ^d. For each θ, P_θ is the distribution of Y = θ + ε where ε ~ N_d(0, (σ²/n) I_d).
    • d :

      Dimension of the parameter space

    • n :

      Sample size

    • σ :

      Noise standard deviation

    • hσ : 0 < self.σ

      Positivity of σ

    • hn : 0 < self.n

      Positivity of n

    • Θ : Set (Fin self.d)

      The parameter set Θ ⊂ ℝ^d

    • P : (Fin self.d)MeasureTheory.Measure (Fin self.d)

      For each θ ∈ ℝ^d, the probability measure P_θ on the observation space. In the GSM, P_θ is the distribution N_d(θ, (σ²/n) · I_d).

    • hP_prob (θ : Fin self.d) : MeasureTheory.IsProbabilityMeasure (self.P θ)

      Each P_θ is a probability measure (inherent to the Gaussian model).

    • hP_ac (θ₀ θ₁ : Fin self.d) : (self.P θ₀).AbsolutelyContinuous (self.P θ₁)

      Gaussian measures with the same covariance are mutually absolutely continuous.

    • hP_kl_toReal (θ₀ θ₁ : Fin self.d) : (InformationTheory.klDiv (self.P θ₁) (self.P θ₀)).toReal = self.n * sqDist θ₁ θ₀ / (2 * self.σ ^ 2)

      KL divergence formula for the Gaussian sequence model (Example 5.7): KL(P_{θ₁} ‖ P_{θ₀}) = n · ‖θ₁ − θ₀‖₂² / (2σ²). This is a defining property of the Gaussian distribution.

    • hP_kl_ne_top (θ₀ θ₁ : Fin self.d) : InformationTheory.klDiv (self.P θ₁) (self.P θ₀)

      KL divergence between Gaussian measures with the same covariance is finite.

    • hP_integrable (θhat : (Fin self.d)Fin self.d) (θ : Fin self.d) : MeasureTheory.Integrable (fun (Y : Fin self.d) => sqDist (θhat Y) θ) (self.P θ)

      In the GSM, sqDist (θhat Y) θ is integrable under P_θ for any estimator. This is a standard property of Gaussian measures (polynomial functions of Gaussians are integrable). Reference: Standard measure theory, not proved in the textbook.

    • hP_aestronglyMeasurable (θhat : (Fin self.d)Fin self.d) (θ : Fin self.d) : MeasureTheory.AEStronglyMeasurable (fun (Y : Fin self.d) => sqDist (θhat Y) θ) (self.P θ)

      In the GSM, sqDist (θhat Y) θ is AEStronglyMeasurable under P_θ for any estimator. Reference: Standard measure theory.

    • hP_measurableSet_sqDist_ge (θhat : (Fin self.d)Fin self.d) (θ : Fin self.d) (c : ) : MeasurableSet {Y : Fin self.d | sqDist (θhat Y) θ c}

      In the GSM, sets of the form {Y | sqDist (θhat Y) θ ≥ c} are measurable for any estimator θhat, parameter θ, and threshold c. This follows from measurability of the composition sqDist ∘ (θhat, θ), which holds for Borel-measurable estimators. Reference: Standard.

    • hP_bddAbove (Θ' : Set (Fin self.d)) (θhat : (Fin self.d)Fin self.d) : BddAbove (Set.range fun (θ : Fin self.d) => ⨆ (_ : θ Θ'), (Y : Fin self.d), sqDist (θhat Y) θ self.P θ)

      In the GSM, the supremum of risks is bounded above for each estimator. Reference: Standard property of Gaussian measures.

    • hMeasurable_θhat (θhat : (Fin self.d)Fin self.d) : Measurable θhat

      In the GSM, all estimators are measurable. This is an axiom that encapsulates the implicit assumption in statistics that estimators are measurable functions. Reference: Standard measure theory convention.

    Instances For

      The noise variance per coordinate in the GSM: σ²/n.

      Instances For

        Estimators #

        An estimator is a (measurable) function from the observation space ℝ^d to the parameter space ℝ^d. In the minimax framework, the infimum is taken over all such estimators.

        An estimator in the GSM: a function from observations Y ∈ ℝ^d to parameter estimates θ̂(Y) ∈ ℝ^d.

        Instances For

          Risk Functions #

          noncomputable def Minimax.risk (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (θ : Fin gsm.d) :

          The expected risk of an estimator θ̂ at parameter θ, under measure P_θ: R(θ̂, θ) = E_θ[‖θ̂(Y) - θ‖₂²].

          Instances For
            noncomputable def Minimax.supRisk (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) :

            The supremum (worst-case) risk of an estimator over the parameter set Θ: R(θ̂, Θ) = sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²].

            Instances For
              noncomputable def Minimax.minimaxRisk (gsm : GaussianSequenceModel) :

              The minimax risk over the parameter set Θ: R*(Θ) = inf_{θ̂} sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²].

              This is the best worst-case expected squared error achievable by any estimator. It characterizes the fundamental difficulty of the estimation problem.

              Instances For

                Definition 5.1: Minimax Optimality in Expectation #

                An estimator θ̂_n is minimax optimal over Θ at rate φ(Θ) if:

                1. (Upper bound, eq. 5.2) sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ
                2. (Lower bound, eq. 5.4) inf_{θ̂} sup_{θ ∈ Θ} E_θ[‖θ̂ - θ‖₂²] ≥ C' · φ

                The rate φ = φ(Θ) is called the minimax rate of estimation over Θ. Note that minimax rates are defined up to multiplicative constants.

                Definition 5.1. An estimator θ̂ is minimax optimal over Θ at rate φ(Θ) in the expectation sense if:

                1. (Upper bound, eq. 5.2) There exists C > 0 such that sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ

                2. (Lower bound, eq. 5.4) There exists C' > 0 such that inf_{θ̂} sup_{θ ∈ Θ} E_θ[φ⁻¹ · ‖θ̂(Y) - θ‖₂²] ≥ C' Equivalently: minimaxRisk(gsm) ≥ C' · φ.

                The rate φ = φ(Θ) is called the minimax rate of estimation over Θ.

                Instances For

                  Definition 5.2: Minimax Optimality in High Probability #

                  This definition adapts minimax optimality to bounds that hold with high probability rather than in expectation. By the Markov inequality (eq. 5.5), a lower bound in expectation implies a lower bound in probability, so both definitions yield the same minimax rates.

                  noncomputable def Minimax.supProbLargeError (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ) :

                  For a given estimator θ̂ and threshold φ, the worst-case probability of a large squared error exceeding φ: sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ).

                  Note: We use rather than > because the Section 5.2 reduction from estimation to testing (via the triangle inequality) naturally produces bounds. For Gaussian measures these are equivalent (atoms have zero probability), but makes the proof go through in the abstract framework. The resulting lower bound is also stronger (P(f ≥ c) ≥ P(f > c)).

                  Instances For

                    The minimax probability of large deviation at threshold φ: inf_{θ̂} sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ).

                    This quantifies the unavoidable probability that any estimator incurs squared error at least φ for some parameter θ ∈ Θ.

                    Instances For

                      Upper bound in expectation (eq. 5.2): sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ.

                      Instances For

                        Upper bound with high probability (eq. 5.3): for all θ ∈ Θ, P_θ(‖θ̂(Y) - θ‖₂² ≥ C · φ) ≤ d⁻², equivalently ‖θ̂(Y) - θ‖₂² < C · φ with probability at least 1 - d⁻².

                        Instances For

                          Definition 5.2. An estimator θ̂ is minimax optimal over Θ at rate φ(Θ) in the high-probability sense if:

                          1. (Upper bound) Either eq. (5.2) or (5.3) holds.
                          2. (Lower bound, eq. 5.6) There exists C' > 0 such that inf_{θ̂} sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ) ≥ C'.

                          The rate φ = φ(Θ) is called the minimax rate of estimation over Θ.

                          Instances For

                            Aliases for Definition 5.1 and 5.2 #

                            Definition 5.1 (Minimax Optimality in Expectation). An estimator θ̂ is minimax optimal over Θ at rate φ(Θ) if:

                            1. (Upper bound, eq. 5.2) sup_{θ ∈ Θ} E_θ[‖θ̂ - θ‖²] ≤ C · φ
                            2. (Lower bound, eq. 5.4) inf_{θ̂} sup_{θ ∈ Θ} E_θ[φ⁻¹ · ‖θ̂ - θ‖²] ≥ C' This is an alias for IsMinimaxOptimal_Expectation.
                            Instances For

                              Definition 5.1 — The minimax optimal rate of estimation over Θ. A rate φ(Θ) is called the minimax rate if it satisfies the upper and lower bound conditions of Definition 5.1. This is an alias for IsMinimaxOptimal_Expectation.

                              Instances For

                                Definition 5.2 (Minimax Optimality in High Probability). This is an alias for IsMinimaxOptimal_HighProb.

                                Instances For