Atlas.HighDimensionalStatistics.code.Chapter5.Def_5_1_5

Squared Euclidean Distance #

We define the squared ℓ₂ distance between vectors in Fin d → ℝ, which serves as the loss function throughout Chapter 5.

source

def Minimax.sqDist {d : ℕ} (θ₁ θ₂ : Fin d → ℝ) :

ℝ

Squared ℓ₂ distance between two vectors in ℝ^d: ‖θ₁ - θ₂‖₂² = Σᵢ (θ₁ i - θ₂ i)².

Instances For

Gaussian Sequence Model (GSM, equation 5.1) #

The Gaussian Sequence Model (5.1): Y_i = θ*_i + ε_i, i = 1, …, d where ε = (ε₁, …, ε_d)ᵀ ~ N_d(0, (σ²/n) I_d), θ* ∈ Θ ⊂ ℝ^d.

The observation space is Fin d → ℝ (i.e., ℝ^d). For each true parameter θ* ∈ Θ, we denote by P_θ the probability distribution of Y, and by E_θ the corresponding expectation.

source

structure Minimax.GaussianSequenceModel :

Type

The Gaussian Sequence Model (5.1).

Parameters:

d : dimension of the parameter/observation space
n : sample size (appears in variance σ²/n)
σ : noise standard deviation (σ > 0)
Θ : the parameter set, a subset of ℝ^d
P : the family of probability measures {P_θ}_{θ ∈ ℝ^d} on the observation space ℝ^d. For each θ, P_θ is the distribution of Y = θ + ε where ε ~ N_d(0, (σ²/n) I_d).

d : ℕ
Dimension of the parameter space
n : ℕ
Sample size
σ : ℝ
Noise standard deviation
hσ : 0 < self.σ
Positivity of σ
hn : 0 < self.n
Positivity of n
Θ : Set (Fin self.d → ℝ)
The parameter set Θ ⊂ ℝ^d
P : (Fin self.d → ℝ) → MeasureTheory.Measure (Fin self.d → ℝ)
For each θ ∈ ℝ^d, the probability measure P_θ on the observation space. In the GSM, P_θ is the distribution N_d(θ, (σ²/n) · I_d).
hP_prob (θ : Fin self.d → ℝ) : MeasureTheory.IsProbabilityMeasure (self.P θ)
Each P_θ is a probability measure (inherent to the Gaussian model).
hP_ac (θ₀ θ₁ : Fin self.d → ℝ) : (self.P θ₀).AbsolutelyContinuous (self.P θ₁)
Gaussian measures with the same covariance are mutually absolutely continuous.
hP_kl_toReal (θ₀ θ₁ : Fin self.d → ℝ) : (InformationTheory.klDiv (self.P θ₁) (self.P θ₀)).toReal = ↑self.n * sqDist θ₁ θ₀ / (2 * self.σ ^ 2)
KL divergence formula for the Gaussian sequence model (Example 5.7): KL(P_{θ₁} ‖ P_{θ₀}) = n · ‖θ₁ − θ₀‖₂² / (2σ²). This is a defining property of the Gaussian distribution.
hP_kl_ne_top (θ₀ θ₁ : Fin self.d → ℝ) : InformationTheory.klDiv (self.P θ₁) (self.P θ₀) ≠ ⊤
KL divergence between Gaussian measures with the same covariance is finite.
hP_integrable (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) (θ : Fin self.d → ℝ) : MeasureTheory.Integrable (fun (Y : Fin self.d → ℝ) => sqDist (θhat Y) θ) (self.P θ)
In the GSM, sqDist (θhat Y) θ is integrable under P_θ for any estimator. This is a standard property of Gaussian measures (polynomial functions of Gaussians are integrable). Reference: Standard measure theory, not proved in the textbook.
hP_aestronglyMeasurable (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) (θ : Fin self.d → ℝ) : MeasureTheory.AEStronglyMeasurable (fun (Y : Fin self.d → ℝ) => sqDist (θhat Y) θ) (self.P θ)
In the GSM, sqDist (θhat Y) θ is AEStronglyMeasurable under P_θ for any estimator. Reference: Standard measure theory.
hP_measurableSet_sqDist_ge (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) (θ : Fin self.d → ℝ) (c : ℝ) : MeasurableSet {Y : Fin self.d → ℝ | sqDist (θhat Y) θ ≥ c}
In the GSM, sets of the form {Y | sqDist (θhat Y) θ ≥ c} are measurable for any estimator θhat, parameter θ, and threshold c. This follows from measurability of the composition sqDist ∘ (θhat, θ), which holds for Borel-measurable estimators. Reference: Standard.
hP_bddAbove (Θ' : Set (Fin self.d → ℝ)) (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) : BddAbove (Set.range fun (θ : Fin self.d → ℝ) => ⨆ (_ : θ ∈ Θ'), ∫ (Y : Fin self.d → ℝ), sqDist (θhat Y) θ ∂self.P θ)
In the GSM, the supremum of risks is bounded above for each estimator. Reference: Standard property of Gaussian measures.
hMeasurable_θhat (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) : Measurable θhat
In the GSM, all estimators are measurable. This is an axiom that encapsulates the implicit assumption in statistics that estimators are measurable functions. Reference: Standard measure theory convention.

Instances For

source

noncomputable def Minimax.GaussianSequenceModel.noiseVariance (gsm : GaussianSequenceModel) :

ℝ

The noise variance per coordinate in the GSM: σ²/n.

Instances For

Estimators #

An estimator is a (measurable) function from the observation space ℝ^d to the parameter space ℝ^d. In the minimax framework, the infimum is taken over all such estimators.

source

def Minimax.Estimator (d : ℕ) :

Type

An estimator in the GSM: a function from observations Y ∈ ℝ^d to parameter estimates θ̂(Y) ∈ ℝ^d.

Instances For

source

instance Minimax.instNonemptyEstimator {d : ℕ} :

Nonempty (Estimator d)

Risk Functions #

source

noncomputable def Minimax.risk (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (θ : Fin gsm.d → ℝ) :

ℝ

The expected risk of an estimator θ̂ at parameter θ, under measure P_θ: R(θ̂, θ) = E_θ[‖θ̂(Y) - θ‖₂²].

Instances For

source

noncomputable def Minimax.supRisk (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) :

ℝ

The supremum (worst-case) risk of an estimator over the parameter set Θ: R(θ̂, Θ) = sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²].

Instances For

source

noncomputable def Minimax.minimaxRisk (gsm : GaussianSequenceModel) :

ℝ

The minimax risk over the parameter set Θ: R*(Θ) = inf_{θ̂} sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²].

This is the best worst-case expected squared error achievable by any estimator. It characterizes the fundamental difficulty of the estimation problem.

Instances For

Definition 5.1: Minimax Optimality in Expectation #

An estimator θ̂_n is minimax optimal over Θ at rate φ(Θ) if:

(Upper bound, eq. 5.2) sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ
(Lower bound, eq. 5.4) inf_{θ̂} sup_{θ ∈ Θ} E_θ[‖θ̂ - θ‖₂²] ≥ C' · φ

The rate φ = φ(Θ) is called the minimax rate of estimation over Θ. Note that minimax rates are defined up to multiplicative constants.

source

def Minimax.IsMinimaxOptimal_Expectation (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ℝ) :

Prop

Definition 5.1. An estimator θ̂ is minimax optimal over Θ at rate φ(Θ) in the expectation sense if:

(Upper bound, eq. 5.2) There exists C > 0 such that sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ
(Lower bound, eq. 5.4) There exists C' > 0 such that inf_{θ̂} sup_{θ ∈ Θ} E_θ[φ⁻¹ · ‖θ̂(Y) - θ‖₂²] ≥ C' Equivalently: minimaxRisk(gsm) ≥ C' · φ.

The rate φ = φ(Θ) is called the minimax rate of estimation over Θ.

Instances For

Definition 5.2: Minimax Optimality in High Probability #

This definition adapts minimax optimality to bounds that hold with high probability rather than in expectation. By the Markov inequality (eq. 5.5), a lower bound in expectation implies a lower bound in probability, so both definitions yield the same minimax rates.

source

noncomputable def Minimax.supProbLargeError (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ℝ) :

ℝ

For a given estimator θ̂ and threshold φ, the worst-case probability of a large squared error exceeding φ: sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ).

Note: We use ≥ rather than > because the Section 5.2 reduction from estimation to testing (via the triangle inequality) naturally produces ≥ bounds. For Gaussian measures these are equivalent (atoms have zero probability), but ≥ makes the proof go through in the abstract framework. The resulting lower bound is also stronger (P(f ≥ c) ≥ P(f > c)).

Instances For

source

noncomputable def Minimax.minimaxProbLargeError (gsm : GaussianSequenceModel) (φ : ℝ) :

ℝ

The minimax probability of large deviation at threshold φ: inf_{θ̂} sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ).

This quantifies the unavoidable probability that any estimator incurs squared error at least φ for some parameter θ ∈ Θ.

Instances For

source

def Minimax.upperBound_expectation (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ℝ) :

Prop

Upper bound in expectation (eq. 5.2): sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ.

Instances For

source

def Minimax.upperBound_highProb (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ℝ) :

Prop

Upper bound with high probability (eq. 5.3): for all θ ∈ Θ, P_θ(‖θ̂(Y) - θ‖₂² ≥ C · φ) ≤ d⁻², equivalently ‖θ̂(Y) - θ‖₂² < C · φ with probability at least 1 - d⁻².

Instances For

source

def Minimax.IsMinimaxOptimal_HighProb (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ℝ) :

Prop

Definition 5.2. An estimator θ̂ is minimax optimal over Θ at rate φ(Θ) in the high-probability sense if:

(Upper bound) Either eq. (5.2) or (5.3) holds.
(Lower bound, eq. 5.6) There exists C' > 0 such that inf_{θ̂} sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ) ≥ C'.

The rate φ = φ(Θ) is called the minimax rate of estimation over Θ.

Instances For

Aliases for Definition 5.1 and 5.2 #

source

def Minimax.definition_5_1 (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ℝ) :

Prop

Definition 5.1 (Minimax Optimality in Expectation). An estimator θ̂ is minimax optimal over Θ at rate φ(Θ) if:

(Upper bound, eq. 5.2) sup_{θ ∈ Θ} E_θ[‖θ̂ - θ‖²] ≤ C · φ
(Lower bound, eq. 5.4) inf_{θ̂} sup_{θ ∈ Θ} E_θ[φ⁻¹ · ‖θ̂ - θ‖²] ≥ C' This is an alias for IsMinimaxOptimal_Expectation.

Instances For

source

def Minimax.minimaxOptimalRate (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ℝ) :

Prop

Definition 5.1 — The minimax optimal rate of estimation over Θ. A rate φ(Θ) is called the minimax rate if it satisfies the upper and lower bound conditions of Definition 5.1. This is an alias for IsMinimaxOptimal_Expectation.

Instances For

source

def Minimax.definition_5_2 (gsm : GaussianSequenceModel) (θhat : Estimator gsm.d) (φ : ℝ) :

Prop

Definition 5.2 (Minimax Optimality in High Probability). This is an alias for IsMinimaxOptimal_HighProb.

Instances For

Documentation