Squared Euclidean Distance #
We define the squared ℓ₂ distance between vectors in Fin d → ℝ,
which serves as the loss function throughout Chapter 5.
Gaussian Sequence Model (GSM, equation 5.1) #
The Gaussian Sequence Model (5.1): Y_i = θ*_i + ε_i, i = 1, …, d where ε = (ε₁, …, ε_d)ᵀ ~ N_d(0, (σ²/n) I_d), θ* ∈ Θ ⊂ ℝ^d.
The observation space is Fin d → ℝ (i.e., ℝ^d). For each true parameter
θ* ∈ Θ, we denote by P_θ the probability distribution of Y, and by E_θ
the corresponding expectation.
The Gaussian Sequence Model (5.1).
Parameters:
d: dimension of the parameter/observation spacen: sample size (appears in variance σ²/n)σ: noise standard deviation (σ > 0)Θ: the parameter set, a subset of ℝ^dP: the family of probability measures {P_θ}_{θ ∈ ℝ^d} on the observation space ℝ^d. For each θ, P_θ is the distribution of Y = θ + ε where ε ~ N_d(0, (σ²/n) I_d).
- d : ℕ
Dimension of the parameter space
- n : ℕ
Sample size
- σ : ℝ
Noise standard deviation
Positivity of σ
Positivity of n
The parameter set Θ ⊂ ℝ^d
For each θ ∈ ℝ^d, the probability measure P_θ on the observation space. In the GSM, P_θ is the distribution N_d(θ, (σ²/n) · I_d).
- hP_prob (θ : Fin self.d → ℝ) : MeasureTheory.IsProbabilityMeasure (self.P θ)
Each P_θ is a probability measure (inherent to the Gaussian model).
Gaussian measures with the same covariance are mutually absolutely continuous.
- hP_kl_toReal (θ₀ θ₁ : Fin self.d → ℝ) : (InformationTheory.klDiv (self.P θ₁) (self.P θ₀)).toReal = ↑self.n * sqDist θ₁ θ₀ / (2 * self.σ ^ 2)
KL divergence formula for the Gaussian sequence model (Example 5.7): KL(P_{θ₁} ‖ P_{θ₀}) = n · ‖θ₁ − θ₀‖₂² / (2σ²). This is a defining property of the Gaussian distribution.
KL divergence between Gaussian measures with the same covariance is finite.
- hP_integrable (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) (θ : Fin self.d → ℝ) : MeasureTheory.Integrable (fun (Y : Fin self.d → ℝ) => sqDist (θhat Y) θ) (self.P θ)
In the GSM,
sqDist (θhat Y) θis integrable under P_θ for any estimator. This is a standard property of Gaussian measures (polynomial functions of Gaussians are integrable). Reference: Standard measure theory, not proved in the textbook. - hP_aestronglyMeasurable (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) (θ : Fin self.d → ℝ) : MeasureTheory.AEStronglyMeasurable (fun (Y : Fin self.d → ℝ) => sqDist (θhat Y) θ) (self.P θ)
In the GSM,
sqDist (θhat Y) θis AEStronglyMeasurable under P_θ for any estimator. Reference: Standard measure theory. - hP_measurableSet_sqDist_ge (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) (θ : Fin self.d → ℝ) (c : ℝ) : MeasurableSet {Y : Fin self.d → ℝ | sqDist (θhat Y) θ ≥ c}
In the GSM, sets of the form {Y | sqDist (θhat Y) θ ≥ c} are measurable for any estimator θhat, parameter θ, and threshold c. This follows from measurability of the composition sqDist ∘ (θhat, θ), which holds for Borel-measurable estimators. Reference: Standard.
- hP_bddAbove (Θ' : Set (Fin self.d → ℝ)) (θhat : (Fin self.d → ℝ) → Fin self.d → ℝ) : BddAbove (Set.range fun (θ : Fin self.d → ℝ) => ⨆ (_ : θ ∈ Θ'), ∫ (Y : Fin self.d → ℝ), sqDist (θhat Y) θ ∂self.P θ)
In the GSM, the supremum of risks is bounded above for each estimator. Reference: Standard property of Gaussian measures.
In the GSM, all estimators are measurable. This is an axiom that encapsulates the implicit assumption in statistics that estimators are measurable functions. Reference: Standard measure theory convention.
Instances For
The noise variance per coordinate in the GSM: σ²/n.
Instances For
Estimators #
An estimator is a (measurable) function from the observation space ℝ^d to the parameter space ℝ^d. In the minimax framework, the infimum is taken over all such estimators.
An estimator in the GSM: a function from observations Y ∈ ℝ^d to parameter estimates θ̂(Y) ∈ ℝ^d.
Instances For
Risk Functions #
The expected risk of an estimator θ̂ at parameter θ, under measure P_θ:
R(θ̂, θ) = E_θ[‖θ̂(Y) - θ‖₂²].
Instances For
The supremum (worst-case) risk of an estimator over the parameter set Θ:
R(θ̂, Θ) = sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²].
Instances For
The minimax risk over the parameter set Θ:
R*(Θ) = inf_{θ̂} sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²].
This is the best worst-case expected squared error achievable by any estimator. It characterizes the fundamental difficulty of the estimation problem.
Instances For
Definition 5.1: Minimax Optimality in Expectation #
An estimator θ̂_n is minimax optimal over Θ at rate φ(Θ) if:
- (Upper bound, eq. 5.2)
sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ - (Lower bound, eq. 5.4)
inf_{θ̂} sup_{θ ∈ Θ} E_θ[‖θ̂ - θ‖₂²] ≥ C' · φ
The rate φ = φ(Θ) is called the minimax rate of estimation over Θ. Note that minimax rates are defined up to multiplicative constants.
Definition 5.1. An estimator θ̂ is minimax optimal over Θ at rate φ(Θ) in the expectation sense if:
(Upper bound, eq. 5.2) There exists
C > 0such thatsup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ(Lower bound, eq. 5.4) There exists
C' > 0such thatinf_{θ̂} sup_{θ ∈ Θ} E_θ[φ⁻¹ · ‖θ̂(Y) - θ‖₂²] ≥ C'Equivalently:minimaxRisk(gsm) ≥ C' · φ.
The rate φ = φ(Θ) is called the minimax rate of estimation over Θ.
Instances For
Definition 5.2: Minimax Optimality in High Probability #
This definition adapts minimax optimality to bounds that hold with high probability rather than in expectation. By the Markov inequality (eq. 5.5), a lower bound in expectation implies a lower bound in probability, so both definitions yield the same minimax rates.
For a given estimator θ̂ and threshold φ, the worst-case probability of
a large squared error exceeding φ:
sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ).
Note: We use ≥ rather than > because the Section 5.2 reduction
from estimation to testing (via the triangle inequality) naturally
produces ≥ bounds. For Gaussian measures these are equivalent
(atoms have zero probability), but ≥ makes the proof go through
in the abstract framework. The resulting lower bound is also
stronger (P(f ≥ c) ≥ P(f > c)).
Instances For
The minimax probability of large deviation at threshold φ:
inf_{θ̂} sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ).
This quantifies the unavoidable probability that any estimator incurs squared error at least φ for some parameter θ ∈ Θ.
Instances For
Upper bound in expectation (eq. 5.2):
sup_{θ ∈ Θ} E_θ[‖θ̂(Y) - θ‖₂²] ≤ C · φ.
Instances For
Upper bound with high probability (eq. 5.3):
for all θ ∈ Θ, P_θ(‖θ̂(Y) - θ‖₂² ≥ C · φ) ≤ d⁻²,
equivalently ‖θ̂(Y) - θ‖₂² < C · φ with probability at least 1 - d⁻².
Instances For
Definition 5.2. An estimator θ̂ is minimax optimal over Θ at rate φ(Θ) in the high-probability sense if:
- (Upper bound) Either eq. (5.2) or (5.3) holds.
- (Lower bound, eq. 5.6) There exists
C' > 0such thatinf_{θ̂} sup_{θ ∈ Θ} P_θ(‖θ̂(Y) - θ‖₂² ≥ φ) ≥ C'.
The rate φ = φ(Θ) is called the minimax rate of estimation over Θ.
Instances For
Aliases for Definition 5.1 and 5.2 #
Definition 5.1 (Minimax Optimality in Expectation). An estimator θ̂ is minimax optimal
over Θ at rate φ(Θ) if:
- (Upper bound, eq. 5.2)
sup_{θ ∈ Θ} E_θ[‖θ̂ - θ‖²] ≤ C · φ - (Lower bound, eq. 5.4)
inf_{θ̂} sup_{θ ∈ Θ} E_θ[φ⁻¹ · ‖θ̂ - θ‖²] ≥ C'This is an alias forIsMinimaxOptimal_Expectation.
Instances For
Definition 5.1 — The minimax optimal rate of estimation over Θ.
A rate φ(Θ) is called the minimax rate if it satisfies the upper and
lower bound conditions of Definition 5.1. This is an alias for
IsMinimaxOptimal_Expectation.
Instances For
Definition 5.2 (Minimax Optimality in High Probability).
This is an alias for IsMinimaxOptimal_HighProb.