Total variation distance between two finite measures. TV(P₀, P₁) = sup_{A measurable} |P₀(A) − P₁(A)|
Instances For
For probability measures, any set has measure ≤ 1 in real terms.
The set defining TV distance is bounded above for probability measures.
TV distance is ≥ |P₀(S) - P₁(S)| for any measurable S.
Lemma 5.3 (Neyman-Pearson lower bound). For any measurable test ψ : Ω → Bool, the sum of type I and type II errors satisfies P₀(ψ=1) + P₁(ψ=0) ≥ 1 − TV(P₀,P₁).
The Likelihood Ratio test ψ*(ω) = 𝕀(p₁(ω) ≥ p₀(ω)), where p₀, p₁ are
Radon-Nikodym derivatives of P₀, P₁ with respect to a common dominating
measure ν. Returns true when p₁(ω) ≥ p₀(ω), i.e., when the evidence
favors P₁ over P₀.
Instances For
The Likelihood Ratio test is measurable.
Helper: for any measurable T, P₀(T) − P₁(T) ≤ P₀(R) − P₁(R) where R = {ω | P₁.rnDeriv ν ω < P₀.rnDeriv ν ω}. Uses the integral splitting argument: on R, p₀ − p₁ ≥ 0, and on Rᶜ, p₀ − p₁ ≤ 0.
Lemma 5.3 (Neyman-Pearson, equality part, book lines 3319-3321). Moreover, equality holds for the Likelihood Ratio test ψ* = 𝕀(p₁ ≥ p₀): P₀(ψ*=1) + P₁(ψ*=0) = 1 − TV(P₀, P₁).
Here ν is a common dominating measure for P₀ and P₁ (e.g. ν = P₀ + P₁), and the LR test is defined using Radon-Nikodym derivatives dP₀/dν and dP₁/dν. The proof follows the book: on {p₁ ≥ p₀}, p₀ = min(p₀, p₁), and on {p₁ < p₀}, p₁ = min(p₀, p₁), so the sum telescopes to ∫ min(p₀, p₁) = 1 − TV(P₀, P₁).
Real-valued KL divergence. Defined as (klDiv P Q).toReal, where
klDiv is Mathlib's InformationTheory.klDiv taking values in ℝ≥0∞.
For probability measures P absolutely continuous w.r.t. Q with integrable
log-likelihood ratio, this equals ∫ log(dP/dQ) dP. When P is not absolutely
continuous w.r.t. Q, klDiv P Q = ⊤ and toReal returns 0.
Instances For
KL divergence (real-valued) is nonneg, since it is the toReal of an
ENNReal value.
The TVNP.tvDist and InfoTheory.tvDist are definitionally equal.
The TVNP.klDiv_real and InfoTheory.klDiv_real are definitionally equal.
TV distance is nonneg for probability measures.
TV distance is ≤ 1 for probability measures.
Lemma 5.8 (Pinsker's inequality). TV(P,Q) ≤ √(KL(P‖Q)).
This is stated in terms of our real-valued KL divergence definition.
The hypothesis hKL ensures KL(P‖Q) is finite (i.e. not ⊤), which is
needed for klDiv_real (defined via ENNReal.toReal) to be meaningful;
without it, toReal maps ⊤ to 0, making the inequality vacuously weak.
The proof follows the book (Rigollet, Lemma 5.8, Steps 1–3): Step 1: KL ≥ 2 − 2·BC (log inequality) Step 2: BC² ≤ 1 − TV² (Cauchy-Schwarz) Step 3: 2 − 2√(1−TV²) ≥ TV² (pure real analysis) Combining: KL ≥ 2 − 2√(1−TV²) ≥ TV², hence TV ≤ √KL.
Lemma 5.3 (Neyman-Pearson, bundled). The likelihood ratio test is optimal for simple hypothesis testing: (a) For any measurable test ψ, P₀(ψ=1) + P₁(ψ=0) ≥ 1 − TV(P₀,P₁). (b) Equality holds for the likelihood ratio test ψ⋆ = 𝕀(p₁ ≥ p₀): P₀(ψ⋆=1) + P₁(ψ⋆=0) = 1 − TV(P₀,P₁).