Atlas.HighDimensionalStatistics.code.Chapter3.Thm_3

Helper lemmas for product distribution calculations #

theorem Chapter3.prod_dist_factorize {S : Type} [Fintype S] [DecidableEq S] {k : ℕ} (h : Fin k → S → ℝ) :

∑ sel : Fin k → S, ∏ l : Fin k, h l (sel l) = ∏ l : Fin k, ∑ j : S, h l j

Product distribution factorization: ∑_sel ∏_l h(l, sel l) = ∏_l ∑_j h(l, j).

theorem Chapter3.product_weight_sum {S : Type} [Fintype S] [DecidableEq S] {k : ℕ} (w : S → ℝ) (hw_sum : ∑ j : S, w j = 1) :

∑ sel : Fin k → S, ∏ l : Fin k, w (sel l) = 1

Product weights sum to 1.

source

theorem Chapter3.product_marginal {S : Type} [Fintype S] [DecidableEq S] {k : ℕ} (w : S → ℝ) (hw_sum : ∑ j : S, w j = 1) (f : S → ℝ) (l₀ : Fin k) :

∑ sel : Fin k → S, (∏ l : Fin k, w (sel l)) * f (sel l₀) = ∑ j : S, w j * f j

Marginal of the product distribution: E[f(sel l₀)] = ∑ w j * f j.

source

theorem Chapter3.product_cross_marginal {S : Type} [Fintype S] [DecidableEq S] {k : ℕ} (w : S → ℝ) (hw_sum : ∑ j : S, w j = 1) (f g : S → ℝ) (l₁ l₂ : Fin k) (hne : l₁ ≠ l₂) :

∑ sel : Fin k → S, (∏ l : Fin k, w (sel l)) * (f (sel l₁) * g (sel l₂)) = (∑ j : S, w j * f j) * ∑ j : S, w j * g j

Cross-marginal: E[f(sel l₁) * g(sel l₂)] = E[f] * E[g] for l₁ ≠ l₂.

source

theorem Chapter3.expected_sum_sq {S : Type} [Fintype S] [DecidableEq S] {k : ℕ} (hk : 1 ≤ k) (w : S → ℝ) (hw_sum : ∑ j : S, w j = 1) (a : S → ℝ) :

∑ sel : Fin k → S, (∏ l : Fin k, w (sel l)) * (∑ l : Fin k, a (sel l)) ^ 2 = ↑k * ∑ j : S, w j * a j ^ 2 + ↑k * (↑k - 1) * (∑ j : S, w j * a j) ^ 2

E[(∑_l a(sel l))²] = k * E[a²] + k(k-1) * (E[a])².

Weighted averaging existence lemma #

source

theorem Chapter3.exists_le_of_weighted_sum {ι : Type u_1} [Fintype ι] (w f : ι → ℝ) (c : ℝ) (hw_nn : ∀ (i : ι), 0 ≤ w i) (hw_pos : 0 < ∑ i : ι, w i) (havg : ∑ i : ι, w i * f i ≤ (∑ i : ι, w i) * c) :

∃ (i : ι), f i ≤ c

Core probabilistic tool: averaging over product space #

source

theorem Chapter3.product_averaging_existence {n : ℕ} (S : Type) [Fintype S] [Nonempty S] (w : S → ℝ) (hw_nn : ∀ (j : S), 0 ≤ w j) (hw_sum : ∑ j : S, w j = 1) (u : S → Fin n → ℝ) (g : Fin n → ℝ) (B : ℝ) (hB : ∀ (j : S), ∑ i : Fin n, u j i ^ 2 ≤ B) (k : ℕ) (hk : 1 ≤ k) :

have μ := fun (i : Fin n) => ∑ j : S, w j * u j i; ∃ (sel : Fin k → S), ∑ i : Fin n, (g i - 1 / ↑k * ∑ l : Fin k, u (sel l) i) ^ 2 ≤ ∑ i : Fin n, (g i - μ i) ^ 2 + B / ↑k

Averaging principle for the product distribution (probabilistic method core). Proved via the bias-variance identity for finite product measures.

Fiberwise sum decomposition for Option-valued selections #

source

theorem Chapter3.sum_sel_eq_sum_card_mul {S : Type u_1} [Fintype S] [DecidableEq S] {k : ℕ} (sel : Fin k → S) (f : S → ℝ) :

∑ l : Fin k, f (sel l) = ∑ j : S, ↑{l : Fin k | sel l = j}.card * f j

Decompose a sum over Fin k by grouping according to the values of sel.

Maurey's sparse approximation #

source

theorem Chapter3.maurey_sparse_approx {n M : ℕ} (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ℝ) (f : Fin n → ℝ) (D : ℝ) (hD : 0 < D) (hNorm : ∀ (j : Fin M), ∑ i : Fin n, Φ i j ^ 2 ≤ D ^ 2 * ↑n) (k : ℕ) (hk : 1 ≤ k) (R : ℝ) (hR : 0 < R) (θ : Fin M → ℝ) (hθ : l1norm θ ≤ R) :

∃ (θ' : Fin M → ℝ), support_size θ' ≤ 2 * k ∧ MSE (Φ.mulVec θ') f ≤ MSE (Φ.mulVec θ) f + D ^ 2 * R ^ 2 / ↑k

Maurey's sparse approximation (existence via probabilistic method).

For any θ with ℓ₁ norm ≤ R, ∃ a 2k-sparse θ' with MSE(Φθ') ≤ MSE(Φθ) + D²R²/k.

The proof uses the probabilistic method: construct a discrete distribution on column indices, draw k independent samples, and average the corresponding atoms. The bias-variance identity for the product distribution gives the MSE bound. The existence of a good realization follows from the averaging principle.

source

theorem Chapter3.theorem_3_6_maurey {n M : ℕ} (hn : 0 < n) (hM : 0 < M) (Φ : Matrix (Fin n) (Fin M) ℝ) (f : Fin n → ℝ) (D : ℝ) (hD : 0 < D) (hNorm : ∀ (j : Fin M), ∑ i : Fin n, Φ i j ^ 2 ≤ D ^ 2 * ↑n) (k : ℕ) (hk : 1 ≤ k) (R : ℝ) (hR : 0 < R) :

⨅ (θ : { θ : Fin M → ℝ // support_size θ ≤ 2 * k }), MSE (Φ.mulVec ↑θ) f ≤ (⨅ (θ : { θ : Fin M → ℝ // l1norm θ ≤ R }), MSE (Φ.mulVec ↑θ) f) + D ^ 2 * R ^ 2 / ↑k

Theorem 3.6 (Maurey's empirical method).

Let {φ₁,...,φ_M} be a dictionary with max_{j} |φ_j|₂ ≤ D√n. Then for any k ≥ 1 and R > 0: min_{|θ|₀≤2k} MSE(φ_θ) ≤ min_{|θ|₁≤R} MSE(φ_θ) + D²R²/k

Documentation