Helper lemmas for product distribution calculations #
Cross-marginal: E[f(sel l₁) * g(sel l₂)] = E[f] * E[g] for l₁ ≠ l₂.
E[(∑_l a(sel l))²] = k * E[a²] + k(k-1) * (E[a])².
Weighted averaging existence lemma #
Core probabilistic tool: averaging over product space #
Averaging principle for the product distribution (probabilistic method core). Proved via the bias-variance identity for finite product measures.
Fiberwise sum decomposition for Option-valued selections #
Maurey's sparse approximation #
Maurey's sparse approximation (existence via probabilistic method).
For any θ with ℓ₁ norm ≤ R, ∃ a 2k-sparse θ' with MSE(Φθ') ≤ MSE(Φθ) + D²R²/k.
The proof uses the probabilistic method: construct a discrete distribution on column indices, draw k independent samples, and average the corresponding atoms. The bias-variance identity for the product distribution gives the MSE bound. The existence of a good realization follows from the averaging principle.
Theorem 3.6 (Maurey's empirical method).
Let {φ₁,...,φ_M} be a dictionary with max_{j} |φ_j|₂ ≤ D√n. Then for any k ≥ 1 and R > 0: min_{|θ|₀≤2k} MSE(φ_θ) ≤ min_{|θ|₁≤R} MSE(φ_θ) + D²R²/k