Definition 10.1.6 (Conditional entropy). For a joint distribution $p$ of $(X, Y)$ on $S \times T$ with marginal $p_T$ on $T$, $H(X \mid Y) = \sum_t p_T(t) \sum_s \text{negMulLog}(p(s, t) / p_T(t))$, where $\text{negMulLog}(x) = -x \log x$.
Total-probability identity used in the conditional-entropy bounds: $\sum_t p_T(t) \cdot \frac{p(s, t)}{p_T(t)} = p_S(s)$.
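As a numerical illustration of Definition 10.1.6 and the total-probability identity, the following minimal Python sketch evaluates both on a small table. It assumes the joint PMF is stored as a dictionary keyed by `(s, t)`; the helper names (`neg_mul_log`, `marginal_T`, `conditional_entropy`, `total_probability`) and the example table are hypothetical and are not taken from the formalization.

```python
import math

def neg_mul_log(x):
    # -x * log(x), with the usual convention that the value at 0 is 0
    return 0.0 if x == 0 else -x * math.log(x)

def marginal_T(p):
    # p_T(t) = sum_s p(s, t) for a joint PMF stored as {(s, t): prob}
    p_T = {}
    for (_, t), v in p.items():
        p_T[t] = p_T.get(t, 0.0) + v
    return p_T

def conditional_entropy(p):
    # H(X | Y) = sum_t p_T(t) * sum_s negMulLog(p(s, t) / p_T(t))
    p_T = marginal_T(p)
    total = 0.0
    for t, pt in p_T.items():
        if pt == 0:
            continue  # a zero-probability t contributes nothing
        total += pt * sum(neg_mul_log(v / pt)
                          for (_, t2), v in p.items() if t2 == t)
    return total

def total_probability(p, s):
    # sum_t p_T(t) * (p(s, t) / p_T(t)), which should equal p_S(s)
    p_T = marginal_T(p)
    return sum(p_T[t] * (v / p_T[t])
               for (s2, t), v in p.items() if s2 == s and p_T[t] > 0)

# Hypothetical example table on S = T = {0, 1}
joint = {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.5, (1, 1): 0.0}
h_x_given_y = conditional_entropy(joint)  # 0.5 * log(2) for this table
p_s0 = total_probability(joint, 0)        # p_S(0) = 0.75 for this table
```

Given $Y = 0$ the conditional distribution of $X$ is uniform on two points, and given $Y = 1$ it is deterministic, so only the first case contributes to the weighted sum.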
Lemma 10.1.5 (entropy of independent random variables): If $X$ and $Y$ are independent, $H(X, Y) = H(X) + H(Y)$.
Lemma 10.1.7 (chain rule for entropy): $H(X, Y) = H(Y) + H(X \mid Y)$.
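The additivity statement for independent variables and the chain rule can both be checked numerically. The sketch below is only a sanity check under stated assumptions: PMFs are dictionaries, entropy uses the natural logarithm, and the function names and example tables are hypothetical.

```python
import math

def entropy(pmf):
    # Shannon entropy (natural log) of a PMF stored as {outcome: prob}
    return sum(-v * math.log(v) for v in pmf.values() if v > 0)

def marginal(p, axis):
    # Marginal of a joint PMF {(s, t): prob} along coordinate `axis` (0 or 1)
    out = {}
    for key, v in p.items():
        out[key[axis]] = out.get(key[axis], 0.0) + v
    return out

def conditional_entropy(p):
    # H(X | Y) = sum_t p_T(t) * H(p(. | t))
    p_T = marginal(p, 1)
    total = 0.0
    for t, pt in p_T.items():
        if pt > 0:
            cond = {s: v / pt for (s, t2), v in p.items() if t2 == t}
            total += pt * entropy(cond)
    return total

# Chain rule: H(X, Y) - (H(Y) + H(X | Y)) should vanish up to rounding
joint = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}
chain_gap = entropy(joint) - (entropy(marginal(joint, 1))
                              + conditional_entropy(joint))

# Independence: a product distribution makes X and Y independent by construction
p_x = {0: 0.3, 1: 0.7}
p_y = {"a": 0.6, "b": 0.4}
product = {(s, t): p_x[s] * p_y[t] for s in p_x for t in p_y}
indep_gap = entropy(product) - (entropy(p_x) + entropy(p_y))
```

Both gaps are zero up to floating-point error. A numerical check on one table is of course no substitute for the formal proofs.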
Conditioning never increases entropy: $H(X \mid Y) \leq H(X)$. The proof applies Jensen's inequality to the concave function $x \mapsto -x \log x$ with weights $p_T(t)$, then uses the total-probability identity.
For a joint PMF on $S \times (T \times U)$, summing the $(Y, Z)$-marginal over $Y$ recovers the $Z$-marginal of the distribution projected onto $S \times U$.
Pointwise formula for the projection of a joint PMF on $S \times (T \times U)$ to $S \times U$: $q(s, u) = \sum_t p(s, (t, u))$.
Lemma 10.1.10 (dropping conditioning): $H(X \mid Y, Z) \leq H(X \mid Z)$, i.e., dropping a conditioning variable cannot decrease conditional entropy.
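The projection formula and Lemma 10.1.10 can be illustrated together. The following sketch assumes a joint PMF on $S \times (T \times U)$ stored as a dictionary keyed by `(s, (t, u))`; the names `project_out_middle` and `conditional_entropy` and the weight table are hypothetical choices for illustration.

```python
import itertools
import math

def entropy(pmf):
    # Shannon entropy (natural log) of a PMF stored as {outcome: prob}
    return sum(-v * math.log(v) for v in pmf.values() if v > 0)

def conditional_entropy(p):
    # H(X | C) for a joint PMF {(s, c): prob}; c may be any hashable value,
    # for example a pair (t, u) when conditioning on (Y, Z)
    p_C = {}
    for (_, c), v in p.items():
        p_C[c] = p_C.get(c, 0.0) + v
    total = 0.0
    for c, pc in p_C.items():
        if pc > 0:
            cond = {s: v / pc for (s, c2), v in p.items() if c2 == c}
            total += pc * entropy(cond)
    return total

def project_out_middle(p):
    # q(s, u) = sum_t p(s, (t, u))
    q = {}
    for (s, (t, u)), v in p.items():
        q[(s, u)] = q.get((s, u), 0.0) + v
    return q

# Hypothetical joint PMF on {0,1} x ({0,1} x {0,1}); weights sum to 16
keys = list(itertools.product([0, 1], repeat=3))
weights = [4, 1, 1, 2, 1, 3, 2, 2]
joint = {(s, (t, u)): w / 16 for (s, t, u), w in zip(keys, weights)}

h_x_given_yz = conditional_entropy(joint)                      # H(X | Y, Z)
h_x_given_z = conditional_entropy(project_out_middle(joint))   # H(X | Z)
```

Because `conditional_entropy` treats the second key component as an opaque label, the same function computes $H(X \mid Y, Z)$ on the original table and $H(X \mid Z)$ on the projected one, and the first value should not exceed the second.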
Lemma 10.1.8 (subadditivity of entropy): $H(X, Y) \leq H(X) + H(Y)$.
Shannon entropy is invariant under bijective relabeling of the sample space.
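Invariance under relabeling holds because entropy depends only on the multiset of probabilities, not on the names of the outcomes. A small sketch, with a hypothetical PMF and a hypothetical bijection `relabel`:

```python
import math

def entropy(pmf):
    # Shannon entropy (natural log) of a PMF stored as {outcome: prob}
    return sum(-v * math.log(v) for v in pmf.values() if v > 0)

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
relabel = {"a": 2, "b": 0, "c": 1}  # a bijection from {a, b, c} onto {0, 1, 2}

# Push the PMF forward along the bijection: each mass keeps its value
pushed = {relabel[k]: v for k, v in pmf.items()}

h_before = entropy(pmf)
h_after = entropy(pushed)
```

If `relabel` were not injective, two masses would merge under the pushforward and the entropy could drop, which is why bijectivity is required.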
Under the equivalence $\alpha^{n+1} \simeq \alpha^n \times \alpha$, the second coordinate marginal corresponds to the marginal at the last index.
Under the equivalence $\alpha^{n+1} \simeq \alpha^n \times \alpha$, the $i$-th marginal of the first factor corresponds to the marginal at $\text{castSucc } i$.
General subadditivity of entropy (Lemma 10.1.8 extended): for any joint PMF on $\alpha^n$, $H(X_1, \dots, X_n) \leq \sum_i H(X_i)$.
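The general bound can be checked on a small correlated example. This sketch assumes a joint PMF on tuples of length $n$ stored as a dictionary; `marginal_at` and the weight table are hypothetical, and the code only evaluates both sides of the inequality rather than mirroring the inductive proof.

```python
import itertools
import math

def entropy(pmf):
    # Shannon entropy (natural log) of a PMF stored as {outcome: prob}
    return sum(-v * math.log(v) for v in pmf.values() if v > 0)

def marginal_at(p, i):
    # Marginal of coordinate i for a joint PMF {(x_0, ..., x_{n-1}): prob}
    out = {}
    for xs, v in p.items():
        out[xs[i]] = out.get(xs[i], 0.0) + v
    return out

n = 3
outcomes = list(itertools.product([0, 1], repeat=n))
# Extra mass on (0,0,0) and (1,1,1) makes the coordinates correlated;
# the weights sum to 16 and each single coordinate is uniform on {0, 1}
weights = [5, 1, 1, 1, 1, 1, 1, 5]
joint = {xs: w / 16 for xs, w in zip(outcomes, weights)}

joint_entropy = entropy(joint)
sum_of_marginals = sum(entropy(marginal_at(joint, i)) for i in range(n))
```

Here every marginal is uniform, so the right-hand side is $3 \log 2$, while the correlation between coordinates keeps the joint entropy strictly below that value.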