Laplace's principle of insufficient reason calls for assuming uniformity unless there is additional information.
Entropy maximization is equivalent to minimizing the KL divergence between the distribution \(p\) and the uniform distribution.
More precisely, let \(X\) be a random variable with finite alphabet \(\mathcal{X} = \{\alpha_1, \cdots, \alpha_n\}\). Given a family of distributions \(\mathcal{P}\) on \(\mathcal{X}\) and the uniform distribution \(u\) on \(\mathcal{X}\),
\[ \underset{p\in\mathcal{P}}{\text{arg min}} \;D_{KL}(p\| u) = \underset{p\in\mathcal{P}}{\text{arg max}} \;H(p)\] where \(H(p)\) is the entropy.
Proof:
\[ D_{KL}(p\|u) = \sum_i p_i \log \frac{p_i}{1/n} = \sum_i p_i \log p_i + \Big(\sum_i p_i\Big) \log n = \log n - H(p). \]
Since \(\log n\) does not depend on \(p\), minimizing \(D_{KL}(p\|u)\) over \(\mathcal{P}\) is the same as maximizing \(H(p)\).
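As a quick numerical sanity check, here is a minimal sketch in Python/NumPy (the helper names entropy and kl_to_uniform are illustrative, not from any library) that verifies the identity \(D_{KL}(p\|u) = \log n - H(p)\) on a randomly drawn distribution.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural log), with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl_to_uniform(p):
    """D_KL(p || u), where u is uniform on the same n-letter alphabet."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    q = p[p > 0]
    return np.sum(q * np.log(q * n))

rng = np.random.default_rng(0)
n = 5
p = rng.dirichlet(np.ones(n))   # a random distribution on an alphabet of size n

# The two sides of the identity agree up to floating-point error.
print(kl_to_uniform(p))
print(np.log(n) - entropy(p))
```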
Note:
This holds in general for random variables with finite support. For random variables with infinite support, additional constraints are required. See Gibbs measure...
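For context, one standard formulation (not spelled out in the note above): when entropy is maximized subject to expectation constraints \(\mathbb{E}_p[f_k(X)] = c_k\), the maximizer, when it exists, is the Gibbs (exponential-family) distribution
\[ p^*(x) \propto \exp\Big(-\sum_k \lambda_k f_k(x)\Big), \]
with the Lagrange multipliers \(\lambda_k\) chosen so that the constraints are satisfied.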