
Thursday, February 12, 2015

Cross Entropy

The cross entropy of distributions p and q is
$$H(p,q) = -\mathbb{E}_p[\log q].$$

It can be decomposed as

$$H(p,q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q),$$

where $H(p)$ is the entropy of $p$ and $D_{\mathrm{KL}}(p \,\|\, q)$ is the non-negative Kullback–Leibler divergence.

From a source coding perspective, the cross entropy is the expected number of bits needed to encode symbols drawn from the true distribution $p$ using a code optimized for the estimated distribution $q$. The minimum, $H(p)$, is achieved when $q = p$; any divergence of $q$ from $p$ costs extra bits.
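As a small numerical check, the sketch below (with two made-up distributions over three outcomes) computes the entropy, cross entropy, and KL divergence in bits and verifies the decomposition above:

```python
import math

# Two example distributions over the same three outcomes.
p = [0.5, 0.25, 0.25]   # "true" distribution
q = [0.4, 0.4, 0.2]     # estimated distribution

# Entropy of p in bits: H(p) = -sum_i p_i log2 p_i
H_p = -sum(pi * math.log2(pi) for pi in p)

# Cross entropy in bits: H(p, q) = -sum_i p_i log2 q_i
H_pq = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

# KL divergence in bits: D_KL(p || q) = sum_i p_i log2 (p_i / q_i)
D_kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

# H(p, q) = H(p) + D_KL(p || q), and D_KL >= 0, so H(p, q) >= H(p).
assert abs(H_pq - (H_p + D_kl)) < 1e-12
assert D_kl >= 0.0
print(f"H(p) = {H_p:.4f} bits, H(p,q) = {H_pq:.4f} bits, D_KL = {D_kl:.4f} bits")
```

Here encoding with the wrong distribution $q$ costs about 0.07 extra bits per symbol over the 1.5-bit optimum.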

This quantity is very useful in machine learning.  Viewed from a vector quantization point of view, logistic regression is a way of finding an optimal boundary to classify samples in a (possibly high-dimensional) space of interest.  The cross entropy quantifies the loss incurred by using the estimated distribution $q$ in place of the true distribution $p$.  Since $H(p)$ is fixed, being a property of the underlying true distribution, minimizing the cross entropy is equivalent to minimizing the KL divergence in this setting.
