The cross entropy of distributions \(p\) and \(q\) is
\[H(p,q)=E_p[-\log q]\] It can be viewed as
\[H(p,q)=H(p)+D_{KL}(p\|q)\] where \(H(p)\) is the entropy of \(p\) and \(D_{KL}(p\|q)\) is the non-negative Kullback–Leibler divergence.
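A quick numerical check of this decomposition (a minimal sketch using NumPy, with the two distributions \(p\) and \(q\) chosen arbitrarily for illustration):

```python
import numpy as np

# Two discrete distributions over the same support (illustrative values).
p = np.array([0.5, 0.3, 0.2])   # "true" distribution
q = np.array([0.4, 0.4, 0.2])   # estimated distribution

H_p  = -np.sum(p * np.log(p))      # entropy H(p)
H_pq = -np.sum(p * np.log(q))      # cross entropy H(p, q) = E_p[-log q]
D_kl =  np.sum(p * np.log(p / q))  # KL divergence D_KL(p || q)

# H(p, q) = H(p) + D_KL(p || q), up to floating-point error.
assert np.isclose(H_pq, H_p + D_kl)
print(H_pq, H_p, D_kl)
```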
From a source coding perspective, it is the expected number of bits (or nats, depending on the base of the logarithm) required to encode symbols drawn from the true distribution \(p\) using a code optimized for the estimated distribution \(q\); \(H(p)\) is the minimum, attained when \(q = p\).
This quantity is very useful in machine learning. Viewed from a vector quantization point of view, logistic regression is a way of finding an optimal boundary for classifying samples in a (possibly high-dimensional) space of interest, and cross-entropy quantifies the loss incurred by predicting the distribution \(q\) instead of the true distribution \(p\). Since \(H(p)\) is fixed, being a property of the underlying true distribution, minimizing cross-entropy is equivalent to minimizing the KL divergence in this setting. A small sketch of this use as a loss follows below.
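Here is a minimal sketch of cross-entropy as the logistic regression loss, assuming synthetic two-dimensional data and plain gradient descent (the data, step size, and iteration count are illustrative choices, not from the text above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (illustrative only).
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on the average binary cross-entropy
#   L(w) = -mean( y*log(sigmoid(Xw)) + (1 - y)*log(1 - sigmoid(Xw)) ),
# i.e. H(p, q) with p the empirical labels and q the model's predictions.
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    q = sigmoid(X @ w)
    grad = X.T @ (q - y) / len(y)   # gradient of the cross-entropy loss
    w -= lr * grad

q = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
loss = -np.mean(y * np.log(q) + (1 - y) * np.log(1 - q))
print("learned weights:", w, "cross-entropy loss:", loss)
```

The gradient step uses the well-known identity that the derivative of the binary cross-entropy with respect to the logits is simply the prediction error \(q - y\), which is one reason this loss pairs so naturally with the sigmoid.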