The cross entropy of distributions p and q is
$$H(p,q) = \mathbb{E}_p[-\log q].$$
It can be viewed as
$$H(p,q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q),$$
where $H(p)$ is the entropy of $p$ and $D_{\mathrm{KL}}(p \,\|\, q)$ is the non-negative Kullback–Leibler divergence.
From a source coding perspective, it is the expected number of bits required to encode data drawn from the true distribution $p$ using a code optimized for the estimated distribution $q$; $H(p)$ is the minimum, achieved when $q = p$.
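As a quick sanity check, here is a minimal NumPy sketch (the distributions `p` and `q` are made-up examples) that computes the three quantities in bits and verifies the decomposition numerically:

```python
import numpy as np

# Hypothetical true and estimated distributions over four symbols.
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

entropy = -np.sum(p * np.log2(p))        # H(p): optimal bits per symbol
cross_entropy = -np.sum(p * np.log2(q))  # H(p,q): bits per symbol with a code built for q
kl = np.sum(p * np.log2(p / q))          # D_KL(p||q): extra bits paid for the mismatch

print(f"H(p)       = {entropy:.3f} bits")        # 1.750
print(f"H(p,q)     = {cross_entropy:.3f} bits")  # 2.000
print(f"D_KL(p||q) = {kl:.3f} bits")             # 0.250
assert np.isclose(cross_entropy, entropy + kl)   # H(p,q) = H(p) + D_KL(p||q)
```

The quarter-bit gap between $H(p,q)$ and $H(p)$ is exactly the KL divergence: the cost of coding for the wrong distribution.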
This quantity is very useful in machine learning. Viewed from a vector quantization point of view, logistic regression is a way of finding an optimal boundary for classifying samples in a (possibly high-dimensional) space of interest; cross entropy then quantifies the loss incurred by estimating the distribution $q$ instead of the true distribution $p$. Since $H(p)$ is fixed, being a property of the underlying true distribution, minimizing cross entropy is equivalent to minimizing the KL divergence in this setting.
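To make the connection concrete, here is a minimal sketch of the binary cross-entropy loss as used in logistic regression; the labels and predicted probabilities are illustrative, and the helper name `binary_cross_entropy` is my own. Because $H(p)$ (the entropy of the label distribution) is fixed, training only ever drives the KL term down:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cross entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Illustrative labels and logistic-regression outputs.
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.6])
print(binary_cross_entropy(y_true, y_pred))  # shrinks as predictions approach the labels
```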