\[p_i \ge 0 \quad \text{for all}\; i\] and
\[\sum_i p_i = 1.\] Let \(E_i\) denote the energy of the system when it is in state \(i\). A fundamental result from statistical mechanics tells us that when the system is in thermal equilibrium with its surrounding environment, state \(i\) occurs with a probability defined by
\[p_i = \frac{1}{Z}\exp(-\frac{E_i}{k_B T})\] where \(T\) is the absolute temperature in kelvins, \(k_B\) is Boltzmann's constant, and \(Z\) is a constant that is independent of the individual states. This constant, called the partition function, is the normalizing factor
\[Z = \sum_i \exp(-\frac{E_i}{k_B T}).\] This probability distribution is called the Gibbs distribution.
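To make the definition concrete, here is a minimal Python sketch (not part of the original discussion): it evaluates the Gibbs probabilities for a handful of hypothetical state energies, with units chosen so that \(k_B = 1\). The `gibbs_distribution` helper and the energy values are illustrative assumptions, not anything prescribed by the text.

```python
import math

def gibbs_distribution(energies, T):
    """Return the Gibbs probabilities p_i = exp(-E_i / T) / Z (units with k_B = 1)."""
    weights = [math.exp(-E / T) for E in energies]  # unnormalized Boltzmann factors
    Z = sum(weights)                                # partition function (normalizing constant)
    return [w / Z for w in weights]

energies = [0.0, 1.0, 2.0, 5.0]              # hypothetical state energies
probs = gibbs_distribution(energies, T=1.0)
print(probs)         # lower-energy states receive higher probability
print(sum(probs))    # sums to 1 by construction
```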
Two interesting properties of the Gibbs distribution are:
- States of low energy have a higher probability of occurrence than states of high energy.
- As the temperature \(T\) is reduced, the probability becomes concentrated on a smaller subset of low-energy states (illustrated in the sketch after this list).
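Continuing the same illustrative sketch (identical hypothetical energies and helper as above), the snippet below prints the distribution at several temperatures to show how the probability mass concentrates as \(T\) drops.

```python
import math

def gibbs_distribution(energies, T):
    weights = [math.exp(-E / T) for E in energies]
    Z = sum(weights)
    return [w / Z for w in weights]

energies = [0.0, 1.0, 2.0, 5.0]              # hypothetical state energies
for T in (5.0, 1.0, 0.2):
    probs = gibbs_distribution(energies, T)
    print(f"T = {T}: " + ", ".join(f"{p:.3f}" for p in probs))
# As T decreases, nearly all of the probability mass moves onto the lowest-energy state.
```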
In the context of neural networks, the parameter \(T\) may be viewed as a pseudo-temperature that controls the thermal fluctuations representing the effect of "synaptic noise" in a neuron. Since its precise scale is irrelevant here, we can absorb \(k_B\) into the temperature and redefine the probability \(p_i\) and the partition function \(Z\) as
\[p_i = \frac{1}{Z}\exp(-\frac{E_i}{T}) \] and
\[Z = \sum_i \exp(-\frac{E_i}{T})\] where \(T\) is referred to simply as the temperature of the system.
Note that \(-\log p_i\) may be viewed as a form of "energy" measured at unit temperature: taking the logarithm of the Gibbs distribution gives
\[-\log p_i = \frac{E_i}{T} + \log Z,\]
so at \(T = 1\) the quantity \(-\log p_i\) equals the state energy \(E_i\) up to the additive constant \(\log Z\).
The Helmholtz free energy of a physical system, denoted by \(F\), is defined in terms of the partition function \(Z\) as follows:
\[F = -T \log Z.\] The average energy of the system is defined by
\[\langle E \rangle = \sum_i p_i E_i.\] The difference between the average energy and the free energy is
\[\langle E \rangle - F = -T\sum_i p_i \log p_i,\] which we can rewrite in terms of the entropy \(H = -\sum_i p_i \log p_i\) as
\[\langle E \rangle - F = T H\] or, equivalently,
\[F = \langle E \rangle - TH.\] The entropy of any system tends to increase until the system reaches equilibrium, and therefore the free energy of the system settles at a minimum.
This is an important principle called the principle of minimal free energy.
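As a quick sanity check on the relation \(F = \langle E \rangle - TH\), the sketch below (again using the hypothetical energies from the earlier snippets and an arbitrarily chosen temperature) confirms numerically that \(F = -T\log Z\) coincides with \(\langle E \rangle - TH\).

```python
import math

energies = [0.0, 1.0, 2.0, 5.0]    # hypothetical state energies
T = 1.5                            # arbitrary temperature for the check

weights = [math.exp(-E / T) for E in energies]
Z = sum(weights)
probs = [w / Z for w in weights]

F = -T * math.log(Z)                                 # Helmholtz free energy
avg_E = sum(p * E for p, E in zip(probs, energies))  # average energy <E>
H = -sum(p * math.log(p) for p in probs)             # entropy of the distribution

print(F, avg_E - T * H)   # the two values agree up to floating-point error
```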