Consider a physical system with many degrees of freedom that can reside in any one of a large number of possible states. Let p_i denote the probability of occurrence of state i, with the following two properties:
p_i \ge 0 \quad \text{for all}\; i
and
\sum_i p_i = 1.
Let E_i denote the energy of the system when it is in state i. A fundamental result from statistical mechanics tells us that when the system is in thermal equilibrium with its surrounding environment, state i occurs with a probability defined by
p_i = \frac{1}{Z}\exp(-\frac{E_i}{k_BT})
where T is the absolute temperature in kelvins, k_B is Boltzmann's constant, and Z is a constant that is independent of all states. The partition function Z is the normalizing constant
Z = \sum_i \exp(-\frac{E_i}{k_BT}).
This probability distribution is called the Gibbs distribution.
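As a concrete illustration, here is a minimal numerical sketch of the Gibbs distribution, assuming NumPy is available; the function name gibbs_distribution and the example energies are illustrative choices, not taken from the text.

```python
import numpy as np

def gibbs_distribution(energies, T, k_B=1.0):
    """Gibbs probabilities p_i = exp(-E_i / (k_B * T)) / Z."""
    E = np.asarray(energies, dtype=float)
    # Shift by the minimum energy before exponentiating: the common
    # factor exp(E_min / (k_B * T)) cancels in the normalization by Z,
    # and the shift keeps the exponentials from overflowing.
    w = np.exp(-(E - E.min()) / (k_B * T))
    return w / w.sum()

# Three states with energies 1, 2, 3 at unit temperature.
p = gibbs_distribution([1.0, 2.0, 3.0], T=1.0)
print(p, p.sum())  # lower-energy states are more probable; sum is 1
```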
Two interesting properties of the Gibbs distribution are:
- States of low energy have a higher probability of occurrence than states of high energy.
- As the temperature T is reduced, the probability is concentrated on a smaller subset of low-energy states, as the short demonstration below illustrates.
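Both properties can be checked numerically with the sketch above (the energies are again arbitrary examples):

```python
# Reusing gibbs_distribution from the sketch above.
for T in (10.0, 1.0, 0.1):
    print(T, gibbs_distribution([1.0, 2.0, 3.0], T=T))
# At T = 10 the distribution is nearly uniform; at T = 0.1 almost all
# of the probability mass sits on the lowest-energy state.
```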
In the context of neural networks, the parameter T may be viewed as a pseudo-temperature that controls thermal fluctuations representing the effect of "synaptic noise" in a neuron. Since its precise physical scale is irrelevant here, we may set k_B = 1 and redefine the probability p_i and the partition function Z as
p_i = \frac{1}{Z}\exp(-\frac{E_i}{T}) and
Z = \sum_i \exp(-\frac{E_i}{T}) where T is referred to simply as the temperature of the system.
Note that -\log p_i = E_i/T + \log Z, so -\log p_i may be viewed as a form of "energy" measured at unit temperature: when T = 1, it equals E_i up to the additive constant \log Z.
Free Energy and Entropy
The Helmholtz free energy of a physical system, denoted by F, is defined in terms of the partition function Z as follows:
F = -T \log Z.
The average energy of the system is defined by
\langle E \rangle = \sum_i p_i E_i.
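Numerically, F is best computed through the logarithm of the sum rather than by forming Z directly, since the exponentials can overflow; a small sketch under the same assumptions as above:

```python
import numpy as np

def free_energy(energies, T):
    """Helmholtz free energy F = -T log Z, with log Z evaluated stably."""
    s = -np.asarray(energies, dtype=float) / T
    m = s.max()
    # log-sum-exp trick: log sum exp(s) = m + log sum exp(s - m)
    log_Z = m + np.log(np.exp(s - m).sum())
    return -T * log_Z

print(free_energy([1.0, 2.0, 3.0], T=1.0))
```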
The difference between the average energy and the free energy is
\langle E \rangle - F = -T\sum_i p_i \log p_i,
which follows by substituting \log p_i = -E_i/T - \log Z. Recognizing the sum as the entropy H = -\sum_i p_i \log p_i, we may write
\langle E \rangle - F = TH
or, equivalently,
F = \langle E \rangle - TH.
The entropy of any system tends to increase until the system reaches equilibrium, and therefore the free energy of the system reaches a minimum. This is an important principle called the principle of minimal free energy.
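As a quick numerical check of the identity F = \langle E \rangle - TH, here is a sketch under the same assumptions as above, with arbitrary example energies and temperature:

```python
import numpy as np

E = np.array([1.0, 2.0, 3.0])   # arbitrary example energies
T = 0.7

w = np.exp(-E / T)
p = w / w.sum()                  # Gibbs distribution
F = -T * np.log(w.sum())         # F = -T log Z
avg_E = p @ E                    # <E> = sum_i p_i E_i
H = -(p * np.log(p)).sum()       # entropy H = -sum_i p_i log p_i

print(F, avg_E - T * H)          # the two values agree up to rounding
```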