One parameter Exponential family
Consider a family of distributions \{P_\theta, \theta \in \Theta \subset \mathbb{R}\} whose pdf has the form
f(x|\theta) = h(x) e^{\eta(\theta) T(x) - \psi^*(\theta)}
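If \eta(\theta) is a one-to-one function of \theta, we can reparametrize by \eta and drop \theta from the discussion. The resulting family \{P_\eta, \eta \in \Xi \subset \mathbb{R}\} is in canonical form:
f(x|\eta) = h(x) e^{\eta T(x) - \psi(\eta)}, where \psi(\eta) = \log \int h(x) e^{\eta T(x)} dx (a sum in the discrete case) is the log-normalizer. Define the set
\mathcal{T} = \{\eta : e^{\psi(\eta)} < \infty\}.
\eta is the natural parameter and \mathcal{T} is the natural parameter space.
The family is called the canonical one-parameter Exponential family.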
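As an illustration (our choice of example, not from the source), the Poisson(\lambda) pmf is in canonical form with h(x) = 1/x!, T(x) = x, \eta = \log\lambda and \psi(\eta) = e^\eta. Since \sum_x e^{\eta x}/x! = e^{e^\eta} < \infty for every real \eta, the natural parameter space is all of \mathbb{R}. The sketch below (function names are hypothetical) checks numerically that the canonical form reproduces the usual pmf.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Usual Poisson pmf: lam^x e^{-lam} / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def poisson_canonical(x: int, eta: float) -> float:
    """Canonical form h(x) * exp(eta * T(x) - psi(eta)) with
    h(x) = 1/x!, T(x) = x and log-normalizer psi(eta) = exp(eta)."""
    h = 1.0 / math.factorial(x)
    psi = math.exp(eta)
    return h * math.exp(eta * x - psi)

lam = 3.5
eta = math.log(lam)  # natural parameter
for x in range(10):
    assert math.isclose(poisson_pmf(x, lam), poisson_canonical(x, eta), rel_tol=1e-12)

# e^{psi(eta)} = sum_x h(x) e^{eta x} is finite for every real eta,
# so the canonical pmf sums to 1 (the tail beyond x = 99 is negligible here).
total = sum(poisson_canonical(x, eta) for x in range(100))
print(round(total, 12))  # → 1.0
```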
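[Brown] The family is called full if \Xi = \mathcal{T}, and regular if \mathcal{T} is open.
[Brown] Let \mathcal{K} be the convex support of the measure \nu, i.e. the closed convex hull of its support.
The family is minimal if \dim \Xi = \dim \mathcal{K} = k.
It is nonsingular if \mathrm{Var}_\eta(T(X)) > 0 for all \eta \in \mathcal{T}^\circ, the interior of \mathcal{T}.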
Theorem 1. \psi(\eta) is a convex function on \mathcal{T}.
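A standard argument for Theorem 1 (sketched here, not taken from the source) applies Hölder's inequality with exponents 1/t and 1/(1-t), writing dx for integration against the dominating measure. For \eta_1, \eta_2 \in \mathcal{T} and t \in (0,1):

```latex
\begin{aligned}
e^{\psi(t\eta_1 + (1-t)\eta_2)}
  &= \int \big(h(x) e^{\eta_1 T(x)}\big)^{t} \big(h(x) e^{\eta_2 T(x)}\big)^{1-t} \, dx \\
  &\le \Big(\int h(x) e^{\eta_1 T(x)} \, dx\Big)^{t} \Big(\int h(x) e^{\eta_2 T(x)} \, dx\Big)^{1-t} \\
  &= e^{t\psi(\eta_1) + (1-t)\psi(\eta_2)} .
\end{aligned}
```

Taking logarithms gives \psi(t\eta_1 + (1-t)\eta_2) \le t\psi(\eta_1) + (1-t)\psi(\eta_2). The right-hand side is finite, so the same computation also shows that \mathcal{T} is a convex set.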
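Theorem 2. For any \eta \in \mathcal{T}^\circ, \psi generates the cumulants of T(X) under P_\eta: the cumulant generating function of T(X) is K_\eta(s) = \psi(\eta + s) - \psi(\eta), so the n-th cumulant of T(X) is \psi^{(n)}(\eta).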
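The usual derivation (a sketch, not from the source): because \eta is an interior point, \eta + s \in \mathcal{T} for all s in a neighborhood of 0, so the moment generating function of T(X) is finite there and

```latex
\begin{aligned}
M_\eta(s) &= E_\eta\big[e^{sT(X)}\big]
  = \int h(x)\, e^{(\eta + s)T(x) - \psi(\eta)}\, dx
  = e^{\psi(\eta + s) - \psi(\eta)}, \\
K_\eta(s) &= \log M_\eta(s) = \psi(\eta + s) - \psi(\eta), \\
\kappa_n &= K_\eta^{(n)}(0) = \psi^{(n)}(\eta), \qquad
E_\eta[T(X)] = \psi'(\eta), \quad \mathrm{Var}_\eta(T(X)) = \psi''(\eta).
\end{aligned}
```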
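Note: the first cumulant is the mean; the second and third cumulants are the second and third central moments (the second being the variance); the fourth and higher-order cumulants are in general neither moments nor central moments (e.g. \kappa_4 = \mu_4 - 3\mu_2^2).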
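A numerical check of Theorem 2 and the note above, assuming the Bernoulli(p) family as the example (h(x) = 1, T(x) = x, \eta = \log(p/(1-p)), \psi(\eta) = \log(1 + e^\eta)). The derivatives of \psi are estimated by central finite differences (step size chosen ad hoc) and compared with cumulants computed directly from the pmf; the helper names are hypothetical.

```python
import math

def psi(eta: float) -> float:
    """Log-normalizer of the Bernoulli family in canonical form: log(1 + e^eta)."""
    return math.log1p(math.exp(eta))

def derivatives(f, x: float, h: float = 1e-2):
    """Central finite-difference estimates of f', f'', f''', f'''' at x."""
    f_m2, f_m1, f0, f_p1, f_p2 = (f(x + k * h) for k in (-2, -1, 0, 1, 2))
    d1 = (f_p1 - f_m1) / (2 * h)
    d2 = (f_p1 - 2 * f0 + f_m1) / h**2
    d3 = (f_p2 - 2 * f_p1 + 2 * f_m1 - f_m2) / (2 * h**3)
    d4 = (f_p2 - 4 * f_p1 + 6 * f0 - 4 * f_m1 + f_m2) / h**4
    return d1, d2, d3, d4

p = 0.3
eta = math.log(p / (1 - p))  # natural parameter
support = [(0, 1 - p), (1, p)]  # (value of T(x) = x, probability)

mean = sum(x * w for x, w in support)
central = {r: sum((x - mean) ** r * w for x, w in support) for r in (2, 3, 4)}
# Cumulants from moments: k1 = mean, k2 = mu2, k3 = mu3, k4 = mu4 - 3 mu2^2
cumulants = (mean, central[2], central[3], central[4] - 3 * central[2] ** 2)

for k, (num, exact) in enumerate(zip(derivatives(psi, eta), cumulants), start=1):
    print(f"kappa_{k}: psi-derivative {num:.5f}  from moments {exact:.5f}")
    assert abs(num - exact) < 1e-4

# The 4th cumulant is not the 4th central moment:
print("mu_4 =", central[4], " kappa_4 =", cumulants[3])
```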
There are more properties...
Multi-parameter Exponential family
Consider a family of distributions \{P_\theta, \theta \in \Theta \subset \mathbb{R}^k\} whose pdf has the form
f(x|\theta) = h(x) \exp\left( \sum_{i=1}^k \eta_i(\theta) T_i(x) - \psi^*(\theta) \right); it is called a k-parameter Exponential family.
Reparametrizing by \eta_i = \eta_i(\theta), we obtain the k-parameter canonical family.
The assumption here is that the dimension of \Theta and the dimension of the image of \Theta under the map \theta \mapsto (\eta_1(\theta), \dots, \eta_k(\theta)) are both equal to k.
The canonical form is
f(x|\eta) = h(x) \exp\left( \sum_{i=1}^k \eta_i T_i(x) - \psi(\eta) \right)
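For example (our illustration, not from the source), N(\mu, \sigma^2) is a 2-parameter canonical family with T(x) = (x, x^2), \eta = (\mu/\sigma^2, -1/(2\sigma^2)), h(x) = 1/\sqrt{2\pi} and \psi(\eta) = -\eta_1^2/(4\eta_2) - \tfrac12\log(-2\eta_2), finite exactly when \eta_2 < 0. A quick numerical check, with hypothetical function names:

```python
import math

def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Usual N(mu, sigma2) density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def natural_params(mu: float, sigma2: float) -> tuple[float, float]:
    """Map (mu, sigma^2) to the natural parameter (eta1, eta2)."""
    return mu / sigma2, -1.0 / (2 * sigma2)

def psi(eta1: float, eta2: float) -> float:
    """Log-normalizer; finite only when eta2 < 0."""
    return -eta1**2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)

def normal_canonical(x: float, eta1: float, eta2: float) -> float:
    """h(x) * exp(eta1 * x + eta2 * x^2 - psi(eta)) with h(x) = 1/sqrt(2 pi)."""
    return math.exp(eta1 * x + eta2 * x**2 - psi(eta1, eta2)) / math.sqrt(2 * math.pi)

mu, sigma2 = 1.5, 0.8
eta = natural_params(mu, sigma2)
for x in (-2.0, 0.0, 0.7, 3.1):
    assert math.isclose(normal_pdf(x, mu, sigma2), normal_canonical(x, *eta), rel_tol=1e-12)
print("canonical form matches N(mu, sigma^2) at eta =", eta)
```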
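Theorem 7. Let X have distribution P_\eta, \eta \in \mathcal{T}, in the canonical k-parameter Exponential family, with \mathcal{T} = \{\eta \in \mathbb{R}^k : e^{\psi(\eta)} < \infty\}. Then the partial derivatives of \psi(\eta) of any order exist for any \eta \in \mathcal{T}^\circ; in particular \frac{\partial}{\partial \eta_i}\psi(\eta) = E_\eta[T_i(X)] and \frac{\partial^2}{\partial \eta_i \partial \eta_j}\psi(\eta) = \mathrm{Cov}_\eta(T_i(X), T_j(X)).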
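Definition. The family is of full rank if, at every \eta \in \mathcal{T}^\circ, the covariance matrix I(\eta) = \left[ \frac{\partial^2}{\partial \eta_i \partial \eta_j} \psi(\eta) \right]_{i,j=1}^k of T(X), which is always positive semidefinite, is nonsingular.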
Definition/Theorem. If the family is nonsingular, the matrix I(\eta) is the Fisher information matrix at \eta (with respect to the natural parameter).
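Proof. For a canonical exponential family, L(x;\eta) = \log p_\eta(x) \doteq \langle \eta, T(x) \rangle - \psi(\eta), where \doteq denotes equality up to the term \log h(x), which does not depend on \eta. Hence L'(x;\eta) = T(x) - \frac{\partial}{\partial \eta}\psi(\eta), and L''(x;\eta) = -\frac{\partial^2}{\partial \eta \partial \eta^T}\psi(\eta) does not depend on x, so
I(\eta) = -E_\eta[L''(X;\eta)] = \frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta)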
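To see I(\eta) = \mathrm{Cov}_\eta[T(X)] concretely (our illustration, not from the source), take N(\mu, \sigma^2) in canonical form with T(x) = (x, x^2) and \psi(\eta) = -\eta_1^2/(4\eta_2) - \tfrac12\log(-2\eta_2). The sketch compares a finite-difference Hessian of \psi (the step size is an ad hoc choice) with the exact covariance of (X, X^2): \mathrm{Var}\,X = \sigma^2, \mathrm{Cov}(X, X^2) = 2\mu\sigma^2, \mathrm{Var}\,X^2 = 2\sigma^4 + 4\mu^2\sigma^2.

```python
import math

def psi(eta1: float, eta2: float) -> float:
    """Log-normalizer of N(mu, sigma^2) in canonical form, T(x) = (x, x^2)."""
    return -eta1**2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)

def hessian(f, x: tuple[float, float], h: float = 1e-4) -> list[list[float]]:
    """Central finite-difference Hessian of a function of two variables."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            def shifted(si: float, sj: float) -> float:
                # Perturb coordinate i by si*h and coordinate j by sj*h
                # (the same coordinate twice when i == j).
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(*y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

mu, sigma2 = 1.5, 0.8
eta = (mu / sigma2, -1.0 / (2 * sigma2))
# Exact covariance matrix of T(X) = (X, X^2) for X ~ N(mu, sigma2)
cov_T = [[sigma2, 2 * mu * sigma2],
         [2 * mu * sigma2, 2 * sigma2**2 + 4 * mu**2 * sigma2]]

I = hessian(psi, eta)
for i in range(2):
    for j in range(2):
        assert abs(I[i][j] - cov_T[i][j]) < 1e-5
print("Hessian of psi:", I)
print("Cov of T(X):  ", cov_T)
```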
Sufficiency and Completeness
Theorem 8. Suppose a family of distributions \mathcal{F} = \{ P_\theta, \theta \in \Theta\} belongs to a k-parameter Exponential family and that the "true" parameter space \Theta has a nonempty interior. Then the family \mathcal{F} is complete, in the sense that its natural sufficient statistic T(X) = (T_1(X), \dots, T_k(X)) is complete.
Theorem 9. (Basu's Theorem for the Exponential Family) In any k-parameter Exponential family \mathcal{F} whose parameter space \Theta has a nonempty interior, the natural sufficient statistic T(X) and any ancillary statistic S(X) are independent under each \theta \in \Theta.
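A small Monte Carlo illustration of Theorem 9 (our example, assuming NumPy is available): for X_1, \dots, X_n iid N(\mu, 1) the family is a one-parameter Exponential family with T(X) = \sum X_i, whose natural parameter space \mathbb{R} has nonempty interior, and the sample variance S^2 is ancillary because its distribution does not depend on \mu. The theorem says \bar X and S^2 are independent, so their empirical correlation, and that of \bar X^2 with S^2, should be near zero. Near-zero correlation is only consistent with independence; it does not prove it.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 2.0, 5, 200_000

samples = rng.normal(loc=mu, scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)        # T(X)/n, a function of the sufficient statistic
s2 = samples.var(axis=1, ddof=1)   # ancillary: its distribution is free of mu

corr_linear = np.corrcoef(xbar, s2)[0, 1]
corr_square = np.corrcoef(xbar**2, s2)[0, 1]
print(f"corr(xbar, s2)   = {corr_linear:+.4f}")
print(f"corr(xbar^2, s2) = {corr_square:+.4f}")
```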
MLE of exponential family
Recall that, writing \theta for the natural parameter, L(x;\theta) = \log p_\theta(x) \doteq \langle \theta, T(x) \rangle - \psi(\theta). The MLE \theta_{ML} solves the likelihood equation
S(\theta_{ML}) = \left. \frac{\partial}{\partial \theta} L(x;\theta) \right\vert_{\theta = \theta_{ML}} = 0 \; \Longleftrightarrow \; T(x) = E_{\theta_{ML}} [ T(X) ], since \frac{\partial}{\partial \theta} \psi(\theta) = E_\theta [ T(X) ].
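For n iid observations the natural sufficient statistic is \sum_{i=1}^n T(x_i) and the log-normalizer is n\psi(\theta), so the likelihood equation becomes \frac{1}{n}\sum_{i=1}^n T(x_i) = E_{\theta_{ML}}[T(X)]. As a worked example (our illustration, not from the source), for N(\mu, \sigma^2) with T(x) = (x, x^2) and E[T(X)] = (\mu, \mu^2 + \sigma^2), matching moments gives

```latex
\frac{1}{n}\sum_{i=1}^n x_i = \hat\mu, \qquad
\frac{1}{n}\sum_{i=1}^n x_i^2 = \hat\mu^2 + \hat\sigma^2
\quad\Longrightarrow\quad
\hat\mu = \bar x, \qquad
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n x_i^2 - \bar x^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2 .
```

Since \eta = (\mu/\sigma^2, -1/(2\sigma^2)) is a bijection, the MLE of the natural parameter is the image of (\hat\mu, \hat\sigma^2) under this map.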
The second derivative gives us
\frac{\partial^2}{\partial \theta \partial \theta^T} L(x;\theta) = - I(\theta) = - \mathrm{Cov}_\theta [ T(X) ].
The right-hand side is negative definite for a full-rank family. Therefore the log-likelihood function is strictly concave in \theta, and the MLE, when it exists, is unique.
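Because the log-likelihood is concave with Hessian -I(\theta), Newton's method \theta \leftarrow \theta + I(\theta)^{-1} S(\theta) is a natural way to solve the likelihood equation numerically (our choice of solver; the source does not prescribe one). A minimal sketch, assuming an iid Poisson sample, so that T(x) = \sum x_i, \psi_n(\theta) = n e^\theta, S(\theta) = \sum x_i - n e^\theta and I(\theta) = n e^\theta; the function name and data are hypothetical.

```python
import math

def poisson_natural_mle(xs: list[int], theta0: float = 0.0,
                        tol: float = 1e-12, max_iter: int = 100) -> float:
    """Newton's method for the natural parameter theta = log(lambda) of a Poisson sample."""
    n, t = len(xs), sum(xs)  # t is the natural sufficient statistic
    if t == 0:
        # The likelihood increases as theta -> -infinity, so no maximizer exists.
        raise ValueError("MLE does not exist when all observations are 0")
    theta = theta0
    for _ in range(max_iter):
        score = t - n * math.exp(theta)  # S(theta) = T(x) - d/dtheta psi_n(theta)
        info = n * math.exp(theta)       # I(theta) = d^2/dtheta^2 psi_n(theta) > 0
        step = score / info
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton's method did not converge")

xs = [3, 1, 4, 1, 5, 9, 2, 6]
theta_hat = poisson_natural_mle(xs)
# Moment matching: e^{theta_hat} = E[X] should equal the sample mean.
print(theta_hat, math.log(sum(xs) / len(xs)))
```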
Existence of conjugate prior
If the likelihood belongs to an exponential family, a conjugate prior can also be found within the exponential family, and the marginal p(x) = \int p(x|\theta) p(\theta) d\theta is then available in closed form, as a ratio of the prior's normalizing constants.
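As a sketch (our example, not from the source): the Bernoulli likelihood is an exponential family, and the Beta(a, b) prior is conjugate to it. In natural-parameter form this prior is \pi(\eta) \propto \exp(\chi\eta - \nu\psi(\eta)) with \psi(\eta) = \log(1 + e^\eta), a = \chi and b = \nu - \chi. After s successes in n trials the posterior is Beta(a + s, b + n - s), and the marginal probability of the observed sequence is B(a + s, b + n - s)/B(a, b), where B is the Beta function. The code checks this closed form against a midpoint-rule integral; helper names and data are hypothetical.

```python
import math

def log_beta(a: float, b: float) -> float:
    """log of the Beta function B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def marginal_closed_form(xs: list[int], a: float, b: float) -> float:
    """p(x_1..x_n) = B(a + s, b + n - s) / B(a, b) under a Beta(a, b) prior."""
    n, s = len(xs), sum(xs)
    return math.exp(log_beta(a + s, b + n - s) - log_beta(a, b))

def marginal_numeric(xs: list[int], a: float, b: float, grid: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of p(x|p) * Beta(p; a, b) over (0, 1)."""
    n, s = len(xs), sum(xs)
    norm = math.exp(log_beta(a, b))
    total = 0.0
    for k in range(grid):
        p = (k + 0.5) / grid
        likelihood = p**s * (1 - p) ** (n - s)
        prior = p ** (a - 1) * (1 - p) ** (b - 1) / norm
        total += likelihood * prior
    return total / grid

xs = [1, 0, 1, 1, 0]
a, b = 2.0, 3.0
exact = marginal_closed_form(xs, a, b)
approx = marginal_numeric(xs, a, b)
print(exact, approx)

# Conjugate update of the hyperparameters
s = sum(xs)
print("posterior Beta:", a + s, b + len(xs) - s)
```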
From Casella-Berger.
Note that the parameter space is the "natural" parameter space.