Dave's Journal: Properties of the exponential family of distributions

From Dasgupta (see link)

One parameter Exponential family

Consider a family of distributions \(\{ P_\theta, \theta \in \Theta \subset \mathbb{R} \} \) whose pdf has the form

\[ f(x|\theta) = h(x) e^{\eta(\theta) T(x) - \psi^*(\theta)} \]

If \(\eta(\theta)\) is a 1-1 function of \(\theta\), we can reparametrize by \(\eta\) and drop \(\theta\) from the discussion. The family of distributions \(\{ P_\eta, \eta\in \Xi \subset \mathbb{R} \} \) is then in canonical form,

\[ f(x|\eta) = h(x) e^{\eta T(x) - \psi(\eta)} \] and we define the set

\[ \mathcal{T} = \{ \eta : e^{\psi(\eta)} < \infty \}\]

\(\eta\) is the natural parameter, and \(\mathcal{T}\) the natural parameter space.

The family is called the canonical one parameter Exponential family.
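
As an illustration (a standard example, not taken from the source notes), the Poisson family with mean \(\lambda > 0\) fits this form:

\[ f(x|\lambda) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{1}{x!} e^{x \log\lambda - \lambda}, \quad x = 0, 1, 2, \dotsc \]

so \(h(x) = 1/x!\), \(T(x) = x\), \(\eta = \log\lambda\) and \(\psi(\eta) = e^\eta\). Since \(e^{\psi(\eta)} = \sum_x e^{\eta x}/x! = e^{e^\eta} < \infty\) for every real \(\eta\), the natural parameter space is \(\mathcal{T} = \mathbb{R}\).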

[Brown] The family is called full if \(\Xi = \mathcal{T}\), regular if \(\mathcal{T}\) is open.

[Brown] Let K be the convex support of the measure \(\nu\)

The family is minimal if \(\dim \Xi = \dim K = k\)

It is nonsingular if \( Var_\eta (T(X)) > 0 \) for all \(\eta \in \overset{\circ}{\mathcal{T}}\), the interior of \(\mathcal{T}\).

Theorem 1. \(\psi(\eta)\) is a convex function on \(\mathcal{T}\).

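Theorem 2. For any \( \eta \in \overset{\circ}{\mathcal{T}}\), \(\psi\) generates the cumulants of \(T(X)\): the cumulant generating function of \(T(X)\) under \(P_\eta\) is \(t \mapsto \psi(\eta + t) - \psi(\eta)\), so the \(r\)-th derivative of \(\psi\) at \(\eta\) is the \(r\)-th cumulant of \(T(X)\).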

Note: the 1st cumulant is the expectation, and the 2nd and 3rd cumulants are the corresponding central moments (the 2nd being the variance). The 4th and higher order cumulants are neither moments nor central moments.
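
A quick numerical check of the cumulant property can be sketched as follows. It assumes the Poisson family in canonical form, where \(T(x) = x\), \(\eta = \log\lambda\) and \(\psi(\eta) = e^\eta\); the function names are illustrative only. The first three derivatives of \(\psi\), estimated by finite differences, are compared with the mean, variance and third central moment obtained by summing the pmf directly.

```python
import math

def psi(eta):
    """Log-partition function of the Poisson family in canonical form (assumed example)."""
    return math.exp(eta)

def poisson_central_moments(lam, terms=200):
    """Mean, variance and third central moment, by summing the pmf over 0..terms-1."""
    # Work with the log pmf so that x! never overflows.
    pmf = [math.exp(x * math.log(lam) - lam - math.lgamma(x + 1)) for x in range(terms)]
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    third = sum((x - mean) ** 3 * p for x, p in enumerate(pmf))
    return mean, var, third

def derivatives(f, eta, h=1e-2):
    """Central finite-difference estimates of f', f'' and f''' at eta."""
    d1 = (f(eta + h) - f(eta - h)) / (2 * h)
    d2 = (f(eta + h) - 2 * f(eta) + f(eta - h)) / h ** 2
    d3 = (f(eta + 2 * h) - 2 * f(eta + h) + 2 * f(eta - h) - f(eta - 2 * h)) / (2 * h ** 3)
    return d1, d2, d3

lam = 2.5
eta = math.log(lam)
cumulants = derivatives(psi, eta)        # derivatives of psi at eta
moments = poisson_central_moments(lam)   # moments computed from the pmf

for name, c, m in zip(("1st", "2nd", "3rd"), cumulants, moments):
    print(f"{name} cumulant from psi: {c:.5f}   from pmf: {m:.5f}")
```

For the Poisson family all three values should agree with \(\lambda\) up to finite-difference error.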

There are more properties...

Multi-parameter Exponential family

A family of distributions \(\{ P_\theta, \theta \in \Theta \subset \mathbb{R}^k \} \) whose pdf has the form

\[ f(x|\theta) = h(x) e^{\sum_{i=1}^k \eta_i(\theta) T_i(x) - \psi^*(\theta)} \] is the k-parameter Exponential family.

When we reparametrize using \(\eta_i = \eta_i(\theta)\), we obtain the k-parameter canonical family.

The assumption here is that the dimension of \(\Theta\) and the dimension of the image of \(\Theta\) under the map \( \theta \rightarrow (\eta_1(\theta),\dotsc,\eta_k(\theta) )\) are both equal to \(k\).

The canonical form is

\[ f(x|\eta) = h(x) e^{\sum_{i=1}^k \eta_i T_i(x) - \psi(\eta)} \]
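
For example (a standard illustration, not taken from the source notes), the normal family \(N(\mu, \sigma^2)\) with both parameters unknown is a 2-parameter Exponential family:

\[ f(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}} e^{\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \left( \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log\sigma^2 \right)} \]

so \(h(x) = 1/\sqrt{2\pi}\), \(T(x) = (x, x^2)\), \(\eta_1 = \mu/\sigma^2\), \(\eta_2 = -1/(2\sigma^2)\), and

\[ \psi(\eta) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\log(-2\eta_2) \]

with natural parameter space \(\mathcal{T} = \{ \eta \in \mathbb{R}^2 : \eta_2 < 0 \}\).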

Theorem 7. Suppose a sample has a distribution \(P_\eta, \eta\in\mathcal{T}\), in the canonical k-parameter Exponential family, with \( \mathcal{T} = \{ \eta \in \mathbb{R}^k : e^{\psi(\eta)} < \infty \} \). Then the partial derivatives of \(\psi(\eta)\) of every order exist at any \(\eta \in \overset{\circ}{\mathcal{T}}\).

Definition. The family is full rank if at every \(\eta \in \overset{\circ}{\mathcal{T}}\) the \(k \times k\) covariance matrix of \(T(X)\), \[ I(\eta) = \left( \frac{\partial^2}{\partial \eta_i \partial \eta_j} \psi(\eta) \right)_{i,j=1}^{k} \succeq 0, \] is nonsingular.

Definition/Theorem. If the family is nonsingular, then the matrix \(I(\eta)\) is called the Fisher information matrix at \(\eta\) (for the natural parameter).

Proof. For the canonical exponential family, we have \(L(x;\eta) = \log p_\eta(x) \doteq \langle \eta, T(x) \rangle - \psi(\eta) \) (dropping the term \(\log h(x)\), which does not involve \(\eta\)), \(L'(x;\eta) = T(x) - \frac{\partial }{\partial \eta} \psi(\eta) \), and \( L''(x;\eta) = - \frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta)\), which does not depend on \(x\). Taking the expectation of \(-L''(X;\eta)\) therefore gives

\[ I(\eta) = \frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta)\]
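
The identity between the Hessian of \(\psi\) and the covariance of \(T(X)\) can be checked numerically. The sketch below assumes the normal family \(N(\mu, \sigma^2)\) in natural parameters, with \(T(x) = (x, x^2)\), \(\eta_1 = \mu/\sigma^2\), \(\eta_2 = -1/(2\sigma^2)\) and \(\psi(\eta) = -\eta_1^2/(4\eta_2) - \frac{1}{2}\log(-2\eta_2)\); the helper names are illustrative. It compares a finite-difference Hessian with the known covariance matrix of \((X, X^2)\).

```python
import math

def psi(eta1, eta2):
    """Log-partition function of N(mu, sigma^2) in natural parameters (requires eta2 < 0)."""
    return -eta1 ** 2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)

def hessian(f, x, y, h=1e-4):
    """2x2 Hessian of f at (x, y) by central finite differences."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]

mu, sigma2 = 1.0, 2.0
eta1, eta2 = mu / sigma2, -1.0 / (2.0 * sigma2)

# Fisher information for the natural parameter: Hessian of psi.
fisher = hessian(psi, eta1, eta2)

# Known covariance matrix of T(X) = (X, X^2) for a normal X:
# Var(X) = s2, Cov(X, X^2) = 2*mu*s2, Var(X^2) = 4*mu^2*s2 + 2*s2^2.
cov_T = [[sigma2, 2 * mu * sigma2],
         [2 * mu * sigma2, 4 * mu ** 2 * sigma2 + 2 * sigma2 ** 2]]

# A nonzero determinant indicates the matrix is nonsingular at this eta.
det = fisher[0][0] * fisher[1][1] - fisher[0][1] * fisher[1][0]
print(fisher, cov_T, det)
```

The two matrices should agree up to finite-difference error, and the positive determinant is consistent with the normal family being full rank at this point.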

Sufficiency and Completeness

Theorem 8. Suppose a family of distributions \(\mathcal{F} = \{ P_\theta, \theta \in \Theta\} \) belongs to a k-parameter Exponential family and that the "true" parameter space \(\Theta\) has a nonempty interior. Then the family \(\mathcal{F}\) is complete.

Theorem 9. (Basu's Theorem for the Exponential Family) In any k-parameter Exponential family \(\mathcal{F}\), with a parameter space \(\Theta\) that has a nonempty interior, the natural sufficient statistic of the family \(T(X)\) and any ancillary statistic \(S(X)\) are independently distributed under each \(\theta \in \Theta\).
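
A classical illustration (standard, not taken from the source notes): let \(X_1,\dotsc,X_n\) be iid \(N(\mu,\sigma^2)\) with \(\sigma^2\) known. This is a one-parameter Exponential family in \(\mu\) with parameter space \(\mathbb{R}\), and the natural sufficient statistic is \(\sum_i X_i\), or equivalently \(\bar{X}\). The sample variance \(S^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2\) is ancillary, because its distribution does not depend on \(\mu\). By Theorem 9, \(\bar{X}\) and \(S^2\) are independent for every \(\mu\), and since \(\sigma^2\) was arbitrary, this holds for every normal distribution.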

MLE of exponential family

Recall that \(L(x;\theta) = \log p_\theta(x) \doteq \langle \theta, T(x) \rangle - \psi(\theta) \), where \(\theta\) now denotes the natural parameter. The MLE \(\theta_{ML}\) satisfies

\[ S(\theta) = \left. \frac{\partial}{\partial \theta} L(x;\theta) \right\vert_{\theta = \theta_{ML}}= 0 \; \Longleftrightarrow \; T(x) = E_{\theta_{ML}} [ T(X) ] \] where \( \frac{\partial}{\partial \theta} \psi(\theta) = E_\theta [ T(X) ] \)

The second derivative gives us

\[ \frac{\partial^2}{ \partial \theta \partial \theta^T} L(x;\theta) = - I(\theta) = - Cov_\theta [ T(X) ] \] The right hand side is negative definite for a full rank family. Therefore the log likelihood function is strictly concave in \(\theta\), and a solution of the likelihood equation is the unique maximizer.
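
Because the log likelihood is concave, the likelihood equation can be solved by Newton's method. The sketch below is a minimal one-parameter version under stated assumptions: for \(n\) iid observations the equation becomes \(\psi'(\theta) = \bar{T}\), the sample mean of \(T(x_i)\), and the example uses the Poisson family with \(\psi(\theta) = e^\theta\), for which the exact answer is \(\theta_{ML} = \log \bar{x}\). The function `newton_mle` and the data are hypothetical.

```python
import math

def newton_mle(t_bar, dpsi, d2psi, theta0=0.0, tol=1e-12, max_iter=100):
    """Solve dpsi(theta) = t_bar by Newton's method for a scalar natural parameter."""
    theta = theta0
    for _ in range(max_iter):
        score = t_bar - dpsi(theta)    # per-observation score function
        step = score / d2psi(theta)    # divide by the Fisher information psi''(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical Poisson counts; T(x) = x, so t_bar is the sample mean.
data = [2, 4, 3, 1, 5, 3, 2, 4]
t_bar = sum(data) / len(data)

# For the Poisson family psi(theta) = exp(theta), so psi' = psi'' = exp.
theta_ml = newton_mle(t_bar, math.exp, math.exp)

print("theta_ML =", theta_ml, " log(mean) =", math.log(t_bar))
print("E[T] at theta_ML =", math.exp(theta_ml), " observed mean =", t_bar)
```

At the solution the model mean of \(T\) matches the observed mean, which is the moment-matching condition stated above.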

Existence of conjugate prior

For likelihood functions within the exponential family, a conjugate prior can be found within the exponential family. The marginalization to \(p(x) = \int p(x|\theta) p(\theta) d\theta \) is also tractable.
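
A small sketch of conjugacy, assuming a Poisson likelihood with a Gamma(shape, rate) prior on the mean \(\lambda\) (a standard pairing, not taken from the source notes; the data and function names are hypothetical). The posterior is again a Gamma distribution with shape increased by \(\sum_i x_i\) and rate increased by \(n\), so prior times likelihood divided by posterior is a constant in \(\lambda\), namely the marginal \(p(x)\).

```python
import math

def log_gamma_pdf(lam, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at lam > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(lam) - rate * lam)

def log_poisson_lik(lam, data):
    """Log likelihood of iid Poisson(lam) counts."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

def posterior_params(shape, rate, data):
    """Conjugate update: add the sufficient statistic and the sample size."""
    return shape + sum(data), rate + len(data)

data = [2, 4, 3, 1, 5]    # hypothetical counts
a0, b0 = 2.0, 1.0         # assumed prior shape and rate
a1, b1 = posterior_params(a0, b0, data)

def log_marginal(lam):
    """log p(x) = log prior + log likelihood - log posterior, for any lam > 0."""
    return (log_gamma_pdf(lam, a0, b0) + log_poisson_lik(lam, data)
            - log_gamma_pdf(lam, a1, b1))

# The value should not depend on lam, confirming the posterior is Gamma(a1, b1).
print(a1, b1, log_marginal(0.5), log_marginal(3.0))
```

Evaluating `log_marginal` at any single \(\lambda\) gives \(\log p(x)\) without numerical integration, which is the tractability referred to above.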

From Casella-Berger.

Note that the parameter space is the "natural" parameter space.


Monday, September 12, 2016
