
Friday, September 16, 2016

Enabling copy/paste in VMware 12 Player

1) sudo apt-get autoremove open-vm-tools
2) Install VMware Tools by following the usual method (Virtual Machine --> Reinstall VMware Tools)
3) Reboot the VM
4) sudo apt-get install open-vm-tools-desktop
5) Reboot the VM, after the reboot copy/paste and drag/drop will work!

Monday, September 12, 2016

Properties of MMSE and MAP estimators (Bayesian)

The MMSE estimator is the mean E(x|y) of the posterior pdf of x given the observation y.
  1. The estimator is unbiased.
  2. The covariance is reduced compared to the a priori information.
  3. Commutes over affine transformations.
  4. Additivity property for independent data sets.
  5. Linear in the Gaussian case.
  6. The estimator error is orthogonal to the space spanned by all Y-measurable functions (affine functions being a subset).
The MAP estimator \arg\max_{\theta} p(\theta|x) given observation x
  1. Jointly Gaussian case, MAP = MMSE (posterior is Gaussian, hence the pdf is unimodal and symmetric, mean = mode = median); see the numerical sketch below
  2. Does not commute over nonlinear transformations (the invariance property does not hold, unlike ML)
  3. Commutes over linear transformations.
MAP tends to ML when
  • Prior is uninformative
  • Large amount of information in data compared to prior
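As a quick numerical sketch of the points above (a scalar Gaussian example with made-up values, added here for illustration): with a Gaussian prior and Gaussian likelihood the posterior is Gaussian, so the MMSE and MAP estimates coincide with the posterior mean, and as the prior variance grows both tend to the ML estimate \bar{x}.

import numpy as np

# Scalar Gaussian example (all values are made up for illustration):
# x_i = theta + w_i, theta ~ N(mu0, prior_var), w_i ~ N(0, s^2) i.i.d.
# The posterior is Gaussian, so MMSE (posterior mean) = MAP (posterior mode);
# with an uninformative prior (prior_var -> infinity) both tend to x_bar.
rng = np.random.default_rng(0)

mu0 = 0.0                # prior mean
s = 2.0                  # noise standard deviation
theta_true = 1.5         # "true" parameter used only to simulate data
x = theta_true + rng.normal(0.0, s, size=50)

def posterior_mean(prior_var):
    post_var = 1.0 / (1.0 / prior_var + x.size / s**2)
    return post_var * (mu0 / prior_var + x.sum() / s**2)

print("ML estimate (x_bar)       :", x.mean())
print("MMSE = MAP, prior var 1   :", posterior_mean(1.0))
print("MMSE = MAP, prior var 1e6 :", posterior_mean(1e6))  # ~ ML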

Gaussian linear model

Let the observed samples follow the model
x = H\theta + w with prior \theta \sim \mathcal{N}(\mu_\theta, C_\theta) and noise vector w \sim \mathcal{N}(0, C_w) independent of \theta, then the posterior is Gaussian with mean
E(\theta|x) = \mu_\theta + C_\theta H^T (H C_\theta H^T + C_w)^{-1} (x - H\mu_\theta) and covariance C_{\theta|x} = C_\theta - C_\theta H^T (H C_\theta H^T + C_w)^{-1} H C_\theta.  Contrary to the classical Gaussian linear model, H does not need to be full rank.
In alternative form, 
E(\theta|x) = \mu_\theta + (C_\theta^{-1} + H^T C_w^{-1} H)^{-1} H^T C_w^{-1} (x - H\mu_\theta) and C_{\theta|x} = (C_\theta^{-1} + H^T C_w^{-1} H)^{-1}
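A small numerical check of the two equivalent forms (the matrices below are arbitrary illustrative choices, and H is deliberately short and fat so it cannot be full column rank):

import numpy as np

# Bayesian Gaussian linear model x = H*theta + w, theta ~ N(mu_theta, C_theta),
# w ~ N(0, C_w).  Verify that the "measurement space" and "parameter space"
# forms of the posterior mean and covariance agree (matrix inversion lemma).
rng = np.random.default_rng(1)

n, p = 2, 3                          # fewer observations than parameters,
H = rng.standard_normal((n, p))      # so H is not full column rank
mu_theta = np.zeros(p)
C_theta = np.eye(p)                  # prior covariance
C_w = 0.5 * np.eye(n)                # noise covariance

theta = rng.multivariate_normal(mu_theta, C_theta)
x = H @ theta + rng.multivariate_normal(np.zeros(n), C_w)

# Form 1: invert in measurement space
G = C_theta @ H.T @ np.linalg.inv(H @ C_theta @ H.T + C_w)
mean1 = mu_theta + G @ (x - H @ mu_theta)
cov1 = C_theta - G @ H @ C_theta

# Form 2: invert in parameter space (information form)
P = np.linalg.inv(np.linalg.inv(C_theta) + H.T @ np.linalg.inv(C_w) @ H)
mean2 = mu_theta + P @ H.T @ np.linalg.inv(C_w) @ (x - H @ mu_theta)
cov2 = P

print(np.allclose(mean1, mean2), np.allclose(cov1, cov2))   # True True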

LMMSE estimator \hat{E}[X|Y]
  1. A function of first and second order statistics only:  \hat{E}[X|Y] = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) (the inverse can be replaced with a pseudo-inverse if necessary)
  2. Jointly Gaussian case, \hat{E}[X|Y] = E[X|Y] (see the sketch after this list)
  3. Error orthogonal to the subspace spanned by Y
  4. Additivity property \hat{E}[X|Y_1, \dotsc, Y_k] = \sum_{j=1}^{k} \hat{E}[X|Y_j] - (k-1)\mu_x
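A minimal sketch of properties 1 and 2 (the joint mean and covariance below are arbitrary illustrative values): the LMMSE estimate is built from first and second order statistics only, and in the jointly Gaussian case it is exactly the conditional mean.

import numpy as np

# LMMSE of X from a single observation Y = y using only first and second
# order statistics; for a jointly Gaussian (X, Y) this equals E[X|Y=y].
mu = np.array([1.0, -2.0])            # [mu_x, mu_y]
Sigma = np.array([[2.0, 0.8],         # joint covariance of (X, Y)
                  [0.8, 1.0]])

y = 0.5                               # observed value of Y
Sigma_xy, Sigma_yy = Sigma[0, 1], Sigma[1, 1]

x_hat = mu[0] + Sigma_xy / Sigma_yy * (y - mu[1])
print(x_hat)                          # = E[X | Y = y] in the Gaussian case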

Properties of the exponential family of distributions

From Dasgupta (see link)

One parameter Exponential family

Given the family of distributions \{P_\theta, \theta \in \Theta \subseteq \mathbb{R}\}, the pdf of which has the form
f(x|\theta) = h(x) e^{\eta(\theta) T(x) - \psi(\theta)}
If \eta(\theta) is a 1-1 function of \theta we can drop \theta from the discussion.  Thus the family of distributions \{P_\eta, \eta \in \Xi \subseteq \mathbb{R}\} is in canonical form,
f(x|\eta) = h(x) e^{\eta T(x) - \psi(\eta)} and define the set
\mathcal{T} = \{\eta : e^{\psi(\eta)} < \infty\}
\eta is the natural parameter, and \mathcal{T} the natural parameter space.
The family is called the canonical one parameter Exponential family.
[Brown] The family is called full if \Xi = \mathcal{T}, regular if \mathcal{T} is open.
[Brown] Let K be the convex support of the measure \nu.
The family is minimal if \dim \Xi = \dim K = k.
It is nonsingular if \mathrm{Var}_\eta(T(X)) > 0 for all \eta \in \mathcal{T}^{\circ}, the interior of \mathcal{T}.

Theorem 1. \psi(\eta) is a convex function on \mathcal{T}.
Theorem 2. \psi(\eta) is a cumulant generating function for any \eta \in \mathcal{T}.
Note: the 1st cumulant is the expectation, the 2nd and 3rd cumulants are central moments (the 2nd being the variance), and the 4th and higher order cumulants are neither moments nor central moments.
There are more properties...
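As a concrete example (the Poisson case, added here for illustration): for X \sim \mathrm{Poisson}(\lambda),
f(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{1}{x!}\, e^{x \log\lambda - \lambda}, \quad x = 0, 1, 2, \dotsc
so h(x) = 1/x!, T(x) = x, the natural parameter is \eta = \log\lambda, \psi(\eta) = e^\eta, and the natural parameter space is \mathcal{T} = \mathbb{R}.  Consistent with Theorems 1 and 2, \psi is convex and its derivatives generate the cumulants of T(X): \psi'(\eta) = e^\eta = E_\eta[T(X)] and \psi''(\eta) = e^\eta = \mathrm{Var}_\eta(T(X)), i.e. the mean and variance of the Poisson are both \lambda.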

Multi-parameter Exponential family

Given the family of distributions \{P_\theta, \theta \in \Theta \subseteq \mathbb{R}^k\}, the pdf of which has the form
f(x|\theta) = h(x) e^{\sum_{i=1}^{k} \eta_i(\theta) T_i(x) - \psi(\theta)}, which is the k-parameter Exponential family.
If we reparametrize using \eta_i = \eta_i(\theta), we have the k-parameter canonical family.
The assumption here is that the dimension of \Theta and the dimension of the image of \Theta under the map \theta \mapsto (\eta_1(\theta), \dotsc, \eta_k(\theta)) are both equal to k.
The canonical form is
f(x|\eta) = h(x) e^{\sum_{i=1}^{k} \eta_i T_i(x) - \psi(\eta)}

Theorem 7.  Given a sample having a distribution P_\eta, \eta \in \mathcal{T}, in the canonical k-parameter Exponential family, with \mathcal{T} = \{\eta \in \mathbb{R}^k : e^{\psi(\eta)} < \infty\},
the partial derivatives of \psi(\eta) of any order exist for any \eta \in \mathcal{T}^{\circ}, the interior of \mathcal{T}.

Definition.  The family is full rank if at every \eta \in \mathcal{T} the covariance matrix I(\eta) = \left[ \frac{\partial^2}{\partial \eta_i \partial \eta_j} \psi(\eta) \right] \succeq 0 is nonsingular.
Definition/Theorem.  If the family is nonsingular, then the matrix I(\eta) is called the Fisher information matrix at \eta (for the natural parameter).
Proof.  For the canonical exponential family we have L(x;\eta) = \log p_\eta(x) \doteq \langle \eta, T(x) \rangle - \psi(\eta), so \frac{\partial}{\partial \eta} L(x;\eta) = T(x) - \frac{\partial}{\partial \eta} \psi(\eta), and \frac{\partial^2}{\partial \eta \partial \eta^T} L(x;\eta) = -\frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta) does not depend on x for fixed \eta, so
I(\eta) = -E_\eta \left[ \frac{\partial^2}{\partial \eta \partial \eta^T} L(X;\eta) \right] = \frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta)
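A quick numerical sanity check of this identity (a Bernoulli example with an arbitrary \eta, added here for illustration; for the canonical Bernoulli family \psi(\eta) = \log(1 + e^\eta) and T(x) = x):

import numpy as np

# For the canonical Bernoulli family, psi(eta) = log(1 + e^eta) and T(x) = x,
# so the Fisher information I(eta) = psi''(eta) should equal
# Var_eta(T(X)) = p(1 - p) with p = e^eta / (1 + e^eta).
eta = 0.7                                     # arbitrary natural parameter
p = np.exp(eta) / (1.0 + np.exp(eta))         # mean parameter

psi = lambda e: np.log1p(np.exp(e))
h = 1e-4
psi_dd = (psi(eta + h) - 2 * psi(eta) + psi(eta - h)) / h**2   # numerical psi''

print(psi_dd, p * (1 - p))                    # both ~ 0.2217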

Sufficiency and Completeness

Theorem 8.  Suppose a family of distributions \mathcal{F} = \{ P_\theta, \theta \in \Theta\} belongs to a k-parameter Exponential family and that the "true" parameter space \Theta has a nonempty interior; then the family \mathcal{F} is complete.

Theorem 9. (Basu's Theorem for the Exponential Family) In any k-parameter Exponential family \mathcal{F}, with a parameter space \Theta that has a nonempty interior, the natural sufficient statistic of the family T(X) and any ancillary statistic S(X) are independently distributed under each \theta \in \Theta.

MLE of exponential family

Recall L(x;\theta) = \log p_\theta(x) \doteq \langle \theta, T(x) \rangle - \psi(\theta).  The MLE satisfies
S(\theta) = \left. \frac{\partial}{\partial \theta} L(x;\theta) \right\vert_{\theta = \theta_{ML}}= 0 \; \Longleftrightarrow  \; T(x) = E_{\theta_{ML}} [ T(X) ] where  \frac{\partial}{\partial \theta} \psi(\theta) =  E_\theta [ T(X) ]  

The second derivative gives us 
\frac{\partial^2}{ \partial \theta \partial \theta^T} L(x;\theta) = - I(\theta) = - Cov_\theta [ T(X) ].   The right hand side is negative definite for a full rank family.  Therefore the log likelihood function is strictly concave in \theta.
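A short sketch of these two facts for the Poisson family in its natural parameterization (my own example; \eta = \log\lambda, \psi(\eta) = e^\eta): the log likelihood is strictly concave in \eta, Newton's method on the score converges quickly, and the solution satisfies \bar{x} = E_{\eta_{ML}}[T(X)] = e^{\eta_{ML}}.

import numpy as np

# MLE in the canonical Poisson family: L(eta) = sum_i [x_i*eta - e^eta] + const,
# dL/d(eta) = sum(x) - n*e^eta, d2L/d(eta)^2 = -n*e^eta < 0 (strictly concave).
# The MLE solves the moment-matching condition x_bar = e^eta.
rng = np.random.default_rng(2)
x = rng.poisson(lam=3.0, size=1000)

eta = 0.0                                       # starting point
for _ in range(20):                             # Newton's method on the score
    score = x.sum() - x.size * np.exp(eta)
    hess = -x.size * np.exp(eta)
    eta -= score / hess

print(np.exp(eta), x.mean())                    # both equal x_bar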

Existence of conjugate prior

For likelihood functions within the exponential family, a conjugate prior can be found within the exponential family.  The marginalization to p(x) = \int p(x|\theta) p(\theta) d\theta is also tractable.

From Casella-Berger.  

Note that the parameter space is the "natural" parameter space.
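One standard construction (sketched here from the canonical form above, not taken verbatim from Casella-Berger): for f(x|\eta) = h(x) e^{\langle \eta, T(x) \rangle - \psi(\eta)}, a conjugate prior is
p(\eta \mid \tau, n_0) \propto e^{\langle \eta, \tau \rangle - n_0 \psi(\eta)},
and after observing x the posterior has the same form with the hyperparameters updated as \tau \to \tau + T(x) and n_0 \to n_0 + 1 (for n i.i.d. observations, \tau \to \tau + \sum_{i=1}^n T(x_i) and n_0 \to n_0 + n).  The marginal p(x) then reduces to a ratio of the prior and posterior normalizing constants, which is why it stays tractable.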



Tuesday, September 06, 2016

Local convergence for exponential mixture family

From Redner, Walker 1984

Theorem 5.2.  Suppose that the Fisher information matrix I(\Phi) is positive definite at the true parameter \Phi^* and that \Phi^* = (\alpha_1^*, \dotsc, \alpha_m^*, \phi_1^*, \dotsc, \phi_m^*) is such that \alpha_i^* > 0 \text{ for } i = 1,\dotsc,m.  For \Phi^{(0)} \in \Omega, denote by \{\Phi^{(j)}\}_{j=0,1,2,\dotsc} the sequence in \Omega generated by the EM iteration.  Then with probability 1, whenever N is sufficiently large, the unique strongly consistent solution \Phi^N = (\alpha_1^N, \dotsc, \alpha_m^N, \phi_1^N, \dotsc, \phi_m^N) of the likelihood equations is well defined and there is a certain norm on \Omega in which  \{\Phi^{(j)}\}_{j=0,1,2,\dotsc} converges linearly to \Phi^N whenever \Phi^{(0)} is sufficiently near \Phi^N, i.e. there is a constant 0 \leq \lambda < 1, for which
\lVert \Phi^{(j+1)} - \Phi^N \rVert \leq \lambda \lVert \Phi^{(j)} - \Phi^N \rVert, \quad j = 0,1,2,\dotsc  whenever \Phi^{(0)} is sufficiently near \Phi^{N}.
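A small simulation in the spirit of this theorem (a 1-D two-component Gaussian mixture with known unit variances; the mixture, sample size and starting point are my own choices): the per-iteration error ratios of EM settle to a constant \lambda < 1, i.e. linear convergence toward the fixed point.

import numpy as np

# EM for a two-component 1-D Gaussian mixture with known unit variances,
# Phi = (alpha, mu1, mu2).  Track ||Phi^(j+1) - Phi^N|| / ||Phi^(j) - Phi^N||,
# which should approach a constant lambda < 1 (linear convergence).
rng = np.random.default_rng(3)

alpha, mu1, mu2 = 0.4, -1.0, 1.0                # true mixture parameters
n = 5000
z = rng.random(n) < alpha
x = np.where(z, rng.normal(mu1, 1.0, n), rng.normal(mu2, 1.0, n))

def em_step(a, m1, m2):
    # E-step: responsibilities of component 1
    p1 = a * np.exp(-0.5 * (x - m1) ** 2)
    p2 = (1.0 - a) * np.exp(-0.5 * (x - m2) ** 2)
    r = p1 / (p1 + p2)
    # M-step
    return r.mean(), np.sum(r * x) / np.sum(r), np.sum((1 - r) * x) / np.sum(1 - r)

start = (0.5, -0.5, 0.5)

# Run EM long enough to reach the fixed point Phi^N (the local MLE)
phi = start
for _ in range(2000):
    phi = em_step(*phi)
phi_fix = np.array(phi)

# Restart from the same point and record the error ratio per iteration
phi = start
prev_err = np.linalg.norm(np.array(phi) - phi_fix)
for j in range(10):
    phi = em_step(*phi)
    err = np.linalg.norm(np.array(phi) - phi_fix)
    print(j, err / prev_err)                    # ratios approach lambda < 1
    prev_err = err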

Differentiability of jump functions

Let
j_n(x) = \begin{cases} 0 & \text{if } x < x_n, \\ \theta_n & \text{if } x = x_n, \\ 1 & \text{if } x > x_n, \end{cases} for some 0 \leq \theta_n \leq 1; then the jump function is defined as
J(x) = \sum_{n=1}^\infty \alpha_n j_n(x), with \sum_{n=1}^\infty \alpha_n < \infty.
Theorem.  If J is the jump function above, then J'(x) exists and vanishes almost everywhere (non-zero only on a set of measure zero: E = \{x : J'(x)\neq 0,\; x\in \mathcal{B} \}, m(E) = 0).

Typically, a probability distribution F is defined as a nondecreasing, right continuous function with F(-\infty) = 0,\; F(\infty)=1.