
Monday, December 12, 2016

Definition of a distribution function

A function \(F: \mathbb{R} \to [0,1]\) satisfying the following properties is a distribution function (a concrete example follows the list).

  1. \(F\) is right continuous;
  2. \(F\) is monotone non-decreasing;
  3. \(F\) has limits at \(\pm\infty\):
    \begin{align*} F(\infty) &:= \lim_{x\uparrow \infty} F(x) = 1 \\ F(-\infty) &:=\lim_{x\downarrow - \infty} F(x) = 0 \end{align*}
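
For a concrete example, the unit-rate exponential CDF
\[ F(x) = \begin{cases} 1 - e^{-x} & x \ge 0, \\ 0 & x < 0, \end{cases} \]
is right continuous (in particular at \(x = 0\)), monotone non-decreasing, and satisfies \(\lim_{x\uparrow\infty} F(x) = 1\) and \(\lim_{x\downarrow -\infty} F(x) = 0\).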

Wednesday, October 19, 2016

VMWare disk activity reduction methods

See this forum link

and also the VMware knowledge base

Also try disabling swap in the Linux guest (sudo swapoff -a).

Friday, September 16, 2016

Enabling copy/paste in VMware 12 Player

1) sudo apt-get autoremove open-vm-tools
2) Install VMware Tools by following the usual method (Virtual Machine --> Reinstall VMware Tools)
3) Reboot the VM
4) sudo apt-get install open-vm-tools-desktop
5) Reboot the VM; after the reboot, copy/paste and drag/drop will work!

Monday, September 12, 2016

Properties of MMSE and MAP estimator (Bayesian)

The MMSE estimator is the mean \(E(x|y)\) of the posterior pdf of \(x\) given the observation \(y\).
  1. The estimator is unbiased.
  2. The covariance is reduced compared to the a priori information.
  3. Commutes over affine transformations.
  4. Additivity property for independent data sets.
  5. Linear in the Gaussian case.
  6. The estimation error is orthogonal to the space spanned by all Y-measurable functions (affine functions being a subset).
The MAP estimator is \( \textsf{arg max}_\theta \; p(\theta|x) \) given the observation \(x\).
  1. Jointly Gaussian case: MAP = MMSE (the posterior is Gaussian, hence the pdf is unimodal and symmetric, so mean = mode = median).
  2. Does not commute over nonlinear transformations (the invariance property does not hold, unlike ML).
  3. Commutes over linear transformations.
MAP tends to ML (a worked example follows this list) when
  • the prior is uninformative;
  • the data carry a large amount of information compared to the prior.
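
As a worked example of both bullets (assuming a scalar Gaussian model, which is not part of the original note): for \(x_1,\dotsc,x_N \sim \mathcal{N}(\theta,\sigma^2)\) with prior \(\theta \sim \mathcal{N}(\mu_0, \sigma_0^2)\), the posterior is Gaussian, so
\[ \hat{\theta}_{MAP} = E(\theta|x) = \frac{N \sigma_0^2 \bar{x} + \sigma^2 \mu_0}{N \sigma_0^2 + \sigma^2} \longrightarrow \bar{x} = \hat{\theta}_{ML}, \]
either as \(\sigma_0^2 \rightarrow \infty\) (uninformative prior) or as \(N \rightarrow \infty\) (the data dominate the prior).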

Gaussian linear model

Let the observed samples follow the model
\[  x = H\theta + w\] with prior \(\theta \sim \mathcal{N}(\mu_\theta, C_\theta)\) and noise vector \(w \sim \mathcal{N}(0, C_w)\) independent of \(\theta\); then the posterior is Gaussian with mean
\[ E(\theta|x) = \mu_\theta + C_\theta H^T (H C_\theta H^T + C_w)^{-1} (x - H \mu_\theta) \] and covariance \[ C_{\theta|x} = C_\theta - C_\theta H^T (H C_\theta H^T + C_w)^{-1} H C_\theta \] Contrary to the classical Gaussian linear model, \(H\) does not need to be full rank.
In alternative form, 
\[ E(\theta|x) = \mu_\theta + ( C_\theta^{-1} + H^T C_w^{-1} H )^{-1} H^T C_w^{-1} (x - H \mu_\theta)\] and \[ C_{\theta|x} = ( C_\theta^{-1} + H^T C_w^{-1} H )^{-1} \] 
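
A minimal numerical sketch (Python; the dimensions and random model below are my own illustrative choices, not from the source) checking that the two forms of the posterior mean and covariance agree:

import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes: 2 parameters, 5 observations
n, p = 5, 2
H = rng.standard_normal((n, p))          # observation matrix (need not be full rank)
mu_theta = rng.standard_normal(p)        # prior mean
A = rng.standard_normal((p, p))
C_theta = A @ A.T + np.eye(p)            # prior covariance (SPD)
B = rng.standard_normal((n, n))
C_w = B @ B.T + np.eye(n)                # noise covariance (SPD)
x = rng.standard_normal(n)               # a realization of the observation

# Form 1: gain written with the innovation covariance H C_theta H' + C_w
S = H @ C_theta @ H.T + C_w
K = C_theta @ H.T @ np.linalg.inv(S)
mean1 = mu_theta + K @ (x - H @ mu_theta)
cov1 = C_theta - K @ H @ C_theta

# Form 2 (information form): (C_theta^-1 + H' C_w^-1 H)^-1
cov2 = np.linalg.inv(np.linalg.inv(C_theta) + H.T @ np.linalg.inv(C_w) @ H)
mean2 = mu_theta + cov2 @ H.T @ np.linalg.inv(C_w) @ (x - H @ mu_theta)

assert np.allclose(mean1, mean2)
assert np.allclose(cov1, cov2)
print("posterior mean:", mean1)
print("posterior covariance:\n", cov1)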

LMMSE estimator \( E^*[X|Y] \)
  1. A function of first and second order statistics only.  \[E^*[X|Y] = \mu_x + \Sigma_{xy} \Sigma_{yy}^{-1} ( y - \mu_y) \] (inverse can be replaced with pseudo-inverse if necessary)
  2. Jointly Gaussian case, \(E^*[X|Y] = E[X|Y]\)
  3. The error is orthogonal to the subspace spanned by \(Y\).
  4. Additivity property for mutually uncorrelated \(Y_j\) (a numerical check follows this list): \[E^*[X|Y_1,\dotsc,Y_k] = \sum_{j=1}^k E^*[X|Y_j] - (k-1)\mu_x \]
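
A quick numerical check of the additivity property (Python; scalar toy numbers chosen for illustration only, with \(Y_1, Y_2\) uncorrelated by construction):

import numpy as np

# Scalar X built from two uncorrelated observations Y1, Y2 (illustrative numbers)
mu1, mu2 = 1.0, -1.0          # E[Y1], E[Y2]
v1, v2 = 2.0, 3.0             # var(Y1), var(Y2); cov(Y1, Y2) = 0 by construction
a, b = 0.5, -0.8              # X = a*Y1 + b*Y2 + independent noise
mu_x = a * mu1 + b * mu2
s_x1 = a * v1                 # cov(X, Y1)
s_x2 = b * v2                 # cov(X, Y2)

y1, y2 = 2.3, 0.7             # an arbitrary observed pair

# Joint LMMSE using both observations
Syy = np.diag([v1, v2])
Sxy = np.array([s_x1, s_x2])
joint = mu_x + Sxy @ np.linalg.solve(Syy, np.array([y1 - mu1, y2 - mu2]))

# Additivity: sum of single-observation LMMSEs minus (k-1)*mu_x
single1 = mu_x + s_x1 / v1 * (y1 - mu1)
single2 = mu_x + s_x2 / v2 * (y2 - mu2)
additive = single1 + single2 - mu_x

assert np.isclose(joint, additive)
print(joint, additive)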

Properties of the exponential family of distributions

From Dasgupta (see link)

One parameter Exponential family

Given the family of distributions \(\{ P_\theta, \theta \in \Theta \subset \mathbb{R} \} \), the pdf of which has the form
\[ f(x|\theta) = h(x) e^{\eta(\theta) T(x) - \psi^*(\theta)} \] 
If \(\eta(\theta)\) is a 1-1 function of \(\theta\), we can drop \(\theta\) from the discussion.  Thus the family of distributions \(\{ P_\eta, \eta\in \Xi \subset \mathbb{R} \} \) is in canonical form,
\[ f(x|\eta) = h(x) e^{\eta T(x) - \psi(\eta)}. \] Define the set 
\[ \mathcal{T} = \{ \eta : e^{\psi(\eta)} < \infty \}\]
\(\eta\) is the natural parameter, and \(\mathcal{T}\) the natural parameter space.
The family is called the canonical one parameter Exponential family.
[Brown] The family is called full if \(\Xi = \mathcal{T}\), regular if \(\mathcal{T}\) is open.
[Brown] Let K be the convex support of the measure \(\nu\)
The family is minimal if \(\dim \Xi = \dim K = k\)
It is nonsingular if \( Var_\eta (T(X)) > 0 \) for all \(\eta \in \overset{\circ}{\mathcal{T}}\), the interior of \(\mathcal{T}\).

Theorem 1. \(\psi(\eta)\) is a convex function on \(\mathcal{T}\).
Theorem 2. \(\psi(\eta)\) is a cumulant generating function for any \( \eta \in \overset{\circ}{\mathcal{T}}\).
Note: the 1st cumulant is the expectation; the 2nd and 3rd are the central moments (the 2nd being the variance); 4th and higher order cumulants are neither moments nor central moments.
There are more properties...
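
A small numerical illustration of Theorem 2 using the Poisson family (my own example, not from Dasgupta): with \(T(x) = x\), \(\eta = \log\lambda\) and \(\psi(\eta) = e^\eta\), the first two derivatives of \(\psi\) should match the mean and variance of \(T(X)\).

import numpy as np

lam = 3.0
eta = np.log(lam)              # natural parameter of the Poisson family

def psi(e):                    # psi(eta) = exp(eta) for the Poisson
    return np.exp(e)

# Finite-difference first and second derivatives of psi at eta
h = 1e-4
d1 = (psi(eta + h) - psi(eta - h)) / (2 * h)
d2 = (psi(eta + h) - 2 * psi(eta) + psi(eta - h)) / h**2

# Cumulants of T(X) = X under Poisson(lam): mean = variance = lam
rng = np.random.default_rng(0)
x = rng.poisson(lam, size=1_000_000)
print("psi'(eta)  =", d1, " sample mean =", x.mean())
print("psi''(eta) =", d2, " sample var  =", x.var())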

Multi-parameter Exponential family

A family of distributions \(\{ P_\theta, \theta \in \Theta \subset \mathbb{R}^k \} \), the pdf of which has the form
\[ f(x|\theta) = h(x) e^{\sum_{i=1}^k \eta_i(\theta) T_i(x) - \psi^*(\theta)} \] is the k-parameter Exponential family.
When we reparametrize using \(\eta_i = \eta_i(\theta)\), we obtain the k-parameter canonical family.
The assumption here is that the dimension of \(\Theta\) and the dimension of the image of \(\Theta\) under the map \( \theta \mapsto (\eta_1(\theta),\dotsc,\eta_k(\theta) )\) are both equal to \(k\).
The canonical form is 
\[ f(x|\eta) = h(x) e^{\sum_{i=1}^k \eta_i T_i(x) - \psi(\eta)} \]

Theorem 7.  Given a sample having a distribution \(P_\eta, \eta\in\mathcal{T}\), in the canonical k-parameter Exponential family with \( \mathcal{T} = \{ \eta \in \mathbb{R}^k : e^{\psi(\eta)} < \infty \} \), the partial derivatives of \(\psi(\eta)\) of any order exist for any \(\eta \in \overset{\circ}{\mathcal{T}}\).

Definition.  The family is full rank if at every \(\eta \in \overset{\circ}{\mathcal{T}}\) the covariance matrix \[ I(\eta) = \frac{\partial^2}{\partial \eta_i \partial \eta_j} \psi(\eta) \ge 0 \] is nonsingular.
Definition/Theorem.  If the family is nonsingular, then the matrix \(I(\eta)\) is called the Fisher information matrix at \(\eta\) (for the natural parameter).
Proof.  For the canonical exponential family, we have \(L(x;\eta) = \log p_\eta(x) \doteq \langle \eta, T(x) \rangle - \psi(\eta) \), \(L'(x;\eta) = T(x) - \frac{\partial }{\partial \eta} \psi(\eta) \), and \( L''(x;\eta) = - \frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta)\), which does not depend on \(x\), so
\[ I(\eta) = \frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta)\]

Sufficiency and Completeness

Theorem 8.  Suppose a family of distributions \(\mathcal{F} = \{ P_\theta, \theta \in \Theta\} \) belongs to a k-parameter Exponential family and that the "true" parameter space \(\Theta\) has a nonempty interior; then the family \(\mathcal{F}\) is complete.

Theorem 9. (Basu's Theorem for the Exponential Family) In any k-parameter Exponential family \(\mathcal{F}\), with a parameter space \(\Theta\) that has a nonempty interior, the natural sufficient statistic of the family \(T(X)\) and any ancillary statistic \(S(X)\) are independently distributed under each \(\theta \in \Theta\).

MLE of exponential family

Recall that \(L(x;\theta) = \log p_\theta(x) \doteq \langle \theta, T(x) \rangle - \psi(\theta) \).  The MLE solution satisfies 
\[ S(\theta) = \left. \frac{\partial}{\partial \theta} L(x;\theta) \right\vert_{\theta = \theta_{ML}}= 0 \; \Longleftrightarrow  \; T(x) = E_{\theta_{ML}} [ T(X) ] \] where \(  \frac{\partial}{\partial \theta} \psi(\theta) =  E_\theta [ T(X) ]  \)

The second derivative gives us 
\[ \frac{\partial^2}{ \partial \theta \partial \theta^T} L(x;\theta) = - I(\theta) = - Cov_\theta [ T(X) ]  \] The right-hand side is negative definite for a full-rank family.  Therefore the log-likelihood function is strictly concave in \(\theta\).
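
As a quick illustration (standard Poisson example, not from the source): for \(n\) i.i.d. Poisson observations the natural statistic is \(T(x) = \sum_{i=1}^n x_i\) and \(E_\lambda[T] = n\lambda\), so the moment-matching condition gives
\[ \sum_{i=1}^n x_i = n \lambda_{ML} \quad\Longleftrightarrow\quad \lambda_{ML} = \bar{x}. \]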

Existence of conjugate prior

For likelihood functions within the exponential family, a conjugate prior can be found within the exponential family.  The marginalization to \(p(x) = \int p(x|\theta) p(\theta) d\theta \) is also tractable.

From Casella-Berger.  

Note that the parameter space is the "natural" parameter space.
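
A standard example (mine, not from Casella-Berger): the Bernoulli likelihood \(p(x|\theta) = \theta^x (1-\theta)^{1-x}\) is in the exponential family, and the Beta prior \(\theta \sim \mathrm{Beta}(\alpha, \beta)\) is conjugate to it:
\[ p(\theta | x_1,\dotsc,x_n) \propto \theta^{\alpha + \sum_i x_i - 1} (1-\theta)^{\beta + n - \sum_i x_i - 1}, \]
i.e. a \(\mathrm{Beta}(\alpha + \sum_i x_i,\; \beta + n - \sum_i x_i)\) posterior, and the marginal is available in closed form, \(p(x_1,\dotsc,x_n) = B(\alpha + \sum_i x_i,\, \beta + n - \sum_i x_i)/B(\alpha,\beta)\).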



Tuesday, September 06, 2016

Local convergence for exponential mixture family

From Redner, Walker 1984

Theorem 5.2.  Suppose that the Fisher information matrix \(I(\Phi)\) is positive definite at the true parameter \(\Phi^*\) and that \(\Phi^* = (\alpha_1^*, \dotsc, \alpha_m^*, \phi_1^*, \dotsc, \phi_m^*)\) is such that \(\alpha_i^* > 0 \text{ for } i = 1,\dotsc,m\).  For \(\Phi^{(0)} \in \Omega\), denote by \(\{\Phi^{(j)}\}_{j=0,1,2,\dotsc}\) the sequence in \(\Omega\) generated by the EM iteration.  Then with probability 1, whenever N is sufficiently large, the unique strongly consistent solution \(\Phi^N = (\alpha_1^N, \dotsc, \alpha_m^N, \phi_1^N, \dotsc, \phi_m^N)\) of the likelihood equations is well defined and there is a certain norm on \(\Omega\) in which  \(\{\Phi^{(j)}\}_{j=0,1,2,\dotsc}\) converges linearly to \(\Phi^N\) whenever \(\Phi^{(0)}\) is sufficiently near \(\Phi^N\), i.e. there is a constant \( 0 \leq \lambda < 1\), for which
\[ \lVert \Phi^{(j+1)} - \Phi^N \rVert \leq \lambda \lVert \Phi^{(j)} - \Phi^N \rVert, \quad j = 0,1,2,\dotsc \] whenever \(\Phi^{(0)}\) is sufficiently near \(\Phi^{N}\).
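
A small EM sketch (Python) for a two-component 1-D Gaussian mixture with known unit variances, showing the roughly constant error ratio that linear convergence predicts. All values (mixture weight, means, sample size, starting point) are illustrative choices, not from Redner-Walker:

import numpy as np

rng = np.random.default_rng(0)

# Simulate a two-component 1-D mixture with known unit variances (illustrative)
N = 5000
true_w, true_mu = 0.4, np.array([-1.0, 1.0])
z = rng.random(N) < true_w
x = np.where(z, rng.normal(true_mu[0], 1.0, N), rng.normal(true_mu[1], 1.0, N))

def em_step(w, mu):
    # E-step: responsibility of component 0 for each sample
    p0 = w * np.exp(-0.5 * (x - mu[0]) ** 2)
    p1 = (1.0 - w) * np.exp(-0.5 * (x - mu[1]) ** 2)
    r = p0 / (p0 + p1)
    # M-step: update the weight and the component means
    return r.mean(), np.array([np.sum(r * x) / np.sum(r),
                               np.sum((1 - r) * x) / np.sum(1 - r)])

w, mu = 0.5, np.array([-0.5, 0.5])        # starting point Phi^(0)
iterates = []
for _ in range(30):
    w, mu = em_step(w, mu)
    iterates.append(np.r_[w, mu])
iterates = np.array(iterates)

# Treat the last iterate as a proxy for the fixed point Phi^N and look at the
# error ratio ||Phi^(j+1) - Phi^N|| / ||Phi^(j) - Phi^N||
err = np.linalg.norm(iterates[:-1] - iterates[-1], axis=1)
print(np.round(err[1:8] / err[:7], 3))    # roughly constant => linear convergence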

Differentiability of jump functions

Let
\[ j_n(x) = \begin{cases} 0 & \text{if } x < x_n, \\ \theta_n & \text{if } x = x_n, \\ 1 & \text{if } x > x_n, \end{cases} \] for some \(0\leq \theta_n \leq 1\); the jump function is then defined as
\[ J(x) = \sum_{n=1}^\infty \alpha_n j_n(x), \] with \(\sum_{n=1}^\infty \alpha_n < \infty\).
Theorem.  If \(J\) is a jump function, then \(J'(x)\) exists and vanishes almost everywhere (the set \( E = \{x : J'(x)\neq 0 \}\) has Lebesgue measure zero, \(m(E) = 0\)).

Typically, a probability distribution \(F\) is defined as a nondecreasing, right-continuous function with \(F(-\infty) = 0,\; F(\infty)=1\).

Monday, August 29, 2016

Properties of Linear and Matrix Operators

Define the adjoint \(A^*\) of operator \(A\) such that
\[ \DeclareMathOperator{\rank}{rank} \langle y, Ax \rangle = \langle A^*y, x \rangle \]
We have the properties

  • \(\mathcal{N}(A) = \mathcal{N}(A^*A)\) and \(\mathcal{R}(A^*) = \mathcal{R}(A^*A)\)
  • \(\mathcal{N}(A^*) = \mathcal{N}(AA^*)\) and \(\mathcal{R}(A) = \mathcal{R}(AA^*)\)
And noting that \(\dim \mathcal{R}(A) = \dim \mathcal{R}(A^*)\), we have
  • \(\rank(A^*A) = \rank ( AA^*) = \rank(A) = \rank(A^*) \)

For matrix operators, dimension of the column space is equal to the dimension of the row space
  • column space: \(\dim (\mathcal{R}(A)) = r\)
  • row space: \(\dim (\mathcal{R}(A^H)) = r\)
  • Nullspace: \(\dim (\mathcal{N}(A)) = n -r\)
  • Left nullspace: \(\dim (\mathcal{N}(A^H))= m-r\)
Characterization of the matrix product \(AB\)
For matrices \(A\) and \(B\) such that \(AB\) exists (a numerical check appears at the end of this entry)
  1. \(\mathcal{N}(B) \subset \mathcal{N}(AB)\)
  2. \(\mathcal{R}(AB) \subset \mathcal{R}(A)\)
  3. \(\mathcal{N}(A^*) \subset \mathcal{N}((AB)^*)\)
  4. \(\mathcal{R}((AB)^*) \subset \mathcal{R}(B^*)\)
From 2 and 4
\[ \rank(AB) \leq \rank(A), \quad \rank (AB) \leq \rank(B)  \]
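
A quick numerical check of these rank relations (Python; real matrices, so \(A^* = A^T\); the sizes and ranks below are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
rank = np.linalg.matrix_rank

# Random A (6x5) and B (5x4) with deliberately deficient ranks
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # rank 2
B = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # rank 3

# rank(A*A) = rank(AA*) = rank(A) = rank(A*)
assert rank(A.T @ A) == rank(A @ A.T) == rank(A) == rank(A.T)

# rank(AB) <= rank(A) and rank(AB) <= rank(B)
assert rank(A @ B) <= min(rank(A), rank(B))

# R(AB) inside R(A): appending the columns of AB to A does not increase the rank
assert rank(np.hstack([A, A @ B])) == rank(A)
# R((AB)*) inside R(B*): appending the rows of AB to B does not increase the rank
assert rank(np.vstack([B, A @ B])) == rank(B)

print(rank(A), rank(B), rank(A @ B))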

Thursday, August 25, 2016

Topology and Continuity concepts

Let \(S\) be a subset of a metric space \(M\)

  • \(S\) is closed if it contains all its limits.
  • \(S\) is open if for each \(p\in S\) there exists an \(r>0\) such that the open ball \(B(p,r)\) is entirely contained in \(S\)
  • The complement of an open set is closed and vice versa.
The topology of \(M\) is the collection \(\mathcal{T}\) of all open subsets of \(M\).

\(\mathcal{T}\) has the following properties
  • It is closed under arbitrary union of open sets
  • It is closed under finite intersections
  • \(\emptyset, M\) are open sets.
Corollary
  • arbitrary intersection of closed sets is closed
  • finite union of closed sets is closed
  • \(\emptyset, M\) are closed sets.
A metric space \(M\) is complete if each Cauchy sequence in \(M\) converges to a limit in \(M\).  
  • \(\mathbb{R}^n\) is complete
Every compact set is closed and bounded

Continuity of a function \(f: M \rightarrow N\)
  • The pre-image of each open set in \(N\) is open in \(M\) 
  • Preserves convergent sequences under the transformation, i.e.
    \[ f( \lim x_n) = \lim f(x_n)\] for every convergent sequence \(\{x_n\}\)

Wednesday, August 17, 2016

Continuous mapping theorem

Continuous mapping theorem on Wiki


If \(g\) is continuous at every point of a set \(C\) such that \(P(X \in C) = 1\), then
\[ \text{(i)}\;\; X_n \xrightarrow{d} X \Rightarrow g(X_n) \xrightarrow{d} g(X), \qquad \text{(ii)}\;\; X_n \xrightarrow{p} X \Rightarrow g(X_n) \xrightarrow{p} g(X), \qquad \text{(iii)}\;\; X_n \xrightarrow{a.s.} X \Rightarrow g(X_n) \xrightarrow{a.s.} g(X), \]
where (i) is convergence in distribution, (ii) in probability and (iii) almost sure convergence.


Friday, August 12, 2016

Kalman filter

Define the system
\[ x_{k+1} = F_k x_k + G_k w_k + \Gamma_k u_k \quad (1) \\
    z_k = H_k' x_k + v_k \quad (2)\] \(\{u_k\}\) is known, \(x_0 \sim (\bar{x}_0, P_0) \) and \( \{w_k\}, \{v_k\} \) are random sequences with
\[ \begin{bmatrix} w_k \\ v_k \end{bmatrix} \sim
\left ( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
 \begin{bmatrix} Q_k & S_k \\ S_k' & R_k \end{bmatrix} \right )  \]  with \( [w_k' \; v_k']' \) independent of other vectors indexed by \(l \neq k\) and \(x_0\)

One step predictor estimate

First we seek a recursive equation for \[ \hat{x}_{k|k-1} = E[x_k | Z^{k-1}] = E[x_k | \tilde{Z}^{k-1}] \] Define \(\tilde{x}_k = x_k - \hat{x}_{k|k-1}\), note that \(\{\tilde{x}_k\}\) is not an innovations sequence.  Because of the independence of the innovations we have
\[ E[x_{k+1}| \tilde{Z}^k] = E[x_{k+1} | \tilde{z}_k] +  E[x_{k+1}| \tilde{Z}^{k-1}] - \bar{x}_{k+1} \]
Where \( \bar{x}_k = E[x_k]\).  Recall
\[ \DeclareMathOperator{\cov}{cov} E[x_{k+1} | \tilde{z}_k] = \bar{x}_{k+1} + \cov(x_{k+1}, \tilde{z}_k) \cov^{-1}(\tilde{z}_k, \tilde{z}_k) \tilde{z}_k  \] Define the error covariance matrix \( \Sigma_{k|k-1} = E[\tilde{x}_k \tilde{x}_k' ] \) Then
\[ \begin{align*} \cov(x_{k+1}, \tilde{z}_k) &= \cov(F_k x_k + G_k w_k + \Gamma_k u_k, H_k' \tilde{x}_k + v_k) \\  &= E[ (F_k x_k + G_k w_k - F_k \bar{x}_k) (\tilde{x}_k' H_k + v_k') ]  \\ &= E[F_k x_k \tilde{x}_k' H_k ] + G_k S_k \\ &= F_k [ E(\hat{x}_{k|k-1} \tilde{x}_k') + E(\tilde{x}_k \tilde{x}_k')] H_k + G_k S_k \\ &= F_k \Sigma_{k|k-1} H_k + G_k S_k
\end{align*}
\] Observe that \( \hat{z}_{k|k-1} = H_k' \hat{x}_{k|k-1} \), and subtracting from (2) gives \( \tilde{z}_k = H_k' \tilde{x}_k + v_k \).  Also note that \( E[\hat{x}_{k|k-1} \tilde{x}_k'] = 0\).  Next
\[ \begin{align*} \cov(\tilde{z}_k,\tilde{z}_k) &= \cov ( H_k' \tilde{x}_k + v_k, H_k' \tilde{x}_k + v_k) \\ &= H_k' \Sigma_{k|k-1} H_k + R_k  = \Omega_k \end{align*} \] We also have
\[ \begin{align*} E[x_{k+1} | \tilde{Z}^{k-1}] &= E[F_k x_k + G_k w_k + \Gamma_k u_k | \tilde{Z}^{k-1}] \\ &= F_k E[x_k | \tilde{Z}^{k-1} ] + \Gamma_k u_k \\ &= F_k \hat{x}_{k|k-1} + \Gamma_k u_k \end{align*} \] Collecting all terms above, the recursion becomes
\[  \hat{x}_{k+1|k} = F_k \hat{x}_{k|k-1} + \Gamma_k u_k + K_k (z_k - H_k' \hat{x}_{k|k-1}) \quad (9) \] with \(K_k = (F_k \Sigma_{k|k-1} H_k + G_k S_k ) \Omega_k^{-1} \)

The recursion for the error covariance is developed next.  From (1) and (9), using the identity \(\tilde{x}_{k+1} = x_{k+1} - \hat{x}_{k+1|k} \) and expanding \(z_k\) using (2), we get
\[ \tilde{x}_{k+1} = (F_k - K_k H_k') \tilde{x}_k + G_k w_k - K_k v_k \] Since \(\tilde{x}_k\) and \( [w_k' v_k']' \) are independent and zero mean, we get
\[ \begin{multline*} E[\tilde{x}_{k+1} \tilde{x}_{k+1}'] = (F_k - K_k H_k') E(\tilde{x}_k \tilde{x}_k') ( F_k - K_k H_k')' \\ + \begin{bmatrix} G_k &  -K_k \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k' & R_k \end{bmatrix} \begin{bmatrix} G_k' \\ -K_k' \end{bmatrix} \end{multline*} \] or
\[\begin{multline*} \Sigma_{k+1|k} = (F_k - K_k H_k') \Sigma_{k|k-1} (F_k - K_k H_k')' + G_k Q_k G_k' + K_k R_k K_k' \\ - G_k S_k K_k' - K_k S_k' G_k'  \end{multline*} \]
Filtered estimates

Defined in terms of \( \hat{x}_{k+1|k}\) and \( z_{k+1}\)
\[ \begin{align*} \hat{x}_{k+1|k+1} &= E[x_{k+1} | \tilde{Z}^{k+1}] \\ &= E[x_{k+1}|\tilde{z}_{k+1}] + E[x_{k+1}| \tilde{Z}^{k}] - \bar{x}_{k+1} \\ &= \bar{x}_{k+1} + \cov(x_{k+1}, \tilde{z}_{k+1}) \cov^{-1} (\tilde{z}_{k+1}, \tilde{z}_{k+1}) \tilde{z}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1} \end{align*} \]
Now \[ \begin{align*} \cov(x_{k+1}, \tilde{z}_{k+1}) &= E[ (\tilde{x}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1}) (\tilde{x}_{k+1}' H_{k+1} + v_{k+1}') ] \\ &= E[ \tilde{x}_{k+1} \tilde{x}_{k+1}'] H_{k+1} \\ &= \Sigma_{k+1|k} H_{k+1} \end{align*} \]
From earlier results, we have \[ \cov(\tilde{z}_{k+1}, \tilde{z}_{k+1}) = H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1} = \Omega_{k+1}\]  The measurement-update (filtered estimate) is
\[ \hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} (z_{k+1} - H_{k+1}' \hat{x}_{k+1|k}) \quad (6) \]
Define the uncorrelated input noise \( \tilde{w}_k = w_k - \hat{w}_k = w_k - S_k R_k^{-1} v_k\) such that
\[ \begin{bmatrix} \tilde{w}_k \\ v_k \end{bmatrix} \sim
\left ( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
 \begin{bmatrix} Q_k - S_k R_k^{-1}S_k' & 0 \\  0 & R_k \end{bmatrix} \right )  \]
then we have
\[ \begin{align*} x_{k+1} &= F_k x_k + G_k \tilde{w}_k + G_k S_k R_k^{-1} v_k + \Gamma_k u_k \\ &= (F_k - G_k S_k R_k^{-1} H_k') x_k + G_k \tilde{w}_k + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \end{align*} \] using the fact that \(v_k = z_k - H_k' x_k\).  Noting that \( E[\tilde{w}_k v_k'] = 0 \), the time update equation becomes
\[ \hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') \hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \quad (5) \]
Error covariance for filtered estimates
The error covariance is
\[ \Sigma_{k|k} = E[ (x_k - \hat{x}_{k|k}) (x_k - \hat{x}_{k|k})'] \]
From (6) we have
\[ (x_{k+1} - \hat{x}_{k+1|k+1}) + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} \tilde{z}_{k+1} = x_{k+1} - \hat{x}_{k+1|k}  \]
By the orthogonality principle, \(x_{k+1} - \hat{x}_{k+1|k+1} \) is orthogonal to \(\tilde{z}_{k+1}\).  Therefore,
\[ \Sigma_{k+1|k+1} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} = \Sigma_{k+1|k} \] or
\[ \Sigma_{k+1|k+1} = \Sigma_{k+1|k} -  \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} \]
Lastly, we obtain the time-update error covariance by subtracting (5) from (1):
\[ x_{k+1} - \hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') (x_k - \hat{x}_{k|k}) + G_k \tilde{w}_k \] and using the orthogonality of \(\tilde{w}_k\) and \(x_k - \hat{x}_{k|k}\), we obtain
\[ \begin{multline*} \Sigma_{k+1|k} = (F_k - G_k S_k R_k ^{-1} H_k') \Sigma_{k|k} (F_k - G_k S_k R_k^{-1} H_k')' \\ + G_k(Q_k - S_k R_k^{-1} S_k') G_k' \end{multline*} \]
Summary

Measurement update
\[\begin{align*} \hat{x}_{k+1|k+1} &= \hat{x}_{k+1|k} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} ( z_{k+1} - H_{k+1}' \hat{x}_{k+1|k}) \\  \Sigma_{k+1|k+1} &=  \Sigma_{k+1|k} - \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} \\ \Omega_{k+1} &= H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1} \end{align*} \]
Time update
\[ \begin{align*} \hat{x}_{k+1|k} &= ( F_k - G_k S_k R_k^{-1} H_k') \hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \\ \Sigma_{k+1|k} &= (F_k - G_k S_k R_k^{-1} H_k') \Sigma_{k|k} (F_k - G_k S_k R_k^{-1} H_k')' + G_k (Q_k - S_k R_k^{-1} S_k') G_k' \end{align*} \]
Time update with \(S_k = 0\)
\[ \begin{align*} \hat{x}_{k+1|k} &= F_k \hat{x}_{k|k} + \Gamma_k u_k \\ \Sigma_{k+1|k} &= F_k \Sigma_{k|k} F_k' + G_k Q_k G_k' \end{align*} \]
Combined update with \(S_k = 0\) for filtered state:
\[ \begin{align*} \hat{x}_{k+1|k+1} &= F_k \hat{x}_{k|k} + \Gamma_k u_k + L_{k+1} ( z_{k+1} - H_{k+1}' F_k \hat{x}_{k|k} - H_{k+1}' \Gamma_k u_k)  \\  L_{k+1} &= \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} \\  \Omega_{k+1} &= H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1}\end{align*} \]
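
A compact numerical sketch of the summary recursions with \(S_k = 0\) (Python). The constant-velocity model, noise levels, and horizon are arbitrary illustrative choices; note this note's convention \(z_k = H_k' x_k + v_k\), so the measurement matrix is applied transposed, and the control term \(\Gamma_k u_k\) is taken to be zero here.

import numpy as np

rng = np.random.default_rng(0)

# Constant-velocity toy model (illustrative): state x = [position, velocity]
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
G = np.eye(2)
Q = 0.01 * np.eye(2)            # process noise covariance Q_k
H = np.array([[1.0], [0.0]])    # convention: z_k = H_k' x_k + v_k
R = np.array([[0.25]])          # measurement noise covariance R_k

# Simulate the state and measurements
T = 50
x_true, states, meas = np.zeros(2), [], []
for _ in range(T):
    x_true = F @ x_true + G @ rng.multivariate_normal(np.zeros(2), Q)
    states.append(x_true)
    meas.append(H.T @ x_true + rng.multivariate_normal(np.zeros(1), R))

# Kalman recursion with S_k = 0: measurement update, then time update
x_pred, P_pred = np.zeros(2), np.eye(2)     # \hat{x}_{0|-1}, \Sigma_{0|-1}
filtered = []
for z in meas:
    Omega = H.T @ P_pred @ H + R                      # innovation covariance
    L = P_pred @ H @ np.linalg.inv(Omega)             # filter gain
    x_filt = x_pred + L @ (z - H.T @ x_pred)
    P_filt = P_pred - L @ H.T @ P_pred
    filtered.append(x_filt)
    x_pred = F @ x_filt                               # time update (S_k = 0)
    P_pred = F @ P_filt @ F.T + G @ Q @ G.T

err = np.array(states) - np.array(filtered)
print("RMS position error:", np.sqrt(np.mean(err[:, 0] ** 2)))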

Wednesday, August 10, 2016

Innovations sequence

Definition 
Suppose \( \{z_k\} \) is a sequence of jointly Gaussian random elements.   The innovations process \(\{\tilde{z}_k\} \) is such that \(\tilde{z}_k\) consists of that part of \(z_k\) containing new information not carried in \(z_{k-1}, z_{k-2}, \dotsc\).
\[ \tilde{z}_k = z_k - E[z_k | z_0, \dotsc, z_{k-1} ]  = z_k - E[z_k | Z^{k-1}] \] with \( \tilde{z}_0 = z_0 - E[z_0] \).

Properties

  1. \(\tilde{z}_k\) independent of \( z_0, \dotsc, z_{k-1}\) by definition
  2. (1) implies \(E[ \tilde{z}_k \tilde{z}_l'] = 0,\ l \neq k \)
  3. \(E[z_k | Z^{k-1}]\) is a linear combination of \(z_0, \dotsc, z_{k-1}\)
  4. The sequence \(\{\tilde{z}_k\} \) can be obtained from \(\{z_k\} \) by a causal linear operation (see the sketch after this list).
  5. The sequence \(\{z_k\} \) can be reconstructed from \(\{\tilde{z}_k\} \) by a causal linear operation. 
  6. (4) and (5) imply \( E[z_k | Z^{k-1}] = E[z_k | \tilde{Z}^{k-1}] \), or more generally \( E[w | Z^{k-1}] = E[w | \tilde{Z}^{k-1}] \) for jointly Gaussian \(w, \{z_k\} \)
  7. For zero mean Gaussian \(\tilde{x}_k\), \(\tilde{z}_k\), we have \[ E[x_k|Z^{k-1}] = E[x_k|\tilde{Z}^{k-1}] = E[x_k| \tilde{z}_0] + \dotsb + E[x_k| \tilde{z}_{k-1}]    \]
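
A short sketch (Python) of properties 2 and 4 for a jointly Gaussian block \( (z_0,\dotsc,z_{k-1}) \) with an arbitrary covariance: Cholesky whitening gives normalized (unit-variance) innovations, each computed causally from \(z_0,\dotsc,z_k\), and the original sequence is recovered causally from them. All dimensions and values are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Jointly Gaussian sequence z_0..z_4 with an arbitrary covariance
k = 5
A = rng.standard_normal((k, k))
Sigma = A @ A.T + np.eye(k)            # covariance of the stacked sequence
mu = rng.standard_normal(k)            # mean

Lc = np.linalg.cholesky(Sigma)         # lower triangular: Sigma = Lc Lc'
# Normalized innovations: e = Lc^{-1} (z - mu). Because Lc is lower triangular,
# e_k depends only on z_0..z_k (a causal linear operation), and cov(e) = I.
z = rng.multivariate_normal(mu, Sigma, size=200_000)
e = np.linalg.solve(Lc, (z - mu).T).T

print(np.round(np.cov(e.T), 2))        # approximately the identity matrix
# z can be reconstructed causally: z = mu + Lc e
print(np.allclose(z, mu + e @ Lc.T))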



Friday, August 05, 2016

Properties of the exponential family distributions

Given exponential family \( \mathcal{P}=\{p_\theta(x) | \theta \in \Theta \} \), where
\[ p_\theta(x) = h(x) \exp ( q^T(\theta) T(x) - b(\theta)  )  I_{\text{supp}}(x), \quad Z(\theta) = \exp(b(\theta)) \]
Regular family (gives you completeness)
Conditions for regularity,

  1. support \(p_\theta(x)\) independent of \(\theta\)
  2. finite partition function \(Z(\theta) < \infty,\; \forall \theta\)
  3. Interior of parameter space is solid, \( \mathring{\Theta} \neq \emptyset \), 
  4. Interior of natural parameter space is solid \( \mathring{\mathcal{Q}} \neq \emptyset \)
  5. Statistic vector function and the constant function are linearly independent.  i.e. \( [1, T_1(x),\dotsc,T_K(x)] \) linear indep. (gives you minimal statistic)
  6. twice differentiable \( p_\theta(x) \) 

Curved family (we only know the statistic is minimal)
An exponential family where the dimension of the vector parameter \(\mathbf{\theta}=(\theta_1,\dotsc,\theta_r)\) is less than the dimension of the natural statistic \(\mathbf{T}(\mathbf{x}) \) is called a curved family.
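
A standard example (not in the source): \(\mathcal{N}(\mu, \mu^2)\) with \(\mu > 0\) is a curved family; the parameter is one-dimensional while the natural statistic is \(\mathbf{T}(x) = (x, x^2)\), and the natural parameter \( (1/\mu,\, -1/(2\mu^2)) \) traces a curve in \(\mathbb{R}^2\).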

Identifiability of the parameter vector \( \mathbf{\theta} \).
When the statistic is minimal, it is a matter of ensuring that \(q: \Theta \mapsto \mathcal{Q} \) defines a 1-1 mapping from the desired parameter space to the natural parameter space.

Thursday, July 21, 2016

Invariance and carry over properties of MLE

Review: Asymptotic properties of MLE
  • Asymptotically efficient (attains CRLB as \(N\rightarrow\infty\))
  • Asymptotically Gaussian (asymptotic normality)
  • Asymptotically Unbiased
  • Consistent (weakly and strongly)
First, the invariance property of MLE

The MLE of the parameter \(\alpha = g(\theta)\), where the PDF \(p(x;\theta)\) is parameterized by \(\theta\), is given by
\[ \hat{\alpha} = g(\hat{\theta})\] where \(\hat{\theta}\) is the MLE of \(\theta\).

Consistency (in class) is defined as the weak convergence of the sequence of estimates to the true parameter as N gets large.

If \(g(\theta)\) is continuous in \(\theta\), the convergence properties (esp. convergence in probability) carry over, i.e. \(g(\hat{\theta})\) is a consistent estimator of \(g(\theta)\).

However, the unbiasedness of the estimator \(g(\hat{\theta})\) depends on the convexity of \(g\) and does not carry over from \(\hat{\theta}\) (see the Monte Carlo sketch at the end of this entry).

Other properties of MLE
  • If an efficient estimator exists, the ML method will produce it.
  • Unlike the MVU estimator, the MLE can be biased
  • Note: the CRLB applies to unbiased estimators, so when an estimator is biased, it is possible for it to have variance smaller than \(I^{-1}(\theta)\)
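
A Monte Carlo sketch (Python) of the invariance and bias remarks above, using a zero-mean Gaussian with unknown variance (the true variance, sample size, and trial count are illustrative). The MLE \(\hat\sigma^2\) (mean of squares, unbiased here since the mean is known) maps through \(g(\cdot)=\sqrt{\cdot}\) to the MLE of \(\sigma\) by invariance, and the latter is biased:

import numpy as np

rng = np.random.default_rng(0)

sigma2_true, N, trials = 4.0, 20, 200_000
x = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, N))

sigma2_mle = np.mean(x**2, axis=1)       # MLE of sigma^2 (unbiased: mean is known)
sigma_mle = np.sqrt(sigma2_mle)          # by invariance, the MLE of sigma

print("E[sigma2_mle] ~", sigma2_mle.mean(), " (true:", sigma2_true, ")")
print("E[sigma_mle]  ~", sigma_mle.mean(), " (true:", np.sqrt(sigma2_true), ")")
# The first is (numerically) unbiased; the second is biased low, since sqrt is
# concave and Jensen's inequality gives E[g(T)] <= g(E[T]).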

Thursday, July 14, 2016

Properties of a regular family of parameterized distribution

A family of parameterized distributions defined by
\[ \mathcal{P} = \{ p_\theta(y)  | \theta \in \Theta \subset \mathbb{R}^P \}\]
is regular if it satisfies the following conditions
  1. Support of \(p_\theta(y)\) does not depend on \(\theta\) for all \(\theta \in \Theta\)
  2. \(\frac{\partial}{\partial \theta} p_\theta(y) \) exists
  3. Optional \( \frac{\partial^2}{\partial \theta^2} p_\theta(y) \) exists
Note \[ \frac{\partial }{ \partial \theta } \ln p_\theta(y)  = \frac{1}{p_\theta(y) } \frac{\partial }{ \partial \theta } p_\theta(y) \quad \quad (4) \]
Define the score function (log := natural log)
\[  S_\theta (y) := \nabla_\theta \log p_\theta(y) \]
Note also
\[ E_\theta \{ 1 \} = 1 = \int_\mathcal{Y} p_\theta(y) dy \]
As a result of the above we have (Kay's definition of regular)
\begin{align*}  0 &= \frac{\partial }{ \partial \theta }E_\theta \{1\} \\
&= \frac{\partial }{ \partial \theta }\int p_\theta(y) dy \\
&\overset{1}{=} \int \frac{\partial }{ \partial \theta }  p_\theta(y) dy \\
&\overset{2,4}{=} \int p_\theta(y)  \frac{\partial }{ \partial \theta } \log p_\theta(y) dy \\
&= E_\theta \{ S_\theta(y) \}
\end{align*}
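
For instance (standard example, not from the note), for \(y \sim \mathcal{N}(\theta, 1)\),
\[ S_\theta(y) = \frac{\partial}{\partial\theta}\left( -\tfrac{1}{2}(y-\theta)^2 - \tfrac{1}{2}\log 2\pi \right) = y - \theta, \qquad E_\theta\{ S_\theta(y) \} = E_\theta\{y\} - \theta = 0. \]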

Friday, July 01, 2016

Spectral Theorem for Diagonalizable Matrices

It occurs to me that most presentations of the spectral theorem only concern orthonormal bases.  This is a more general result from Meyer.

Theorem
A matrix \( \mathbf{A} \in \mathbb{R}^{n\times n}\) with spectrum \(\sigma(\mathbf{A}) = \{ \lambda_1, \dotsc, \lambda_k \} \) is diagonalizable if and only if there exist matrices \(\{ \mathbf{G}_1, \dotsc, \mathbf{G}_k\} \) such that  \[ \mathbf{A} = \lambda_1 \mathbf{G}_1 + \dotsb + \lambda_k  \mathbf{G}_k \] where the \(\mathbf{G}_i\)'s have the following properties
  • \(\mathbf{G}_i\) is the projector onto \(\mathcal{N} (\mathbf{A} - \lambda_i \mathbf{I})  \) along \(\mathcal{R} ( \mathbf{A} - \lambda_i \mathbf{I} ) \). 
  • \(\mathbf{G}_i\mathbf{G}_j = 0 \) whenever \( i \neq j \)
  • \( \mathbf{G}_1 + \dotsb + \mathbf{G}_k = \mathbf{I}\)
The expansion is known as the spectral decomposition of \(\mathbf{A}\), and the \(\mathbf{G}_i\)'s are called the spectral projectors associated with \(\mathbf{A}\).

Note that being a projector \(\mathbf{G}_i\) is idempotent.
  • \(\mathbf{G}_i = \mathbf{G}_i^2\)
And since \(\mathcal{N}(\mathbf{G}_i) = \mathcal{R}(\mathbf{A} - \lambda_i \mathbf{I} ) \) and \(\mathcal{R}(\mathbf{G}_i) = \mathcal{N}(\mathbf{A} - \lambda_i \mathbf{I} ) \), we have the following equivalent complementary-subspace decompositions, each direct sum equal to \(\mathbb{R}^n\) (a numerical check follows this list):
  • \(  \mathcal{R}(\mathbf{A} - \lambda_i \mathbf{I} ) \oplus \mathcal{N}(\mathbf{A} - \lambda_i \mathbf{I} ) \)
  • \(  \mathcal{R}(\mathbf{G}_i) \oplus \mathcal{N}(\mathbf{A} - \lambda_i \mathbf{I} ) \)
  • \(  \mathcal{R}(\mathbf{A} - \lambda_i \mathbf{I} ) \oplus  \mathcal{N}(\mathbf{G}_i) \)
  • \(  \mathcal{R}(\mathbf{G}_i)  \oplus  \mathcal{N}(\mathbf{G}_i) \)
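
A numerical check of the theorem (Python). The eigenbasis and eigenvalues below are arbitrary choices; the spectral projectors are built as \(\mathbf{G}_i = \mathbf{V} \mathbf{E}_i \mathbf{V}^{-1}\), where \(\mathbf{E}_i\) selects the columns of the eigenbasis \(\mathbf{V}\) belonging to \(\lambda_i\):

import numpy as np

rng = np.random.default_rng(0)

# A diagonalizable (not symmetric) matrix built from a random eigenbasis
n = 4
V = rng.standard_normal((n, n))                 # columns = eigenvectors
eigvals = np.array([2.0, 2.0, -1.0, 3.0])       # eigenvalue 2 is repeated
A = V @ np.diag(eigvals) @ np.linalg.inv(V)

def projector(lam):
    # G = V E V^{-1}, with E selecting the columns whose eigenvalue equals lam
    E = np.diag((eigvals == lam).astype(float))
    return V @ E @ np.linalg.inv(V)

lams = np.unique(eigvals)
Gs = [projector(l) for l in lams]

assert np.allclose(sum(l * G for l, G in zip(lams, Gs)), A)      # A = sum lambda_i G_i
assert np.allclose(sum(Gs), np.eye(n))                           # G_1 + ... + G_k = I
for i, Gi in enumerate(Gs):
    assert np.allclose(Gi @ Gi, Gi)                              # idempotent
    for j, Gj in enumerate(Gs):
        if i != j:
            assert np.allclose(Gi @ Gj, np.zeros((n, n)))        # G_i G_j = 0
print("spectral decomposition verified for eigenvalues", lams)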

Friday, June 24, 2016

Conference submission deadlines 2016

ICASSP 2017 - 2016/09/12
WCNC 2017 - 2016/09/30

Monday, June 06, 2016

Majorization and Schur-convexity

Majorization 

A real vector \( b = (b_1,\dotsc,b_n) \) is said to majorize \( a = ( a_1,\dotsc, a_n) \), denoted \( a \prec b \), if

  1.  \( \sum_{i=1}^n a_i  = \sum_{i=1}^n b_i \), and
  2. \( \sum_{i=k}^n a_{(i)} \leq \sum_{i=k}^n b_{(i)} \) for \( k = 2,\dotsc,n \),
where \( a_{(1)} \leq \dotsb \leq a_{(n)} \) and \( b_{(1)} \leq \dotsb \leq b_{(n)} \) are the components of \(a\) and \(b\) arranged in increasing order. 

A function \( \phi(a) \) symmetric in the coordinates of \( a = ( a_1, \dotsc, a_n ) \) is said to be Schur-concave if \( a \prec b \) implies \( \phi(a) \ge \phi(b) \).  

A function \( \phi(a) \) is Schur-convex if \( -\phi(a) \) is Schur-concave.
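
A small sketch (Python) checking majorization (written here with components in decreasing order, which is equivalent to the tail-sum condition above) and the Schur-convexity of \(\phi(x) = \sum_i x_i^2\); the vectors are arbitrary examples:

import numpy as np

def majorizes(b, a):
    # True if b majorizes a: equal totals, and the partial sums of the
    # largest entries of b dominate those of a
    a, b = np.sort(a)[::-1], np.sort(b)[::-1]
    return np.isclose(a.sum(), b.sum()) and np.all(np.cumsum(a) <= np.cumsum(b) + 1e-12)

a = np.array([2.0, 2.0, 2.0])     # more "spread out" vectors majorize more even ones
b = np.array([3.0, 2.0, 1.0])
c = np.array([6.0, 0.0, 0.0])

print(majorizes(b, a), majorizes(c, b))     # True True

# phi(x) = sum of squares is symmetric and Schur-convex, so it respects the ordering
phi = lambda x: np.sum(x**2)
print(phi(a) <= phi(b) <= phi(c))           # True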

Tuesday, May 03, 2016

Requirements for good parenting


  1. Keep my child's emotional love tank full - speak the five love languages.
    • physical touch
    • words of affirmation
    • quality time
    • gifts
    • acts of service
  2. Use the most positive ways I can to control my child's behavior: requests, gentle physical manipulation, commands, punishment, and behavior modification.
  3. Lovingly discipline my child.  Ask, "What does this child need?" and then go about it logically.
  4. Do my best to handle my own anger appropriately and not dump it on my child.  Be kind but firm.
  5. Do my best to train my child to handle anger maturely - the goal is sixteen and a half years. 
From The Five Love Languages of Children

Monday, April 11, 2016

RAII (Resource acquisition is initialization) definition

https://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization

Thursday, January 14, 2016

UL/DL network duality for SINR metric

See Boche and Schubert's "A General Duality Theory for Uplink and Downlink Beamforming"

Tuesday, January 12, 2016

Notes on Recursive Least Squares (RLS)


Method of Least Squares
  • Assuming a multiple linear regression model, the method attempts to choose the tap weights to minimize the sum of error squares.
  • When the error process is white and zero mean, the least-squares estimate is the best linear unbiased estimate (BLUE)
  • When the error process is white Gaussian zero mean, the least-squares estimate achieves the Cramer-Rao lower bound (CRLB) for unbiased estimates, hence a minimum-variance unbiased estimate (MVUE)
Recursive Least Squares
  • Allows one to update the tap weights as the input becomes available.
  • Can incorporate additional constraints such as weighted error squares or a regularizing term (commonly applied due to the ill-posed nature of the problem).
  • The inversion of the correlation matrix is replaced by a simple scalar division.
  • The initial correlation matrix provides a means to specify regularization (see the RLS sketch after this list).
  • The fundamental difference between RLS and LMS: 
    • The step-size parameter \(\mu\) in LMS is replaced by \(\mathbf{\Phi}^{-1}(n)\), the inverse of the correlation matrix of the input \(\mathbf{u}(n)\), which has the effect of whitening the inputs.
  • The rate of convergence of RLS is invariant to the eigenvalue spread of the ensemble average input correlation matrix \(\mathbf{R}\)
  • The excess mean-square error converges to zero if a stationary environment is assumed and the exponential weighting factor is set to \(\lambda=1\).
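
A minimal exponentially weighted RLS sketch (Python) for identifying an FIR channel; the taps, forgetting factor \(\lambda\), and initialization \(\mathbf{P}(0) = \delta \mathbf{I}\) are illustrative choices, not from the source:

import numpy as np

rng = np.random.default_rng(0)

# Identify an unknown FIR channel from noisy observations (illustrative setup)
M = 4                                   # number of taps
w_true = np.array([0.8, -0.4, 0.2, 0.1])
N = 500
u = rng.standard_normal(N)              # input sequence
d = np.convolve(u, w_true)[:N] + 0.05 * rng.standard_normal(N)   # desired signal

# Exponentially weighted RLS
lam, delta = 0.99, 1e2                  # forgetting factor, initial P = delta * I
w = np.zeros(M)
P = delta * np.eye(M)                   # estimate of the inverse correlation matrix
for n in range(N):
    u_vec = np.array([u[n - i] if n - i >= 0 else 0.0 for i in range(M)])  # regressor
    k = P @ u_vec / (lam + u_vec @ P @ u_vec)        # gain vector (scalar division only)
    e = d[n] - w @ u_vec                             # a priori error
    w = w + k * e                                    # tap-weight update
    P = (P - np.outer(k, u_vec @ P)) / lam           # Riccati-type update of P

print("true taps:     ", w_true)
print("estimated taps:", np.round(w, 3))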