
Monday, August 29, 2016

Properties of Linear and Matrix Operators

Define the adjoint \(A^*\) of operator \(A\) such that
\[ \DeclareMathOperator{\rank}{rank} \langle y, Ax \rangle = \langle A^*y, x \rangle \]
We have the properties

  • \(\mathcal{N}(A) = \mathcal{N}(A^*A)\) and \(\mathcal{R}(A^*) = \mathcal{R}(A^*A)\)
  • \(\mathcal{N}(A^*) = \mathcal{N}(AA^*)\) and \(\mathcal{R}(A) = \mathcal{R}(AA^*)\)
And noting that \(\dim \mathcal{R}(A) = \dim \mathcal{R}(A^*)\), we have
  • \(\rank(A^*A) = \rank ( AA^*) = \rank(A) = \rank(A^*) \)
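A quick numerical check of these identities (a numpy sketch; the matrix shapes and seed are arbitrary, and \(A^* = A'\) for a real matrix):

import numpy as np

rng = np.random.default_rng(0)
# Build a 5x7 real matrix of rank 3 (almost surely)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))

rank = np.linalg.matrix_rank
# rank(A'A) = rank(AA') = rank(A) = rank(A')
assert rank(A.T @ A) == rank(A @ A.T) == rank(A) == rank(A.T) == 3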

For an \(m \times n\) matrix operator \(A\) of rank \(r\), the dimension of the column space equals the dimension of the row space:
  • column space: \(\dim (\mathcal{R}(A)) = r\)
  • row space: \(\dim (\mathcal{R}(A^H)) = r\)
  • Nullspace: \(\dim (\mathcal{N}(A)) = n -r\)
  • Left nullspace: \(\dim (\mathcal{N}(A^H))= m-r\)
Characterization of matrix \(AB\)
For matrices \(A\) and \(B\) such that the product \(AB\) exists:
  1. \(\mathcal{N}(B) \subset \mathcal{N}(AB)\)
  2. \(\mathcal{R}(AB) \subset \mathcal{R}(A)\)
  3. \(\mathcal{N}(A^*) \subset \mathcal{N}((AB)^*)\)
  4. \(\mathcal{R}((AB)^*) \subset \mathcal{R}(B^*)\)
From (2) and (4),
\[ \rank(AB) \leq \rank(A), \quad \rank (AB) \leq \rank(B)  \]
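A numpy sanity check of these inequalities (shapes and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 2))

rank = np.linalg.matrix_rank
# rank(AB) <= rank(A) and rank(AB) <= rank(B)
assert rank(A @ B) <= min(rank(A), rank(B))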

Thursday, August 25, 2016

Topology and Continuity concepts

Let \(S\) be a subset of a metric space \(M\)

  • \(S\) is closed if it contains all its limit points.
  • \(S\) is open if for each \(p\in S\) there exists an \(r>0\) such that the open ball \(B(p,r)\) is entirely contained in \(S\)
  • The complement of an open set is closed and vice versa.
The topology of \(M\) is the collection \(\mathcal{T}\) of all open subsets of \(M\).

\(\mathcal{T}\) has the following properties
  • It is closed under arbitrary unions of open sets
  • It is closed under finite intersections (finiteness matters: in \(\mathbb{R}\), \( \bigcap_{n \geq 1} (-1/n, 1/n) = \{0\} \) is not open)
  • \(\emptyset, M\) are open sets.
Corollary
  • arbitrary intersection of closed sets is closed
  • finite union of closed sets is closed
  • \(\emptyset, M\) are closed sets.
A metric space \(M\) is complete if each Cauchy sequence in \(M\) converges to a limit in \(M\).  
  • \(\mathbb{R}^n\) is complete
Every compact set is closed and bounded (the converse holds in \(\mathbb{R}^n\) by the Heine–Borel theorem, but not in general metric spaces).

Continuity of a function \(f: M \rightarrow N\) can be characterized in two equivalent ways:
  • The pre-image of each open set in \(N\) is open in \(M\)
  • \(f\) preserves convergent sequences, i.e.
    \[ f( \lim x_n) = \lim f(x_n)\] for every convergent sequence \(\{x_n\}\)

Wednesday, August 17, 2016

Continuous mapping theorem

Continuous mapping theorem on Wiki

Let \(g\) be continuous on a set \(C\) such that \(\Pr[X \in C] = 1\). Then
\[ X_n \xrightarrow{d} X \implies g(X_n) \xrightarrow{d} g(X) \quad \text{(i)} \]
\[ X_n \xrightarrow{p} X \implies g(X_n) \xrightarrow{p} g(X) \quad \text{(ii)} \]
\[ X_n \xrightarrow{a.s.} X \implies g(X_n) \xrightarrow{a.s.} g(X) \quad \text{(iii)} \]
where (i) is convergence in distribution, (ii) in probability and (iii) almost sure convergence.


Friday, August 12, 2016

Kalman filter

Define the system
\[ x_{k+1} = F_k x_k + G_k w_k + \Gamma_k u_k \quad (1) \\
    z_k = H_k' x_k + v_k \quad (2)\]
where \(\{u_k\}\) is known, \(x_0 \sim (\bar{x}_0, P_0)\), and \( \{w_k\}, \{v_k\} \) are random sequences with
\[ \begin{bmatrix} w_k \\ v_k \end{bmatrix} \sim
\left ( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
 \begin{bmatrix} Q_k & S_k \\ S_k' & R_k \end{bmatrix} \right )  \] with \( [w_k' \; v_k']' \) independent of \( [w_l' \; v_l']' \) for all \(l \neq k\), and of \(x_0\).
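A minimal numpy sketch of simulating (1)-(2), assuming time-invariant matrices, \(u_k = 0\), and \(S_k = 0\) (all names are illustrative):

import numpy as np

def simulate(F, G, H, Q, R, x0, n_steps, rng):
    """Simulate x_{k+1} = F x_k + G w_k and z_k = H' x_k + v_k."""
    xs, zs = [], []
    x = x0
    for _ in range(n_steps):
        v = rng.multivariate_normal(np.zeros(R.shape[0]), R)
        zs.append(H.T @ x + v)   # measurement (2)
        xs.append(x)
        w = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
        x = F @ x + G @ w        # state update (1)
    return np.array(xs), np.array(zs)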

One step predictor estimate

First we seek a recursive equation for \[ \hat{x}_{k|k-1} = E[x_k | Z^{k-1}] = E[x_k | \tilde{Z}^{k-1}] \] Define the prediction error \(\tilde{x}_k = x_k - \hat{x}_{k|k-1}\), and note that \(\{\tilde{x}_k\}\) is not an innovations sequence.  Because of the independence of the innovations we have
\[ E[x_{k+1}| \tilde{Z}^k] = E[x_{k+1} | \tilde{z}_k] +  E[x_{k+1}| \tilde{Z}^{k-1}] - \bar{x}_{k+1} \]
where \( \bar{x}_k = E[x_k]\).  Recall
\[ \DeclareMathOperator{\cov}{cov} E[x_{k+1} | \tilde{z}_k] = \bar{x}_{k+1} + \cov(x_{k+1}, \tilde{z}_k) \cov^{-1}(\tilde{z}_k, \tilde{z}_k) \tilde{z}_k  \] Define the error covariance matrix \( \Sigma_{k|k-1} = E[\tilde{x}_k \tilde{x}_k' ] \).  Then
\[ \begin{align*} \cov(x_{k+1}, \tilde{z}_k) &= \cov(F_k x_k + G_k w_k + \Gamma_k u_k, H_k' \tilde{x}_k + v_k) \\  &= E[ (F_k x_k + G_k w_k - F_k \bar{x}_k) (\tilde{x}_k' H_k + v_k') ]  \\ &= E[F_k x_k \tilde{x}_k' H_k ] + G_k S_k \\ &= F_k [ E(\hat{x}_{k|k-1} \tilde{x}_k') + E(\tilde{x}_k \tilde{x}_k')] H_k + G_k S_k \\ &= F_k \Sigma_{k|k-1} H_k + G_k S_k
\end{align*}
\] Observe that \( \hat{z}_{k|k-1} = H_k' \hat{x}_{k|k-1} \) and subtracting from (2) gives \( \tilde{z}_k = H_k' \tilde{x}_k + v_k \).  Also note that \( E[\hat{x}_{k|k-1} \tilde{x}_k'] = 0\) by the orthogonality principle.  Next
\[ \begin{align*} \cov(\tilde{z}_k,\tilde{z}_k) &= \cov ( H_k' \tilde{x}_k + v_k, H_k' \tilde{x}_k + v_k) \\ &= H_k' \Sigma_{k|k-1} H_k + R_k  = \Omega_k \end{align*} \] We also have
\[ \begin{align*} E[x_{k+1} | \tilde{Z}^{k-1}] &= E[F_k x_k + G_k w_k + \Gamma_k u_k | \tilde{Z}^{k-1}] \\ &= F_k E[x_k | \tilde{Z}^{k-1} ] + \Gamma_k u_k \\ &= F_k \hat{x}_{k|k-1} + \Gamma_k u_k \end{align*} \] Collecting all terms above, the recursion becomes
\[  \hat{x}_{k+1|k} = F_k \hat{x}_{k|k-1} + \Gamma_k u_k + K_k (z_k - H_k' \hat{x}_{k|k-1}) \quad (9) \] with \(K_k = (F_k \Sigma_{k|k-1} H_k + G_k S_k ) \Omega_k^{-1} \)

The recursion for the error covariance is developed next.  Subtracting (9) from (1), using the identity \(\tilde{x}_{k+1} = x_{k+1} - \hat{x}_{k+1|k} \), and expanding \(z_k\) using (2), we get
\[ \tilde{x}_{k+1} = (F_k - K_k H_k') \tilde{x}_k + G_k w_k - K_k v_k \] Since \(\tilde{x}_k\) and \( [w_k' v_k']' \) are independent and zero mean, we get
\[ \begin{multline*} E[\tilde{x}_{k+1} \tilde{x}_{k+1}'] = (F_k - K_k H_k') E(\tilde{x}_k \tilde{x}_k') ( F_k - K_k H_k')' \\ + \begin{bmatrix} G_k &  -K_k \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k' & R_k \end{bmatrix} \begin{bmatrix} G_k' \\ -K_k' \end{bmatrix} \end{multline*} \] or
\[\begin{multline*} \Sigma_{k+1|k} = (F_k - K_k H_k') \Sigma_{k|k-1} (F_k - K_k H_k')' + G_k Q_k G_k' + K_k R_k K_k' \\ - G_k S_k K_k' - K_k S_k' G_k'  \end{multline*} \]
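The recursion (9) together with this covariance recursion translates directly into code. A numpy sketch of one predictor step (time-invariant matrices assumed; all names are illustrative):

import numpy as np

def predictor_step(x_pred, Sigma, z, u, F, G, H, Gamma, Q, R, S):
    """Maps (xhat_{k|k-1}, Sigma_{k|k-1}) and z_k, u_k
    to (xhat_{k+1|k}, Sigma_{k+1|k})."""
    Omega = H.T @ Sigma @ H + R                          # innovations covariance
    K = (F @ Sigma @ H + G @ S) @ np.linalg.inv(Omega)   # gain K_k
    x_next = F @ x_pred + Gamma @ u + K @ (z - H.T @ x_pred)  # (9)
    A = F - K @ H.T
    Sigma_next = (A @ Sigma @ A.T + G @ Q @ G.T + K @ R @ K.T
                  - G @ S @ K.T - K @ S.T @ G.T)
    return x_next, Sigma_next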
Filtered estimates

The filtered estimate is defined in terms of \( \hat{x}_{k+1|k}\) and \( z_{k+1}\):
\[ \begin{align*} \hat{x}_{k+1|k+1} &= E[x_{k+1} | \tilde{Z}^{k+1}] \\ &= E[x_{k+1}|\tilde{z}_{k+1}] + E[x_{k+1}| \tilde{Z}^{k}] - \bar{x}_{k+1} \\ &= \bar{x}_{k+1} + \cov(x_{k+1}, \tilde{z}_{k+1}) \cov^{-1} (\tilde{z}_{k+1}, \tilde{z}_{k+1}) \tilde{z}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1} \end{align*} \]
Now \[ \begin{align*} \cov(x_{k+1}, \tilde{z}_{k+1}) &= E[ (\tilde{x}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1}) (\tilde{x}_{k+1}' H_{k+1} + v_{k+1}') ] \\ &= E[ \tilde{x}_{k+1} \tilde{x}_{k+1}'] H_{k+1} \\ &= \Sigma_{k+1|k} H_{k+1} \end{align*} \]
From earlier results, we have \[ \cov(\tilde{z}_{k+1}, \tilde{z}_{k+1}) = H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1} = \Omega_{k+1}\]  The measurement-update (filtered estimate) is
\[ \hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} (z_{k+1} - H_{k+1}' \hat{x}_{k+1|k}) \quad (6) \]
Define the uncorrelated input noise \( \tilde{w}_k = w_k - \hat{w}_k = w_k - S_k R_k^{-1} v_k\) such that
\[ \begin{bmatrix} \tilde{w}_k \\ v_k \end{bmatrix} \sim
\left ( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
 \begin{bmatrix} Q_k - S_k R_k^{-1}S_k' & 0 \\  0 & R_k \end{bmatrix} \right )  \]
then we have
\[ \begin{align*} x_{k+1} &= F_k x_k + G_k \tilde{w}_k + G_k S_k R_k^{-1} v_k + \Gamma_k u_k \\ &= (F_k - G_k S_k R_k^{-1} H_k') x_k + G_k \tilde{w}_k + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \end{align*} \] using the fact that \(v_k = z_k - H_k' x_k\).  Noting that \( E[\tilde{w}_k v_k'] = 0 \), the time update equation becomes
\[ \hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') \hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \quad (5) \]
Error covariance for filtered estimates
The error covariance is
\[ \Sigma_{k|k} = E[ (x_k - \hat{x}_{k|k}) (x_k - \hat{x}_{k|k})'] \]
From (6) we have
\[ (x_{k+1} - \hat{x}_{k+1|k+1}) + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} \tilde{z}_{k+1} = x_{k+1} - \hat{x}_{k+1|k}  \]
By the orthogonality principle, \(x_{k+1} - \hat{x}_{k+1|k+1} \) is orthogonal to \(\tilde{z}_{k+1}\).  Therefore,
\[ \Sigma_{k+1|k+1} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} = \Sigma_{k+1|k} \] or
\[ \Sigma_{k+1|k+1} = \Sigma_{k+1|k} -  \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} \]
Lastly, we obtain the time-update error covariance.  Subtracting (5) from (1) gives
\[ x_{k+1} - \hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') (x_k - \hat{x}_{k|k}) + G_k \tilde{w}_k \] and using the orthogonality of \(\tilde{w}_k\) and \(x_k - \hat{x}_{k|k}\), we obtain
\[\begin{multline*} \Sigma_{k+1|k} = (F_k - G_k S_k R_k ^{-1} H_k') \Sigma_{k|k} (F_k - G_k S_k R_k^{-1} H_k')' \\ + G_k(Q_k - S_k R_k^{-1} S_k') G_k' \end{multline*} \]
Summary

Measurement update
\[\begin{align*} \hat{x}_{k+1|k+1} &= \hat{x}_{k+1|k} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} ( z_{k+1} - H_{k+1}' \hat{x}_{k+1|k}) \\  \Sigma_{k+1|k+1} &=  \Sigma_{k+1|k} - \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} \\ \Omega_{k+1} &= H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1} \end{align*} \]
Time update
\[ \begin{align*} \hat{x}_{k+1|k} &= ( F_k - G_k S_k R_k^{-1} H_k') \hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \\ \Sigma_{k+1|k} &= (F_k - G_k S_k R_k^{-1} H_k') \Sigma_{k|k} (F_k - G_k S_k R_k^{-1} H_k')' + G_k (Q_k - S_k R_k^{-1} S_k') G_k' \end{align*} \]
Time update with \(S_k = 0\)
\[ \begin{align*} \hat{x}_{k+1|k} &= F_k \hat{x}_{k|k} + \Gamma_k u_k \\ \Sigma_{k+1|k} &= F_k \Sigma_{k|k} F_k' + G_k Q_k G_k' \end{align*} \]
Combined update with \(S_k = 0\) for filtered state:
\[ \begin{align*} \hat{x}_{k+1|k+1} &= F_k \hat{x}_{k|k} + L_{k+1} ( z_{k+1} - H_{k+1}' F_k \hat{x}_{k|k} - H_{k+1}' \Gamma_k u_k)  \\  L_{k+1} &= \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} \\  \Omega_{k+1} &= H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1}\end{align*} \]
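These updates translate directly into code. A numpy sketch of one full recursion (time update followed by measurement update) for the \(S_k = 0\) case above, with time-invariant matrices and illustrative names:

import numpy as np

def kalman_step(x_filt, P_filt, z_next, u, F, G, H, Gamma, Q, R):
    """One S_k = 0 recursion: maps (xhat_{k|k}, Sigma_{k|k}) and z_{k+1}
    to (xhat_{k+1|k+1}, Sigma_{k+1|k+1})."""
    # Time update
    x_pred = F @ x_filt + Gamma @ u              # xhat_{k+1|k}
    P_pred = F @ P_filt @ F.T + G @ Q @ G.T      # Sigma_{k+1|k}

    # Measurement update
    Omega = H.T @ P_pred @ H + R                 # innovations covariance
    L = P_pred @ H @ np.linalg.inv(Omega)        # gain L_{k+1}
    x_new = x_pred + L @ (z_next - H.T @ x_pred)
    P_new = P_pred - L @ Omega @ L.T             # = P_pred - P_pred H Omega^{-1} H' P_pred
    return x_new, P_new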

Wednesday, August 10, 2016

Innovations sequence

Definition 
Suppose \( \{z_k\} \) is a sequence of jointly Gaussian random elements.   The innovations process \(\{\tilde{z}_k\} \) is such that \(\tilde{z}_k\) consists of that part of \(z_k\) containing new information not carried in \(z_{k-1}, z_{k-2}, \dotsc\).
\[ \tilde{z}_k = z_k - E[z_k | z_0, \dotsc, z_{k-1} ]  = z_k - E[z_k | Z^{k-1}] \] with \( \tilde{z}_0 = z_0 - E[z_0] \).

Properties

  1. \(\tilde{z}_k\) independent of \( z_0, \dotsc, z_{k-1}\) by definition
  2. (1) implies \(E[ \tilde{z}_k \tilde{z}_l'] = 0,\ l \neq k \)
  3. \(E[z_k | Z^{k-1}]\) is a linear combination of \(z_0, \dotsc, z_{k-1}\)
  4. The sequence \(\{\tilde{z}_k\} \) can be obtained from \(\{z_k\} \) by a causal linear operation (see the sketch after this list).
  5. The sequence \(\{z_k\} \) can be reconstructed from \(\{\tilde{z}_k\} \) by a causal linear operation. 
  6. (4) and (5) imply \( E[z_k | Z^{k-1}] = E[z_k | \tilde{Z}^{k-1}] \), or more generally \( E[w | Z^{k-1}] = E[w | \tilde{Z}^{k-1}] \) for jointly Gaussian \(w, \{z_k\} \)
  7. For zero-mean jointly Gaussian \(x_k\) and \(\tilde{z}_k\), we have \[ E[x_k|Z^{k-1}] = E[x_k|\tilde{Z}^{k-1}] = E[x_k| \tilde{z}_0] + \dotsb + E[x_k| \tilde{z}_{k-1}]    \]
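A small numpy sketch of properties (4) and (5): for a zero-mean jointly Gaussian vector with covariance \( \Sigma = LL' \) (Cholesky factorization, \(L\) lower triangular), forward substitution with \(L\) is the causal whitening operation and multiplying by \(L\) causally reconstructs \(z\). The covariance and seed here are arbitrary:

import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
Sigma = M @ M.T                    # covariance of (z_0, ..., z_{n-1})

L = np.linalg.cholesky(Sigma)      # Sigma = L L', L lower triangular
z = L @ rng.standard_normal(n)     # one zero-mean sample path

e = np.linalg.solve(L, z)          # causal: e_k depends only on z_0..z_k
z_tilde = np.diag(L) * e           # innovations z_k - E[z_k | z_0..z_{k-1}]
z_back = L @ e                     # causal reconstruction; equals z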



Friday, August 05, 2016

Properties of the exponential family distributions

Given exponential family \( \mathcal{P}=\{p_\theta(x) | \theta \in \Theta \} \), where
\[ p_\theta(x) = h(x) \exp ( q^T(\theta) T(x) - b(\theta)  )  I_{\text{supp}}(x), \quad Z(\theta) = \exp(b(\theta)) \]
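For example, the scalar Gaussian \( N(\mu, \sigma^2) \) with \( \theta = (\mu, \sigma^2) \) fits this form with
\[ h(x) = \frac{1}{\sqrt{2\pi}}, \quad q(\theta) = \begin{bmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{bmatrix}, \quad T(x) = \begin{bmatrix} x \\ x^2 \end{bmatrix}, \quad b(\theta) = \frac{\mu^2}{2\sigma^2} + \log \sigma \]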
Regular family (gives you completeness)
Conditions for regularity:

  1. the support of \(p_\theta(x)\) is independent of \(\theta\)
  2. finite partition function \(Z(\theta) < \infty,\; \forall \theta\)
  3. The parameter space has nonempty interior, \( \mathring{\Theta} \neq \emptyset \)
  4. The natural parameter space has nonempty interior, \( \mathring{\mathcal{Q}} \neq \emptyset \)
  5. The statistic functions and the constant function are linearly independent, i.e. \( \{1, T_1(x),\dotsc,T_K(x)\} \) is linearly independent (gives you a minimal statistic)
  6. \( p_\theta(x) \) is twice differentiable

Curved family (only guarantees the statistic is minimal)
An exponential family where the dimension of the vector parameter \(\mathbf{\theta}=(\theta_1,\dotsc,\theta_r)\) is less than the dimension of the natural statistic \(\mathbf{T}(\mathbf{x}) \) is called a curved family.

Identifiability of the parameter vector \( \mathbf{\theta} \)
When the statistic is minimal, identifiability is a matter of ensuring that \(q: \Theta \to \mathcal{Q} \) defines a one-to-one mapping from the desired parameter space to the natural parameter space.
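For instance, in the Gaussian family above, \( q(\mu, \sigma^2) = (\mu/\sigma^2,\, -1/(2\sigma^2)) \) is one-to-one on \( \Theta = \mathbb{R} \times (0, \infty) \): the second coordinate determines \(\sigma^2\), and then the first determines \(\mu\), so the parameter is identifiable.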