MathJax
A collection of random thoughts and materials that might prove enlightening to me and my friends.
Monday, December 12, 2016
Definition of a distribution function
A function \(F: \mathbb{R} \mapsto [0,1]\) satisfying the following properties is a distribution function:
- \(F\) is right continuous;
- \(F\) is monotone non-decreasing;
- \(F\) has limits at \(\pm\infty\):
\begin{align*} F(\infty) &:= \lim_{x\uparrow \infty} F(x) = 1 \\ F(-\infty) &:= \lim_{x\downarrow -\infty} F(x) = 0 \end{align*}
Wednesday, October 19, 2016
VMWare disk activity reduction methods
See this forum link
and also vmware knowledge base
Also try disabling swap in Linux guest (sudo swapoff -a)
Friday, September 16, 2016
Enabling copy/paste in vmware 12 player
1) sudo apt-get autoremove open-vm-tools
2) Install VMware Tools by following the usual method (Virtual Machine --> Reinstall VMWare Tools)
3) Reboot the VM
4) sudo apt-get install open-vm-tools-desktop
5) Reboot the VM, after the reboot copy/paste and drag/drop will work!
Monday, September 12, 2016
Properties of MMSE and MAP estimator (Bayesian)
The MMSE estimator is the mean of the posterior pdf \(E(x|y)\) of \(x\) given observation \(y\).
- The estimator is unbiased.
- The covariance is reduced compared to the a priori information.
- Commutes over affine transformation.
- Additivity property for independent data sets.
- Linear in the Gaussian case.
- The estimator error is orthogonal to the space spanned by all Y-measurable functions (affine functions being a subset)
The MAP estimator is \( \textsf{arg max}_\theta \; p(\theta|x) \), given observation \(x\).
- Jointly Gaussian case, MAP = MMSE (posterior is Gaussian, hence pdf unimodal and symmetric, mean = mode = median)
- Does not commute over nonlinear transformations (the invariance property does not hold, unlike ML).
- Commutes over linear transformation.
MAP tends to ML when
- Prior is uninformative
- Large amount of information in data compared to prior
Gaussian linear model
Let the observed samples follow the model
\[ x = H\theta + w\] with prior \(\theta \sim \mathcal{N}(\mu_\theta, C_\theta)\) and noise vector \(w \sim \mathcal{N}(0, C_w)\) independent of \(\theta\); then the posterior is Gaussian with mean
\[ E(\theta|x) = \mu_\theta + C_\theta H^T (H C_\theta H^T + C_w)^{-1} (x - H \mu_\theta) \] and covariance \[ C_{\theta|x} = C_\theta - C_\theta H^T (H C_\theta H^T + C_w)^{-1} H C_\theta \] Contrary to the classical Gaussian linear model, \(H\) does not need to be full rank.
In alternative form,
\[ E(\theta|x) = \mu_\theta + ( C_\theta^{-1} + H^T C_w^{-1} H )^{-1} H^T C_w^{-1} (x - H \mu_\theta)\] and \[ C_{\theta|x} = ( C_\theta^{-1} + H^T C_w^{-1} H )^{-1} \]
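As a quick numerical sanity check of the two forms, here is a minimal NumPy sketch (the dimensions and the particular \(H\), \(C_\theta\), \(C_w\) below are made up for illustration, and the two expressions are compared directly):

import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3                                  # hypothetical dimensions
H = rng.standard_normal((n, p))              # need not be full rank
mu_theta = np.zeros(p)
C_theta = np.eye(p)
C_w = 0.1 * np.eye(n)

theta = rng.multivariate_normal(mu_theta, C_theta)
x = H @ theta + rng.multivariate_normal(np.zeros(n), C_w)

# Innovation form: E(theta|x) = mu + C H' (H C H' + C_w)^{-1} (x - H mu)
Omega = H @ C_theta @ H.T + C_w
mean1 = mu_theta + C_theta @ H.T @ np.linalg.solve(Omega, x - H @ mu_theta)
cov1 = C_theta - C_theta @ H.T @ np.linalg.solve(Omega, H @ C_theta)

# Information form: C_{theta|x} = (C_theta^{-1} + H' C_w^{-1} H)^{-1}
cov2 = np.linalg.inv(np.linalg.inv(C_theta) + H.T @ np.linalg.solve(C_w, H))
mean2 = mu_theta + cov2 @ H.T @ np.linalg.solve(C_w, x - H @ mu_theta)

print(np.allclose(mean1, mean2), np.allclose(cov1, cov2))   # True True

Both forms agree by the matrix inversion lemma; the information form is convenient when \(C_\theta^{-1}\) and \(C_w^{-1}\) are cheap to form.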
LMMSE estimator \( E^*[X|Y] \)
- A function of first and second order statistics only. \[E^*[X|Y] = \mu_x + \Sigma_{xy} \Sigma_{yy}^{-1} ( y - \mu_y) \] (inverse can be replaced with pseudo-inverse if necessary)
- Jointly Gaussian case, \(E^*[X|Y] = E[X|Y]\)
- Error orthogonal to subspace spanned by \(Y\)
- Additivity property \[E^*[X|Y_1,\dotsc,Y_k] = \sum_{j=1}^k E^*[X|Y_j] - (k-1)\mu_x \]
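A minimal sketch of the moment-based formula and the orthogonality property above (assuming NumPy; the joint mean and covariance of \((X,Y)\) below are arbitrary illustrative values, treated as known rather than estimated):

import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 0.0, 2.0])               # [mu_x, mu_y], X scalar, Y in R^2
C = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.5]])              # joint covariance [[C_xx, C_xy], [C_yx, C_yy]]

samples = rng.multivariate_normal(mu, C, size=200000)
x, y = samples[:, 0], samples[:, 1:]

Sxy = C[0, 1:]                               # Sigma_xy
Syy = C[1:, 1:]                              # Sigma_yy
xhat = mu[0] + (y - mu[1:]) @ np.linalg.pinv(Syy) @ Sxy   # E*[X|Y]

err = x - xhat
print(np.mean(err))                          # ~0: the estimator is unbiased
print(err @ (y - mu[1:]) / len(err))         # ~[0, 0]: error orthogonal to the (centered) observations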
Properties of the exponential family of distributions
From Dasgupta (see link)
One parameter Exponential family
Consider the family of distributions \(\{ P_\theta, \theta \in \Theta \subset \mathbb{R} \} \) whose pdf has the form
\[ f(x|\theta) = h(x) e^{\eta(\theta) T(x) - \psi^*(\theta)} \]
If \(\eta(\theta)\) is a 1-1 function of \(\theta\), we can drop \(\theta\) from the discussion. Thus the family of distributions \(\{ P_\eta, \eta\in \Xi \subset \mathbb{R} \} \) is in canonical form
\[ f(x|\eta) = h(x) e^{\eta T(x) - \psi(\eta)} \] Define the set
\[ \mathcal{T} = \{ \eta : e^{\psi(\eta)} < \infty \}\]
\(\eta\) is the natural parameter, and \(\mathcal{T}\) the natural parameter space.
The family is called the canonical one parameter Exponential family.
[Brown] The family is called full if \(\Xi = \mathcal{T}\), regular if \(\mathcal{T}\) is open.
[Brown] Let K be the convex support of the measure \(\nu\)
The family is minimal if \(\dim \Xi = \dim K = k\)
It is nonsingular if \( Var_\eta (T(X)) > 0 \) for all \(\eta \in \overset{\circ}{\mathcal{T}}\), the interior of \(\mathcal{T}\).
Theorem 1. \(\psi(\eta)\) is a convex function on \(\mathcal{T}\).
Theorem 2. \(\psi(\eta)\) is a cumulant generating function for any \( \eta \in \overset{\circ}{\mathcal{T}}\).
Note: the 1st cumulant is the expectation, the 2nd and 3rd are central moments (the 2nd being the variance), and 4th and higher order cumulants are neither moments nor central moments.
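As a standard one-parameter illustration (Poisson, not taken from the post): writing \(f(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}\) with \(\eta = \log\lambda\), \(T(x) = x\), \(h(x) = 1/x!\) gives \(\psi(\eta) = e^\eta\); then \(\psi'(\eta) = e^\eta = \lambda = E_\eta[T(X)]\) and \(\psi''(\eta) = e^\eta = \lambda = \mathrm{Var}_\eta(T(X))\), matching the first two cumulants as Theorem 2 predicts.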
There are more properties...
Multi-parameter Exponential family
A family of distributions \(\{ P_\theta, \theta \in \Theta \subset \mathbb{R}^k \} \) whose pdf has the form
\[ f(x|\theta) = h(x) e^{\sum_{i=1}^k \eta_i(\theta) T_i(x) - \psi^*(\theta)} \] is the k-parameter Exponential family.
When we reparametrize using \(\eta_i = \eta_i(\theta)\), we obtain the k-parameter canonical family.
The assumption here is that the dimension of \(\Theta\) and dimension of the image of \(\Theta\) under the map \( (\theta) \rightarrow (\eta_1(\theta),\dotsc,\eta_k(\theta) )\) are equal to \(k\).
The canonical form is
\[ f(x|\eta) = h(x) e^{\sum_{i=1}^k \eta_i T_i(x) - \psi(\eta)} \]
Theorem 7. Given a sample with distribution \(P_\eta, \eta\in\mathcal{T}\), in the canonical k-parameter Exponential family with \( \mathcal{T} = \{ \eta \in \mathbb{R}^k : e^{\psi(\eta)} < \infty \} \),
the partial derivatives of \(\psi(\eta)\) of every order exist at any \(\eta \in \overset{\circ}{\mathcal{T}}\).
Definition. The family is full rank if at every \(\eta \in \overset{\circ}{\mathcal{T}}\) the covariance matrix \[ I(\eta) = \frac{\partial^2}{\partial \eta_i \partial \eta_j} \psi(\eta) \ge 0 \] is nonsingular.
Definition/Theorem. If the family is nonsingular, then the matrix \(I(\eta)\) is called the Fisher information matrix at \(\eta\) (for the natural parameter).
Proof. For canonical exponential family, we have \(L(x,\eta) = \log p_\eta(x) \doteq \langle \eta, T(x) \rangle - \psi(\eta) \), \(L'(x;\eta) = T(x) - \frac{\partial }{\partial \eta} \psi(\eta) \) and \( L''(x;\eta) = - \frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta)\) does not depend on \(x\), so
\[ I(\eta) = \frac{\partial^2}{\partial \eta \partial \eta^T} \psi(\eta)\]
Sufficiency and Completeness
Theorem 8. Suppose a family of distributions \(\mathcal{F} = \{ P_\theta, \theta \in \Theta\} \) belongs to a k-parameter Exponential family and that the "true" parameter space \(\Theta\) has a nonempty interior, then the family \(\mathcal{F}\) is complete.
Theorem 9. (Basu's Theorem for the Exponential Family) In any k-parameter Exponential family \(\mathcal{F}\), with a parameter space \(\Theta\) that has a nonempty interior, the natural sufficient statistic of the family \(T(X)\) and any ancillary statistic \(S(X)\) are independently distributed under each \(\theta \in \Theta\).
MLE of exponential family
Recall, \(L(x,\theta) = \log p_\theta(x) \doteq \langle \theta, T(x) \rangle - \psi(\theta) \). The solution of the MLE satisfies
\[ S(\theta) = \left. \frac{\partial}{\partial \theta} L(x;\theta) \right\vert_{\theta = \theta_{ML}}= 0 \; \Longleftrightarrow \; T(x) = E_{\theta_{ML}} [ T(X) ] \] where \( \frac{\partial}{\partial \theta} \psi(\theta) = E_\theta [ T(X) ] \)
The second derivative gives us
\[ \frac{\partial^2}{ \partial \theta \partial \theta^T} L(x;\theta) = - I(\theta) = - Cov_\theta [ T(X) ] \] The right hand side is negative definite for full rank family. Therefore the log likelihood function is strictly concave in \(\theta\).
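As a concrete instance (again the Poisson family, a standard example rather than one from the post): for an i.i.d. sample \(x_1,\dotsc,x_n\), \(T(x) = \sum_i x_i\) and \(E_\lambda[T(X)] = n\lambda\), so the condition \(T(x) = E_{\theta_{ML}}[T(X)]\) yields \(\hat{\lambda}_{ML} = \frac{1}{n}\sum_i x_i\), and the strict concavity above guarantees this stationary point is the unique maximizer.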
Existence of conjugate prior
For likelihood functions within the exponential family, a conjugate prior can be found within the exponential family. The marginalization to \(p(x) = \int p(x|\theta) p(\theta) d\theta \) is also tractable.
From Casella-Berger.
Note that the parameter space is the "natural" parameter space.
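A standard example (assuming the usual parameterizations): for a Poisson likelihood \(p(x|\lambda) \propto \lambda^{\sum_i x_i} e^{-n\lambda}\), a Gamma prior \(p(\lambda) \propto \lambda^{\alpha-1} e^{-\beta\lambda}\) is conjugate, giving the posterior \(\lambda|x \sim \mathrm{Gamma}(\alpha + \sum_i x_i,\; \beta + n)\) and a closed-form marginal \(p(x)\) (negative binomial for a single observation).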
Tuesday, September 06, 2016
Local convergence for exponential mixture family
From Redner, Walker 1984
Theorem 5.2. Suppose that the Fisher information matrix \(I(\Phi)\) is positive definite at the true parameter \(\Phi^*\) and that \(\Phi^* = (\alpha_1^*, \dotsc, \alpha_m^*, \phi_1^*, \dotsc, \phi_m^*)\) is such that \(\alpha_i^* > 0 \text{ for } i = 1,\dotsc,m\). For \(\Phi^{(0)} \in \Omega\), denote by \(\{\Phi^{(j)}\}_{j=0,1,2,\dotsc}\) the sequence in \(\Omega\) generated by the EM iteration. Then with probability 1, whenever N is sufficiently large, the unique strongly consistent solution \(\Phi^N = (\alpha_1^N, \dotsc, \alpha_m^N, \phi_1^N, \dotsc, \phi_m^N)\) of the likelihood equations is well defined and there is a certain norm on \(\Omega\) in which \(\{\Phi^{(j)}\}_{j=0,1,2,\dotsc}\) converges linearly to \(\Phi^N\) whenever \(\Phi^{(0)}\) is sufficiently near \(\Phi^N\), i.e. there is a constant \( 0 \leq \lambda < 1\), for which
\[ \lVert \Phi^{(j+1)} - \Phi^N \rVert \leq \lambda \lVert \Phi^{(j)} - \Phi^N \rVert, \quad j = 0,1,2,\dotsc \] whenever \(\Phi^{(0)}\) is sufficiently near \(\Phi^{N}\).
Differentiability of jump functions
Let
\[ j_n(x) = \begin{cases} 0 & \text{if } x < x_n, \\ \theta_n & \text{if } x = x_n, \\ 1 & \text{if } x > x_n, \end{cases} \] for some \(0\leq \theta_n \leq 1\). The jump function is then defined as
\[ J(x) = \sum_{n=1}^\infty \alpha_n j_n(x)\] with \(\sum_{n=1}^\infty \alpha_n < \infty\).
Theorem. If \(J\) is the jump function, then \(J'(x)\) exists and vanishes almost everywhere (it is nonzero only on a set of measure zero, \( E = \{x \in \mathcal{B} : J'(x)\neq 0 \},\; m(E) = 0\)).
Typically, a probability distribution \(F\) is defined as a nondecreasing, right continuous function with \(F(-\infty) = 0,\; F(\infty)=1\).
Monday, August 29, 2016
Properties of Linear and Matrix Operators
Define the adjoint \(A^*\) of operator \(A\) such that
\[ \DeclareMathOperator{\rank}{rank} \langle y, Ax \rangle = \langle A^*y, x \rangle \]
We have the properties
- \(\mathcal{N}(A) = \mathcal{N}(A^*A)\) and \(\mathcal{R}(A^*) = \mathcal{R}(A^*A)\)
- \(\mathcal{N}(A^*) = \mathcal{N}(AA^*)\) and \(\mathcal{R}(A) = \mathcal{R}(AA^*)\)
And noting that \(\dim \mathcal{R}(A) = \dim \mathcal{R}(A^*)\), we have
- \(\rank(A^*A) = \rank ( AA^*) = \rank(A) = \rank(A^*) \)
For matrix operators, dimension of the column space is equal to the dimension of the row space
- column space: \(\dim (\mathcal{R}(A)) = r\)
- row space: \(\dim (\mathcal{R}(A^H)) = r\)
- Nullspace: \(\dim (\mathcal{N}(A)) = n -r\)
- Left nullspace: \(\dim (\mathcal{N}(A^H))= m-r\)
Characterization of matrix \(AB\)
For matrices A and B such that AB exists
- \(\mathcal{N}(B) \subset \mathcal{N}(AB)\)
- \(\mathcal{R}(AB) \subset \mathcal{R}(A)\)
- \(\mathcal{N}(A^*) \subset \mathcal{N}((AB)^*)\)
- \(\mathcal{R}((AB)^*) \subset \mathcal{R}(B^*)\)
From 2 and 4
\[ \rank(AB) \leq \rank(A), \quad \rank (AB) \leq \rank(B) \]
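A quick numerical illustration of these rank relations (a sketch assuming NumPy; the factor sizes are arbitrary, chosen so the ranks are easy to predict):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 6))   # rank 2 (almost surely)
B = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))   # rank 3 (almost surely)

rA, rB, rAB = (np.linalg.matrix_rank(M) for M in (A, B, A @ B))
print(rA, rB, rAB)                                              # e.g. 2 3 2
assert rAB <= min(rA, rB)                                       # rank(AB) <= rank(A), rank(B)
assert np.linalg.matrix_rank(A.T @ A) == rA == np.linalg.matrix_rank(A @ A.T)   # rank(A*A) = rank(A) = rank(AA*)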
Thursday, August 25, 2016
Topology and Continuity concepts
Let \(S\) be a subset of a metric space \(M\)
- \(S\) is closed if it contains all its limits.
- \(S\) is open if for each \(p\in S\) there exists an \(r>0\) such that the open ball \(B(p,r)\) is entirely contained in \(S\)
- The complement of an open set is closed and vice versa.
The topology of \(M\) is the collection \(\mathcal{T}\) of all open subsets of \(M\).
\(\mathcal{T}\) has the following properties
- It is closed under arbitrary union of open sets
- It is closed under finite intersections
- \(\emptyset, M\) are open sets.
Corollary
- arbitrary intersection of closed sets is closed
- finite union of closed sets is closed
- \(\emptyset, M\) are closed sets.
A metric space \(M\) is complete if each Cauchy sequence in \(M\) converges to a limit in \(M\).
- \(\mathbb{R}^n\) is complete
Every compact set is closed and bounded
Continuity of function \(f: M \rightarrow N\)
- The pre-image of each open set in \(N\) is open in \(M\)
- Preserves convergent sequences under the transformation, i.e.
\[ f( \lim x_n) = \lim f(x_n)\] for every convergent sequence \(\{x_n\}\)
Wednesday, August 24, 2016
Wednesday, August 17, 2016
Continuous mapping theorem
Continuous mapping theorem on Wiki
The theorem states that if \(g\) is continuous (at least on a set that \(X\) belongs to with probability one) and \(X_n \rightarrow X\) in sense (i), (ii) or (iii), then \(g(X_n) \rightarrow g(X)\) in the same sense, where (i) is convergence in distribution, (ii) in probability and (iii) almost sure convergence.
Friday, August 12, 2016
Kalman filter
Define the system
\[ x_{k+1} = F_k x_k + G_k w_k + \Gamma_k u_k \quad (1) \\
z_k = H_k' x_k + v_k \quad (2)\] \(\{u_k\}\) is known, \(x_0 \sim (\bar{x}_0, P_0) \) and \( \{w_k\}, \{v_k\} \) are random sequences with
\[ \begin{bmatrix} w_k \\ v_k \end{bmatrix} \sim
\left ( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} Q_k & S_k \\ S_k' & R_k \end{bmatrix} \right ) \] with \( [w_k' \; v_k']' \) independent of other vectors indexed by \(l \neq k\) and \(x_0\)
One step predictor estimate
First we seek a recursive equation for \[ \hat{x}_{k|k-1} = E[x_k | Z^{k-1}] = E[x_k | \tilde{Z}^{k-1}] \] Define \(\tilde{x}_k = x_k - \hat{x}_{k|k-1}\), note that \(\{\tilde{x}_k\}\) is not an innovations sequence. Because of the independence of the innovations we have
\[ E[x_{k+1}| \tilde{Z}^k] = E[x_{k+1} | \tilde{z}_k] + E[x_{k+1}| \tilde{Z}^{k-1}] - \bar{x}_{k+1} \]
where \( \bar{x}_k = E[x_k]\). Recall
\[ \DeclareMathOperator{\cov}{cov} E[x_{k+1} | \tilde{z}_k] = \bar{x}_{k+1} + \cov(x_{k+1}, \tilde{z}_k) \cov^{-1}(\tilde{z}_k, \tilde{z}_k) \tilde{z}_k \] Define the error covariance matrix \( \Sigma_{k|k-1} = E[\tilde{x}_k \tilde{x}_k' ] \) Then
\[ \begin{align*} \cov(x_{k+1}, \tilde{z}_k) &= \cov(F_k x_k + G_k w_k + \Gamma_k u_k, H_k' \tilde{x}_k + v_k) \\ &= E[ (F_k x_k + G_k w_k - F_k \bar{x}_k) (\tilde{x}_k' H_k + v_k') ] \\ &= E[F_k x_k \tilde{x}_k' H_k ] + G_k S_k \\ &= F_k [ E(\hat{x}_{k|k-1} \tilde{x}_k') + E(\tilde{x}_k \tilde{x}_k')] H_k + G_k S_k \\ &= F_k \Sigma_{k|k-1} H_k + G_k S_k
\end{align*}
\] Observe that \( \hat{z}_{k|k-1} = H_k' \hat{x}_{k|k-1} \) and subtracting from (2) gives \( \tilde{z}_k = H_k' \tilde{x}_k + v_k \). Also note that \( E[\hat{x}_{k|k-1} \tilde{x}_k'] = 0\). Next
\[ \begin{align*} \cov(\tilde{z}_k,\tilde{z}_k) &= \cov ( H_k' \tilde{x}_k + v_k, H_k' \tilde{x}_k + v_k) \\ &= H_k' \Sigma_{k|k-1} H_k + R_k = \Omega_k \end{align*} \] We also have
\[ \begin{align*} E[x_{k+1} | \tilde{Z}^{k-1}] &= E[F_k x_k + G_k w_k + \Gamma_k u_k | \tilde{Z}^{k-1}] \\ &= F_k E[x_k | \tilde{Z}^{k-1} ] + \Gamma_k u_k \\ &= F_k \hat{x}_{k|k-1} + \Gamma_k u_k \end{align*} \] Collecting all terms above, the recursion becomes
\[ \hat{x}_{k+1|k} = F_k \hat{x}_{k|k-1} + \Gamma_k u_k + K_k (z_k - H_k' \hat{x}_{k|k-1}) \quad (9) \] with \(K_k = (F_k \Sigma_{k|k-1} H_k + G_k S_k ) \Omega_k^{-1} \)
The recursion of the error covariance is developed next. From (1),(9), using the identity \(\tilde{x}_{k+1} = x_{k+1} - \hat{x}_{k+1|k} \) and expanding \(z_k\) using (2).
\[ \tilde{x}_{k+1} = (F_k - K_k H_k') \tilde{x}_k + G_k w_k - K_k v_k \] Since \(\tilde{x}_k\) and \( [w_k' v_k']' \) are independent and zero mean, we get
\[ \begin{multline*} E[\tilde{x}_{k+1} \tilde{x}_{k+1}'] = (F_k - K_k H_k') E(\tilde{x}_k \tilde{x}_k') ( F_k - K_k H_k')' \\ + \begin{bmatrix} G_k & -K_k \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k' & R_k \end{bmatrix} \begin{bmatrix} G_k' \\ -K_k' \end{bmatrix} \end{multline*} \] or
\[\begin{multline*} \Sigma_{k+1|k} = (F_k - K_k H_k') \Sigma_{k|k-1} (F_k - K_k H_k')' + G_k Q_k G_k' + K_k R_k K_k' \\ - G_k S_k K_k' - K_k S_k' G_k' \end{multline*} \]
Filtered estimates
Defined in terms of \( \hat{x}_{k+1|k}\) and \( z_{k+1}\)
\[ \begin{align*} \hat{x}_{k+1|k+1} &= E[x_{k+1} | \tilde{Z}_{k+1}] \\ &= E[x_{k+1}|\tilde{z}_{k+1}] + E[x_{k+1}| \tilde{Z}_{k}] - \bar{x}_{k+1} \\ &= \bar{x}_{k+1} + \cov(x_{k+1}, \tilde{z}_{k+1}) \cov^{-1} (\tilde{z}_{k+1}, \tilde{z}_{k+1}) \tilde{z}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1} \end{align*} \]
Now \[ \begin{align*} \cov(x_{k+1}, \tilde{z}_{k+1}) &= E[ (\tilde{x}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1}) (\tilde{x}_{k+1}' H_{k+1} + v_{k+1}') ] \\ &= E[ \tilde{x}_{k+1} \tilde{x}_{k+1}'] H_{k+1} \\ &= \Sigma_{k+1|k} H_{k+1} \end{align*} \]
From early results, we have \[ \cov(\tilde{z}_{k+1}, \tilde{z}_{k+1}) = H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1} = \Omega_{k+1}\] The measurement-update (filtered estimate) is
\[ \hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} (z_{k+1} - H_{k+1}' \hat{x}_{k+1|k}) \quad (6) \]
Define the uncorrelated input noise \( \tilde{w}_k = w_k - \hat{w}_k = w_k - S_k R_k^{-1} v_k\) such that
\[ \begin{bmatrix} \tilde{w}_k \\ v_k \end{bmatrix} \sim
\left ( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} Q_k - S_k R_k^{-1}S_k' & 0 \\ 0 & R_k \end{bmatrix} \right ) \]
then we have
\[ \begin{align*} x_{k+1} &= F_k x_k + G_k \tilde{w}_k + G_k S_k R_k^{-1} v_k + \Gamma_k u_k \\ &= (F_k - G_k S_k R_k^{-1} H_k') x_k + G_k \tilde{w}_k + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \end{align*} \] using the fact \(v_k = z_k - H_k' x_k\). Noting that \( E[\tilde{w}_k v_k'] = 0 \), the time update equation becomes
\[ \hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') \hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \quad (5) \]
Error covariance for filtered estimates
The error covariance is
\[ \Sigma_{k|k} = E[ (x_k - \hat{x}_{k|k}) (x_k - \hat{x}_{k|k})'] \]
From (6) we have
\[ (x_{k+1} - \hat{x}_{k+1|k+1}) + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} \tilde{z}_{k+1} = x_{k+1} - \hat{x}_{k+1|k} \]
By the orthogonality principle, \(x_{k+1} - \hat{x}_{k+1|k+1} \) is orthogonal to \(\tilde{z}_{k+1}\). Therefore,
\[ \Sigma_{k+1|k+1} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} = \Sigma_{k+1|k} \] or
\[ \Sigma_{k+1|k+1} = \Sigma_{k+1|k} - \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} \]
Lastly, we obtain the time-update error covariance, subtracting (5) from (1)
\[ x_{k+1} - \hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') (x_k - \hat{x}_{k|k}) + G_k \tilde{w}_k \] and using the orthogonality of \(\tilde{w}_k\) and \(x_k - \hat{x}_{k|k}\), we obtain
\[ \begin{multline*} \Sigma_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') \Sigma_{k|k} (F_k - G_k S_k R_k^{-1} H_k')' \\ + G_k(Q_k - S_k R_k^{-1} S_k') G_k' \end{multline*} \]
Summary
Measurement update
\[\begin{align*} \hat{x}_{k+1|k+1} &= \hat{x}_{k+1|k} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} ( z_{k+1} - H_{k+1}' \hat{x}_{k+1|k}) \\ \Sigma_{k+1|k+1} &= \Sigma_{k+1|k} - \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} \\ \Omega_{k+1} &= H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1} \end{align*} \]
Time update
\[ \begin{align*} \hat{x}_{k+1|k} &= ( F_k - G_k S_k R_k^{-1} H_k') \hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \\ \Sigma_{k+1|k} &= (F_k - G_k S_k R_k^{-1} H_k') \Sigma_{k|k} (F_k - G_k S_k R_k^{-1} H_k')' + G_k (Q_k - S_k R_k^{-1} S_k') G_k' \end{align*} \]
Time update with \(S_k = 0\)
\[ \begin{align*} \hat{x}_{k+1|k} &= F_k \hat{x}_{k|k} + \Gamma_k u_k \\ \Sigma_{k+1|k} &= F_k \Sigma_{k|k} F_k' + G_k Q_k G_k' \end{align*} \]
Combined update with \(S_k = 0\) for filtered state:
\[ \begin{align*} \hat{x}_{k+1|k+1} &= F_k \hat{x}_{k|k} + L_{k+1} ( z_{k+1} - H_{k+1}' F_k \hat{x}_{k|k} - H_{k+1}' \Gamma_k u_k) \\ L_{k+1} &= \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} \\ \Omega_{k+1} &= H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1}\end{align*} \]
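A minimal NumPy sketch of the \(S_k = 0\) recursions above (the constant-velocity model, noise levels and initial conditions are made up for illustration, and \(u_k = 0\) is assumed):

import numpy as np

rng = np.random.default_rng(3)
F = np.array([[1.0, 1.0], [0.0, 1.0]])       # F_k (time-invariant here)
G = np.eye(2)
H = np.array([[1.0], [0.0]])                 # note z_k = H' x_k + v_k
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

x = np.array([0.0, 1.0])                     # true state
xhat = np.zeros(2)                           # \hat{x}_{0|-1}
Sigma = np.eye(2)                            # \Sigma_{0|-1}

for k in range(50):
    z = H.T @ x + rng.multivariate_normal([0.0], R)
    # Measurement update
    Omega = H.T @ Sigma @ H + R
    L = Sigma @ H @ np.linalg.inv(Omega)
    xhat = xhat + L @ (z - H.T @ xhat)
    Sigma = Sigma - L @ H.T @ Sigma
    # Time update (S_k = 0)
    x = F @ x + G @ rng.multivariate_normal(np.zeros(2), Q)
    xhat = F @ xhat
    Sigma = F @ Sigma @ F.T + G @ Q @ G.T

print(x, xhat)                               # the one-step prediction tracks the true state

Each iteration applies the measurement update first and then the time update, matching the summary above.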
Wednesday, August 10, 2016
Innovations sequence
Definition
Suppose \( \{z_k\} \) is a sequence of jointly Gaussian random elements. The innovations process \(\{\tilde{z}_k\} \) is such that \(\tilde{z}_k\) consists of that part of \(z_k\) containing new information not carried in \(z_{k-1}, z_{k-2}, \dotsc\).
\[ \tilde{z}_k = z_k - E[z_k | z_0, \dotsc, z_{k-1} ] = z_k - E[z_k | Z^{k-1}] \] with \( \tilde{z}_0 = z_0 - E[z_0] \).
Properties
- \(\tilde{z}_k\) independent of \( z_0, \dotsc, z_{k-1}\) by definition
- (1) implies \(E[ \tilde{z}_k' \tilde{z}_l] = 0, l \neq k \)
- \(E[z_k | Z^{k-1}]\) is a linear combination of \(z_0, \dotsc, z_{k-1}\)
- The sequence \(\{\tilde{z}_k\} \) can be obtained from \(\{z_k\} \) by a causal linear operation.
- The sequence \(\{z_k\} \) can be reconstructed from \(\{\tilde{z}_k\} \) by a causal linear operation.
- (4) and (5) imply \( E[z_k | Z^{k-1}] = E[z_k | \tilde{Z}^{k-1}] \) or more generally \( E[w | Z^{k-1}] = E[w | \tilde{Z}^{k-1}] \) for jointly Gaussian \(w, \{z_k\} \)
- For zero mean Gaussian \(\tilde{x}_k\), \(\tilde{z}_k\), we have \[ E[x_k|Z^{k-1}] = E[x_k|\tilde{Z}^{k-1}] = E[x_k| \tilde{z}_0] + \dotsb + E[x_k| \tilde{z}_{k-1}] \]
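A small numerical illustration of properties 4 and 5 (a sketch assuming NumPy; the covariance below is arbitrary): for a zero-mean jointly Gaussian vector \(z\) with covariance \(C = LDL'\), where \(L\) is unit lower triangular, the innovations are \(\tilde{z} = L^{-1}z\); both maps \(z \mapsto \tilde{z}\) and \(\tilde{z} \mapsto z\) are causal (triangular), and the components of \(\tilde{z}\) are uncorrelated with covariance \(D\).

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
C = A @ A.T + np.eye(4)                      # an arbitrary covariance

chol = np.linalg.cholesky(C)                 # C = chol chol'
L = chol / np.diag(chol)                     # unit lower triangular factor of C = L D L'
Z = rng.multivariate_normal(np.zeros(4), C, size=100000)
Ztilde = np.linalg.solve(L, Z.T).T           # innovations: causal (triangular) transform of z

print(np.round(np.cov(Ztilde.T), 2))         # ~diagonal: innovations are uncorrelated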
Friday, August 05, 2016
Properties of the exponential family distributions
Given exponential family \( \mathcal{P}=\{p_\theta(x) | \theta \in \Theta \} \), where
\[ p_\theta(x) = h(x) \exp ( q^T(\theta) T(x) - b(\theta) ) I_{supp}(x), \quad Z = \exp(- b(\theta)) \]
Regular family (gives you completeness)
Conditions for regularity,
- support \(p_\theta(x)\) independent of \(\theta\)
- finite partition function \(Z(\theta) < \infty,\; \forall \theta\)
- Interior of parameter space is solid, \( \mathring{\Theta} \neq \emptyset \),
- Interior of natural parameter space is solid \( \mathring{\mathcal{Q}} \neq \emptyset \)
- Statistic vector function and the constant function are linearly independent. i.e. \( [1, T_1(x),\dotsc,T_K(x)] \) linear indep. (gives you minimal statistic)
- twice differentiable \( p_\theta(x) \)
Curved family (only know statistic is minimal)
An exponential family where the dimension of the vector parameter \(\mathbf{\theta}=(\theta_1,\dotsc,\theta_r)\) is less than the dimension of the natural statistic \(\mathbf{T}(\mathbf{x}) \) is called a curved family.
Identifiability of parameter vector \( \mathbf{\theta} \).
When statistic is minimal, then it is a matter of ensuring \(q: \Theta \mapsto \mathcal{Q} \) defines a 1-1 mapping from desired parameter space to natural parameter space.
Thursday, July 21, 2016
Invariance and carry over properties of MLE
Review: Asymptotic properties of MLE
- Asymptotically efficient (attains CRLB as \(N\rightarrow\infty\))
- Asymptotically Gaussian (asymptotically normality)
- Asymptotically Unbiased
- Consistent (weakly and strongly)
The MLE of the parameter \(\alpha = g(\theta)\), where the PDF \(p(x;\theta)\) is parameterized by \(\theta\), is given by
\[ \hat{\alpha} = g(\hat{\theta})\] where \(\hat{\theta}\) is the MLE of \(\theta\).
Consistency (in class) is defined as the weak convergence of the sequence of estimates to the true parameter as N gets large.
If \(g(\theta)\) is continuous in \(\theta\), the convergence properties (esp. convergence in probability) carry over, i.e. \(g(\hat{\theta})\) is also a consistent estimator.
However, unbiasedness of the estimator \(g(\hat{\theta})\) depends on the convexity of \(g\) and does not carry over from \(\hat{\theta}\).
Other properties of MLE
- If an efficient estimator exists, the ML method will produce it.
- Unlike the MVU estimator, MLE can be biased
- Note: the CRLB applies to unbiased estimators, so when an estimator is biased it may have variance smaller than \(I^{-1}(\theta)\)
Thursday, July 14, 2016
Properties of a regular family of parameterized distribution
A family of parameterized distributions defined by
\[ \mathcal{P} = \{ p_\theta(y) | \theta \in \Theta \subset \mathbb{R}^P \}\]
is regular if it satisfies the following conditions
- Support of \(p_\theta(y)\) does not depend on \(\theta\) for all \(\theta \in \Theta\)
- \(\frac{\partial}{\partial \theta} p_\theta(y) \) exists
- (Optional) \( \frac{\partial^2}{\partial \theta^2} p_\theta(y) \) exists
Note \[ \frac{\partial }{ \partial \theta } \ln p_\theta(y) = \frac{1}{p_\theta(y) } \frac{\partial }{ \partial \theta } p_\theta(y) \quad \quad (4) \]
Define the score function (log := natural log)
\[ S_\theta (y) := \nabla_\theta \log p_\theta(y) \]
Note also
\[ E_\theta \{ 1 \} = 1 = \int_\mathcal{Y} p_\theta(y) dy \]
As a result of the above, we have (Kay's definition of regular)
\begin{align*} 0 &= \frac{\partial }{ \partial \theta }E_\theta \{1\} \\
&= \frac{\partial }{ \partial \theta }\int p_\theta(y) dy \\
&\overset{1}{=} \int \frac{\partial }{ \partial \theta } p_\theta(y) dy \\
&\overset{2,4}{=} \int p_\theta(y) \frac{\partial }{ \partial \theta } \log p_\theta(y) dy \\
&= E_\theta \{ S_\theta(y) \}
\end{align*}
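A quick Monte Carlo check of \(E_\theta \{ S_\theta(y) \} = 0\) (a sketch assuming NumPy and a Gaussian location family, which satisfies the regularity conditions; the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(7)
theta, sigma = 2.0, 1.0                      # hypothetical N(theta, sigma^2) family
y = rng.normal(theta, sigma, size=500000)
score = (y - theta) / sigma**2               # d/dtheta log p_theta(y) for the Gaussian
print(score.mean())                          # ~0: the score has zero mean under p_theta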
Friday, July 01, 2016
Spectral Theorem for Diagonalizable Matrices
It occurs to me that most presentations of the spectral theorem only concern orthonormal bases. This is a more general result from Meyer.
Theorem
A matrix \( \mathbf{A} \in \mathbb{R}^{n\times n}\) with spectrum \(\sigma(\mathbf{A}) = \{ \lambda_1, \dotsc, \lambda_k \} \) is diagonalizable if and only if there exist matrices \(\{ \mathbf{G}_1, \dotsc, \mathbf{G}_k\} \) such that \[ \mathbf{A} = \lambda_1 \mathbf{G}_1 + \dotsb + \lambda_k \mathbf{G}_k \] where the \(\mathbf{G}_i\)'s have the following properties. The expansion is known as the spectral decomposition of \(\mathbf{A}\), and the \(\mathbf{G}_i\)'s are called the spectral projectors associated with \(\mathbf{A}\).
- \(\mathbf{G}_i\) is the projector onto \(\mathcal{N} (\mathbf{A} - \lambda_i \mathbf{I}) \) along \(\mathcal{R} ( \mathbf{A} - \lambda_i \mathbf{I} ) \).
- \(\mathbf{G}_i\mathbf{G}_j = 0 \) whenever \( i \neq j \)
- \( \mathbf{G}_1 + \dotsb + \mathbf{G}_k = \mathbf{I}\)
Note that being a projector \(\mathbf{G}_i\) is idempotent.
- \(\mathbf{G}_i = \mathbf{G}_i^2\)
And since \(\mathcal{N}(\mathbf{G}_i) = \mathcal{R}(\mathbf{A} - \lambda_i \mathbf{I} ) \) and \(\mathcal{R}(\mathbf{G}_i) = \mathcal{N}(\mathbf{A} - \lambda_i \mathbf{I} ) \), we have the following equivalent pairs of complementary subspaces
- \( \mathcal{R}(\mathbf{A} - \lambda_i \mathbf{I} ) \oplus \mathcal{N}(\mathbf{A} - \lambda_i \mathbf{I} ) \)
- \( \mathcal{R}(\mathbf{G}_i) \oplus \mathcal{N}(\mathbf{A} - \lambda_i \mathbf{I} ) \)
- \( \mathcal{R}(\mathbf{A} - \lambda_i \mathbf{I} ) \oplus \mathcal{N}(\mathbf{G}_i) \)
- \( \mathcal{R}(\mathbf{G}_i) \oplus \mathcal{N}(\mathbf{G}_i) \)
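A numerical sketch of the decomposition (assuming NumPy; the matrix is built to be diagonalizable with distinct real eigenvalues but deliberately not symmetric, so the eigenvector basis is not orthonormal):

import numpy as np

rng = np.random.default_rng(5)
V = rng.standard_normal((4, 4))              # invertible (almost surely), not orthogonal
d = np.array([3.0, 1.0, -2.0, 0.5])          # distinct eigenvalues (chosen arbitrarily)
A = V @ np.diag(d) @ np.linalg.inv(V)        # diagonalizable but not symmetric

W = np.linalg.inv(V)                         # rows of W are left eigenvectors
# With distinct eigenvalues, each spectral projector is the rank-one G_i = v_i w_i'
G = [np.outer(V[:, i], W[i, :]) for i in range(4)]

assert np.allclose(sum(d[i] * G[i] for i in range(4)), A)   # A = sum_i lambda_i G_i
assert np.allclose(sum(G), np.eye(4))                       # G_1 + ... + G_k = I
assert np.allclose(G[0] @ G[1], np.zeros((4, 4)))           # G_i G_j = 0 for i != j
assert np.allclose(G[0] @ G[0], G[0])                       # each G_i is idempotent

With distinct eigenvalues each \(\mathbf{G}_i\) is the rank-one product of the corresponding right and left eigenvectors; for a repeated eigenvalue, the rank-one terms belonging to that eigenvalue are summed.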
Friday, June 24, 2016
Monday, June 06, 2016
Majorization and Schur-convexity
Majorization
A real vector \( b = (b_1,\dotsc,b_n) \) is said to majorize \( a = ( a_1,\dotsc, a_n) \), denoted \( a \prec b \), if
- \( \sum_{i=1}^n a_i = \sum_{i=1}^n b_i \), and
- \( \sum_{i=k}^n a_{(i)} \leq \sum_{i=k}^n b_{(i)} \), \( k = 2,\dotsc,n \)
where \( a_{(1)} \leq \dotsb \leq a_{(n)} \), \( b_{(1)} \leq \dotsb \leq b_{(n)} \) are \(a\) and \(b\) arranged in increasing order.
A function \( \phi(a) \) symmetric in the coordinates of \( a = ( a_1, \dotsc, a_n ) \) is said to be Schur-concave if \( a \prec b \) implies \( \phi(a) \ge \phi(b) \).
A function \( \phi(a) \) is Schur-convex if \( -\phi(a) \) is Schur-concave.
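A small numerical check (a sketch assuming NumPy; the vectors and the test function are arbitrary): \((1,1,1) \prec (2,1,0) \prec (3,0,0)\), and the Schur-convex function \(\phi(a) = \sum_i a_i^2\) increases along that chain.

import numpy as np

def majorized_by(a, b):
    # True if b majorizes a: equal totals and the partial sums of the largest entries dominate
    a, b = np.sort(a)[::-1], np.sort(b)[::-1]
    return np.isclose(a.sum(), b.sum()) and np.all(np.cumsum(a) <= np.cumsum(b) + 1e-12)

a, b, c = np.array([1., 1., 1.]), np.array([2., 1., 0.]), np.array([3., 0., 0.])
print(majorized_by(a, b), majorized_by(b, c))        # True True
print((a**2).sum(), (b**2).sum(), (c**2).sum())      # 3.0 5.0 9.0: Schur-convex value increases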
Tuesday, May 03, 2016
Requirements for good parenting
- Keep my child's emotional love tank full - speak the five love languages.
- physical touch
- words of affirmation
- quality time
- gifts
- acts of service
- Use the most positive ways I can to control my child's behavior: requests, gentle physical manipulation, commands, punishment, and behavior modification.
- Lovingly discipline my child. Ask, "What does this child need?" and then go about it logically.
- Do my best to handle my own anger appropriately and not dump it on my child. Be kind but firm.
- Do my best to train my child to handle anger maturely - the goal is sixteen and a half years.
From The Five Love Languages of Children.
Thursday, April 28, 2016
GCC auto-vectorization
auto-vectorization presentation
https://gcc.gnu.org/projects/tree-ssa/vectorization.html
Great article on GCC auto vectorization detection
http://locklessinc.com/articles/vectorize/
https://software.intel.com/en-us/articles/comparison-of-gcc-481-and-icc-140-update-1-auto-vectorization-capabilities
http://stackoverflow.com/questions/30305830/understanding-gcc-4-9-2-auto-vectorization-output
Block encryption on Linux/Windows
Bitlocker on Windows
http://www.howtogeek.com/193013/how-to-create-an-encrypted-container-file-with-bitlocker-on-windows/
http://windows.microsoft.com/en-us/windows-vista/bitlocker-drive-encryption-overview
dm-crypt/cryptsetup on Linux
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
https://gitlab.com/cryptsetup/cryptsetup/
Monday, April 11, 2016
RAII (Resource acquisition is initialization) definition
https://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
Thursday, January 14, 2016
UL/DL network duality for SINR metric
See Boche and Schubert's "A General Duality Theory for Uplink and Downlink Beamforming"
Tuesday, January 12, 2016
Notes on Recursive Least Squares (RLS)
Method of Least Squares
- Assuming a multiple linear regression model, the method attempts to choose the tap weights to minimize the sum of error squares.
- When the error process is white and zero mean, the least-squares estimate is the best linear unbiased estimate (BLUE)
- When the error process is white Gaussian zero mean, the least-squares estimate achieves the Cramer-Rao lower bound (CRLB) for unbiased estimates, hence a minimum-variance unbiased estimate (MVUE)
Recursive Least Squares
- Allows one to update the tap weights as the input becomes available.
- Can incorporate additional constraints such as weighted error squares or a regularizing term, [commonly applied due to the ill-posed nature of the problem].
- The inversion of the correlation matrix is replaced by a simple scalar division.
- The initial correlation matrix provides a means to specify the regularization.
- The fundamental difference between RLS and LMS:
- The step-size parameter \(\mu\) in LMS is replaced by \(\mathbf{\Phi}^{-1}(n)\), the inverse of the correlation matrix of the input \(\mathbf{u}(n)\), which has the effect of whitening the inputs.
- The rate of convergence of RLS is invariant to the eigenvalue spread of the ensemble average input correlation matrix \(\mathbf{R}\)
- The excess mean-square error converges to zero if a stationary environment is assumed and the exponential weighting factor is set to \(\lambda=1\).
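A minimal sketch of the exponentially weighted RLS recursion described above (assuming NumPy; the filter length, forgetting factor \(\lambda\), regularization constant \(\delta\) and the toy identification setup are made up for illustration):

import numpy as np

rng = np.random.default_rng(6)
M, lam, delta = 4, 0.99, 1e-2
w_true = rng.standard_normal(M)              # unknown system to identify

w = np.zeros(M)                              # tap-weight estimate
P = np.eye(M) / delta                        # P(n) = Phi^{-1}(n); delta sets the regularization

u = np.zeros(M)
for n in range(2000):
    u = np.r_[rng.standard_normal(), u[:-1]]        # tap-input vector u(n)
    d = w_true @ u + 0.01 * rng.standard_normal()   # desired response
    k = P @ u / (lam + u @ P @ u)            # gain vector: a scalar division, no matrix inverse
    e = d - w @ u                            # a priori estimation error
    w = w + k * e
    P = (P - np.outer(k, u) @ P) / lam

print(np.round(w - w_true, 3))               # ~0: tap weights converge to the true system

The gain computation replaces the matrix inversion by a scalar division, and \(\delta^{-1}\mathbf{I}\) as the initial \(\mathbf{P}\) plays the role of the regularization term mentioned above.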