A collection of random thoughts and materials that might prove enlightening to me and my friends.
Monday, December 12, 2016
Definition of a distribution function
A function $F:\mathbb{R} \mapsto [0,1]$ satisfying the following properties is a distribution function:
- F is right continuous;
- F is monotone non-decreasing;
- F has limits at $\pm\infty$:
$F(\infty) := \lim_{x\uparrow\infty} F(x) = 1$ and $F(-\infty) := \lim_{x\downarrow-\infty} F(x) = 0$
Wednesday, October 19, 2016
VMWare disk activity reduction methods
See this forum link
and also vmware knowledge base
Also try disabling swap in Linux guest (sudo swapoff -a)
Friday, September 16, 2016
Enabling copy/paste in vmware 12 player
1) sudo apt-get autoremove open-vm-tools
2) Install VMware Tools by following the usual method (Virtual Machine --> Reinstall VMWare Tools)
3) Reboot the VM
4) sudo apt-get install open-vm-tools-desktop
5) Reboot the VM, after the reboot copy/paste and drag/drop will work!
Monday, September 12, 2016
Properties of MMSE and MAP estimator (Bayesian)
The MMSE estimator is the mean of the posterior pdf E(x|y) of x given observation y.
- The estimator is unbiased.
- The covariance is reduced compared to the a priori information.
- Commutes over affine transformation.
- Additivity property for independent data sets.
- Linear in the Gaussian case.
- The estimator error is orthogonal to the space spanned by all Y-measurable functions (affine functions being a subset)
The MAP estimator is $\arg\max_\theta p(\theta|x)$ given observation $x$
- Jointly Gaussian case, MAP = MMSE (posterior is Gaussian, hence pdf unimodal and symmetric, mean = mode = median)
- Do not commute over nonlinear transformation. (Invariant property does not hold, unlike ML)
- Commutes over linear transformation.
MAP tends to ML when
- Prior is uninformative
- Large amount of information in data compared to prior
Gaussian linear model
Let the observed samples follow the model
$x = H\theta + w$ with prior $\theta \sim \mathcal{N}(\mu_\theta, C_\theta)$ and noise vector $w \sim \mathcal{N}(0, C_w)$ independent of $\theta$. Then the posterior is Gaussian with mean
$E(\theta|x) = \mu_\theta + C_\theta H^T (H C_\theta H^T + C_w)^{-1}(x - H\mu_\theta)$ and covariance $C_{\theta|x} = C_\theta - C_\theta H^T (H C_\theta H^T + C_w)^{-1} H C_\theta$. Contrary to the classical Gaussian linear model, $H$ does not need to be full rank.
In alternative form,
$E(\theta|x) = \mu_\theta + (C_\theta^{-1} + H^T C_w^{-1} H)^{-1} H^T C_w^{-1} (x - H\mu_\theta)$ and $C_{\theta|x} = (C_\theta^{-1} + H^T C_w^{-1} H)^{-1}$
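To make the two forms concrete, here is a minimal numpy sketch (hypothetical dimensions and matrices, not from the post) that evaluates the posterior mean and covariance both ways and checks that they agree:

```python
import numpy as np

# Hypothetical example: 2 parameters observed through a 3x2 matrix H.
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 2))
mu_theta = np.array([1.0, -0.5])
C_theta = np.diag([2.0, 0.5])          # prior covariance
C_w = 0.1 * np.eye(3)                  # noise covariance
x = rng.standard_normal(3)             # an observation

# Form 1: E(theta|x) = mu + C H^T (H C H^T + C_w)^{-1} (x - H mu)
S = H @ C_theta @ H.T + C_w
K = C_theta @ H.T @ np.linalg.inv(S)
mean1 = mu_theta + K @ (x - H @ mu_theta)
cov1 = C_theta - K @ H @ C_theta

# Form 2 (information form): covariance (C^{-1} + H^T C_w^{-1} H)^{-1}
J = np.linalg.inv(C_theta) + H.T @ np.linalg.inv(C_w) @ H
cov2 = np.linalg.inv(J)
mean2 = mu_theta + cov2 @ H.T @ np.linalg.inv(C_w) @ (x - H @ mu_theta)

assert np.allclose(mean1, mean2) and np.allclose(cov1, cov2)
```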
LMMSE estimator $E^*[X|Y]$
- A function of first- and second-order statistics only: $E^*[X|Y] = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$ (the inverse can be replaced with a pseudo-inverse if necessary)
- Jointly Gaussian case: $E^*[X|Y] = E[X|Y]$
- Error orthogonal to the subspace spanned by Y
- Additivity property: $E^*[X|Y_1,\dots,Y_k] = \sum_{j=1}^{k} E^*[X|Y_j] - (k-1)\mu_x$
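A small numerical sketch of the first and third points above (hypothetical scalar data, not from the post): the LMMSE estimate built from first- and second-order statistics, with the resulting error orthogonal to the observation.

```python
import numpy as np

# Minimal sketch: LMMSE estimate of X from a scalar observation Y.
rng = np.random.default_rng(1)
n = 100_000
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)   # observation correlated with x

mu_x, mu_y = x.mean(), y.mean()
C = np.cov(x, y)
S_xy, S_yy = C[0, 1], C[1, 1]

x_hat = mu_x + S_xy / S_yy * (y - mu_y)      # E*[X|Y] for scalar Y
err = x - x_hat
print(np.mean(err * y))                      # ~0: error orthogonal to Y
```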
Properties of the exponential family of distributions
From Dasgupta (see link)
One parameter Exponential family
Given the family of distributions $\{P_\theta, \theta \in \Theta \subset \mathbb{R}\}$, the pdf of which has the form
$f(x|\theta) = h(x)\, e^{\eta(\theta) T(x) - \psi^*(\theta)}$
If $\eta(\theta)$ is a 1-1 function of $\theta$ we can drop $\theta$ in the discussion. Thus the family of distributions $\{P_\eta, \eta \in \Xi \subset \mathbb{R}\}$ is in canonical form,
$f(x|\eta) = h(x)\, e^{\eta T(x) - \psi(\eta)}$, and define the set
$\mathcal{T} = \{\eta : e^{\psi(\eta)} < \infty\}$
$\eta$ is the natural parameter, and $\mathcal{T}$ the natural parameter space.
The family is called the canonical one-parameter Exponential family.
[Brown] The family is called full if $\Xi = \mathcal{T}$, regular if $\mathcal{T}$ is open.
[Brown] Let $K$ be the convex support of the measure $\nu$.
The family is minimal if $\dim \Xi = \dim K = k$.
It is nonsingular if $\mathrm{Var}_\eta(T(X)) > 0$ for all $\eta \in \mathcal{T}^\circ$, the interior of $\mathcal{T}$.
Theorem 1. $\psi(\eta)$ is a convex function on $\mathcal{T}$.
Theorem 2. $\psi(\eta)$ is a cumulant generating function for any $\eta \in \mathcal{T}^\circ$.
Note: the 1st cumulant is the expectation, the 2nd and 3rd are central moments (the 2nd being the variance), while the 4th and higher-order cumulants are neither moments nor central moments.
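As a standard illustration (not from the linked notes), the Poisson($\lambda$) family written in canonical form shows how Theorem 2 produces the moments. With $\eta = \log\lambda$, $T(x) = x$, $h(x) = 1/x!$:
$$f(x|\eta) = \frac{1}{x!}\, e^{\eta x - e^{\eta}}, \qquad \psi(\eta) = e^{\eta}, \qquad \mathcal{T} = \mathbb{R}$$
and the cumulants follow from derivatives of $\psi$:
$$E_\eta[T(X)] = \psi'(\eta) = e^{\eta} = \lambda, \qquad \mathrm{Var}_\eta(T(X)) = \psi''(\eta) = e^{\eta} = \lambda.$$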
There are more properties...
Multi-parameter Exponential family
Given the family of distributions $\{P_\theta, \theta \in \Theta \subset \mathbb{R}^k\}$, the pdf of which has the form
$f(x|\theta) = h(x)\, e^{\sum_{i=1}^{k} \eta_i(\theta) T_i(x) - \psi^*(\theta)}$, this is the k-parameter Exponential family.
When we reparametrize using $\eta_i = \eta_i(\theta)$, we have the k-parameter canonical family.
The assumption here is that the dimension of $\Theta$ and the dimension of the image of $\Theta$ under the map $\theta \mapsto (\eta_1(\theta),\dots,\eta_k(\theta))$ are both equal to $k$.
The canonical form is
$f(x|\eta) = h(x)\, e^{\sum_{i=1}^{k} \eta_i T_i(x) - \psi(\eta)}$
Theorem 7. Given a sample having a distribution $P_\eta, \eta \in \mathcal{T}$, in the canonical k-parameter Exponential family, with $\mathcal{T} = \{\eta \in \mathbb{R}^k : e^{\psi(\eta)} < \infty\}$, the partial derivatives of $\psi(\eta)$ of any order exist for any $\eta \in \mathcal{T}^\circ$.
Definition. The family is full rank if at every $\eta \in \mathcal{T}^\circ$ the covariance matrix $I(\eta) = \left[\frac{\partial^2}{\partial\eta_i \partial\eta_j}\psi(\eta)\right] \succeq 0$ is nonsingular.
Definition/Theorem. If the family is nonsingular, then the matrix $I(\eta)$ is called the Fisher information matrix at $\eta$ (for the natural parameter).
Proof. For the canonical exponential family, we have $L(x;\eta) = \log p_\eta(x) \doteq \langle \eta, T(x) \rangle - \psi(\eta)$, $L'(x;\eta) = T(x) - \frac{\partial}{\partial\eta}\psi(\eta)$, and $L''(x;\eta) = -\frac{\partial^2}{\partial\eta\,\partial\eta^T}\psi(\eta)$ is constant for fixed $\eta$, so
$I(\eta) = \frac{\partial^2}{\partial\eta\,\partial\eta^T}\psi(\eta)$
Sufficiency and Completeness
Theorem 8. Suppose a family of distribution F={Pθ,θ∈Θ} belongs to a k-parameter Exponential family and that the "true" parameter space Θ has a nonempty interior, then the family F is complete.
Theorem 9. (Basu's Theorem for the Exponential Family) In any k-parameter Exponential family F, with a parameter space Θ that has a nonempty interior, the natural sufficient statistic of the family T(X) and any ancillary statistic S(X) are independently distributed under each θ∈Θ.
MLE of exponential family
Recall $L(x;\theta) = \log p_\theta(x) \doteq \langle \theta, T(x) \rangle - \psi(\theta)$. The MLE solution satisfies
$S(\theta) = \left.\frac{\partial}{\partial\theta}L(x;\theta)\right|_{\theta=\theta_{ML}} = 0 \iff T(x) = E_{\theta_{ML}}[T(X)]$, where $\frac{\partial}{\partial\theta}\psi(\theta) = E_\theta[T(X)]$
The second derivative gives us
$\frac{\partial^2}{\partial\theta\,\partial\theta^T}L(x;\theta) = -I(\theta) = -\mathrm{Cov}_\theta[T(X)]$. The right-hand side is negative definite for a full-rank family. Therefore the log-likelihood function is strictly concave in $\theta$.
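A short numeric sketch of the moment-matching condition (hypothetical Poisson data, not from the source): for the canonical family the MLE condition $T(x) = E_\eta[T(X)]$ reduces to matching $\psi'(\eta)$ to the sample mean of the natural statistic.

```python
import numpy as np

# Sketch with hypothetical Poisson data: for a canonical exponential family
# the MLE condition T(x) = E_eta[T(X)] is moment matching, psi'(eta) = mean(T).
rng = np.random.default_rng(2)
x = rng.poisson(lam=3.5, size=10_000)

t_bar = x.mean()                # sample mean of the natural statistic T(x) = x
eta_ml = np.log(t_bar)          # solves psi'(eta) = exp(eta) = t_bar in closed form
print(np.exp(eta_ml), t_bar)    # the implied lambda_ML equals the sample mean
```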
Existence of conjugate prior
For likelihood functions within the exponential family, a conjugate prior can be found within the exponential family. The marginalization $p(x) = \int p(x|\theta)p(\theta)\,d\theta$ is also tractable.
From Casella-Berger.
Note that the parameter space is the "natural" parameter space.
Tuesday, September 06, 2016
Local convergence for exponential mixture family
From Redner, Walker 1984
Theorem 5.2. Suppose that the Fisher information matrix $I(\Phi)$ is positive definite at the true parameter $\Phi^*$ and that $\Phi^* = (\alpha_1^*,\dots,\alpha_m^*, \phi_1^*,\dots,\phi_m^*)$ is such that $\alpha_i^* > 0$ for $i = 1,\dots,m$. For $\Phi^{(0)} \in \Omega$, denote by $\{\Phi^{(j)}\}_{j=0,1,2,\dots}$ the sequence in $\Omega$ generated by the EM iteration. Then with probability 1, whenever $N$ is sufficiently large, the unique strongly consistent solution $\Phi^N = (\alpha_1^N,\dots,\alpha_m^N, \phi_1^N,\dots,\phi_m^N)$ of the likelihood equations is well defined and there is a certain norm on $\Omega$ in which $\{\Phi^{(j)}\}_{j=0,1,2,\dots}$ converges linearly to $\Phi^N$ whenever $\Phi^{(0)}$ is sufficiently near $\Phi^N$, i.e. there is a constant $0 \le \lambda < 1$ for which
$\|\Phi^{(j+1)} - \Phi^N\| \le \lambda \|\Phi^{(j)} - \Phi^N\|, \quad j = 0,1,2,\dots$ whenever $\Phi^{(0)}$ is sufficiently near $\Phi^N$.
Differentiability of jump functions
Let
$$j_n(x) = \begin{cases} 0 & \text{if } x < x_n, \\ \theta_n & \text{if } x = x_n, \\ 1 & \text{if } x > x_n, \end{cases}$$
for some $0 \le \theta_n \le 1$; then the jump function is defined as
$$J(x) = \sum_{n=1}^{\infty} \alpha_n j_n(x), \qquad \text{with } \sum_{n=1}^{\infty} \alpha_n < \infty.$$
Theorem. If $J$ is the jump function, then $J'(x)$ exists and vanishes almost everywhere (it is non-zero only on a set of measure zero, $E = \{x : J'(x) \neq 0, x \in B\}$, $m(E) = 0$).
Typically, a probability distribution $F$ is defined as a nondecreasing, right-continuous function with $F(-\infty) = 0$, $F(\infty) = 1$.
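For instance (an illustration, not from the source), the CDF of a Bernoulli($p$) variable is a jump function with two jumps, at $x_1 = 0$ and $x_2 = 1$, weights $\alpha_1 = 1-p$, $\alpha_2 = p$, and $\theta_n = 1$ (which gives right continuity):
$$F(x) = (1-p)\, j_1(x) + p\, j_2(x) = \begin{cases} 0, & x < 0, \\ 1-p, & 0 \le x < 1, \\ 1, & x \ge 1, \end{cases} \qquad F'(x) = 0 \text{ for all } x \notin \{0, 1\}.$$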
Monday, August 29, 2016
Properties of Linear and Matrix Operators
Define the adjoint A∗ of operator A such that
⟨y,Ax⟩=⟨A∗y,x⟩
We have the properties
- N(A)=N(A∗A) and R(A∗)=R(A∗A)
- N(A∗)=N(AA∗) and R(A)=R(AA∗)
And noting that dimR(A)=dimR(A∗), we have
- rank(A∗A)=rank(AA∗)=rank(A)=rank(A∗)
For an $m \times n$ matrix operator $A$ of rank $r$, the dimension of the column space equals the dimension of the row space:
- column space: dim(R(A))=r
- row space: dim(R(AH))=r
- Nullspace: dim(N(A))=n−r
- Left nullspace: dim(N(AH))=m−r
Characterization of matrix AB
For matrices A and B such that AB exists
- N(B)⊂N(AB)
- R(AB)⊂R(A)
- N(A∗)⊂N((AB)∗)
- R((AB)∗)⊂R(B∗)
From 2 and 4
rank(AB)≤rank(A),rank(AB)≤rank(B)
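A quick numerical check of these rank identities on random, hypothetical matrices (not from the post):

```python
import numpy as np

# Verify rank(A*A) = rank(AA*) = rank(A) and rank(AB) <= min(rank A, rank B).
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))   # 5x7, rank 3
B = rng.standard_normal((7, 4))

r = np.linalg.matrix_rank
print(r(A.conj().T @ A) == r(A @ A.conj().T) == r(A))           # True
print(r(A @ B) <= min(r(A), r(B)))                              # True
```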
Thursday, August 25, 2016
Topology and Continuity concepts
Let S be a subset of a metric space M
- S is closed if it contains all its limits.
- S is open if for each p∈S there exists an r>0 such that the open ball B(p,r) is entirely contained in S
- The complement of an open set is closed and vice versa.
The topology of M is the collection T of all open subsets of M.
T has the following properties
- It is closed under arbitrary union of open sets
- It is closed under finite intersections
- ∅,M are open sets.
Corollary
- arbitrary intersection of closed sets is closed
- finite union of closed sets is closed
- ∅,M are closed sets.
A metric space M is complete if each Cauchy sequence in M converges to a limit in M.
- Rn is complete
Every compact set is closed and bounded
Continuity of function f:M→N
- The pre-image of each open set in N is open in M
- Preserves convergent sequences under the transformation, i.e.
f(limxn)=limf(xn) for every convergent sequence {xn}
Wednesday, August 24, 2016
Wednesday, August 17, 2016
Continuous mapping theorem
Continuous mapping theorem on Wiki
where (i) is convergence in distribution, (ii) in probability and (iii) almost sure convergence.
Friday, August 12, 2016
Kalman filter
Define the system
$$x_{k+1} = F_k x_k + G_k w_k + \Gamma_k u_k \qquad (1)$$
$$z_k = H_k' x_k + v_k \qquad (2)$$
$\{u_k\}$ is known, $x_0 \sim (\bar{x}_0, P_0)$, and $\{w_k\}, \{v_k\}$ are random sequences with
$$\begin{bmatrix} w_k \\ v_k \end{bmatrix} \sim \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} Q_k & S_k \\ S_k' & R_k \end{bmatrix} \right)$$
with $[w_k'\; v_k']'$ independent of the corresponding vectors at times $l \neq k$ and of $x_0$.
One step predictor estimate
First we seek a recursive equation for $\hat{x}_{k|k-1} = E[x_k|Z_{k-1}] = E[x_k|\tilde{Z}_{k-1}]$. Define $\tilde{x}_k = x_k - \hat{x}_{k|k-1}$; note that $\{\tilde{x}_k\}$ is not an innovations sequence. Because of the independence of the innovations we have
$$E[x_{k+1}|\tilde{Z}_k] = E[x_{k+1}|\tilde{z}_k] + E[x_{k+1}|\tilde{Z}_{k-1}] - \bar{x}_{k+1}$$
where $\bar{x}_k = E[x_k]$. Recall
$$E[x_{k+1}|\tilde{z}_k] = \bar{x}_{k+1} + \mathrm{cov}(x_{k+1}, \tilde{z}_k)\, \mathrm{cov}^{-1}(\tilde{z}_k, \tilde{z}_k)\, \tilde{z}_k$$
Define the error covariance matrix $\Sigma_{k|k-1} = E[\tilde{x}_k \tilde{x}_k']$. Then
$$\begin{aligned}
\mathrm{cov}(x_{k+1}, \tilde{z}_k) &= \mathrm{cov}(F_k x_k + G_k w_k + \Gamma_k u_k,\; H_k' \tilde{x}_k + v_k) \\
&= E[(F_k x_k + G_k w_k - F_k \bar{x}_k)(\tilde{x}_k' H_k + v_k')] \\
&= E[F_k x_k \tilde{x}_k' H_k] + G_k S_k \\
&= F_k \left[E(\hat{x}_{k|k-1} \tilde{x}_k') + E(\tilde{x}_k \tilde{x}_k')\right] H_k + G_k S_k \\
&= F_k \Sigma_{k|k-1} H_k + G_k S_k
\end{aligned}$$
Observe that $\hat{z}_{k|k-1} = H_k' \hat{x}_{k|k-1}$, and subtracting from (2) gives $\tilde{z}_k = H_k' \tilde{x}_k + v_k$. Also note that $E[\hat{x}_{k|k-1} \tilde{x}_k'] = 0$. Next,
$$\mathrm{cov}(\tilde{z}_k, \tilde{z}_k) = \mathrm{cov}(H_k' \tilde{x}_k + v_k,\; H_k' \tilde{x}_k + v_k) = H_k' \Sigma_{k|k-1} H_k + R_k = \Omega_k$$
We also have
$$E[x_{k+1}|\tilde{Z}_{k-1}] = E[F_k x_k + G_k w_k + \Gamma_k u_k|\tilde{Z}_{k-1}] = F_k E[x_k|\tilde{Z}_{k-1}] + \Gamma_k u_k = F_k \hat{x}_{k|k-1} + \Gamma_k u_k$$
Collecting all terms above, the recursion becomes
$$\hat{x}_{k+1|k} = F_k \hat{x}_{k|k-1} + \Gamma_k u_k + K_k(z_k - H_k' \hat{x}_{k|k-1}) \qquad (9)$$
with $K_k = (F_k \Sigma_{k|k-1} H_k + G_k S_k)\,\Omega_k^{-1}$.
The recursion for the error covariance is developed next. From (1) and (9), using the identity $\tilde{x}_{k+1} = x_{k+1} - \hat{x}_{k+1|k}$ and expanding $z_k$ using (2),
$$\tilde{x}_{k+1} = (F_k - K_k H_k') \tilde{x}_k + G_k w_k - K_k v_k$$
Since $\tilde{x}_k$ and $[w_k'\; v_k']'$ are independent and zero mean, we get
$$E[\tilde{x}_{k+1} \tilde{x}_{k+1}'] = (F_k - K_k H_k') E(\tilde{x}_k \tilde{x}_k')(F_k - K_k H_k')' + \begin{bmatrix} G_k & -K_k \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k' & R_k \end{bmatrix} \begin{bmatrix} G_k' \\ -K_k' \end{bmatrix}$$
or
$$\Sigma_{k+1|k} = (F_k - K_k H_k') \Sigma_{k|k-1} (F_k - K_k H_k')' + G_k Q_k G_k' + K_k R_k K_k' - G_k S_k K_k' - K_k S_k' G_k'$$
Filtered estimates
Defined in terms of $\hat{x}_{k+1|k}$ and $z_{k+1}$:
$$\begin{aligned}
\hat{x}_{k+1|k+1} &= E[x_{k+1}|\tilde{Z}_{k+1}] = E[x_{k+1}|\tilde{z}_{k+1}] + E[x_{k+1}|\tilde{Z}_k] - \bar{x}_{k+1} \\
&= \bar{x}_{k+1} + \mathrm{cov}(x_{k+1}, \tilde{z}_{k+1})\, \mathrm{cov}^{-1}(\tilde{z}_{k+1}, \tilde{z}_{k+1})\, \tilde{z}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1}
\end{aligned}$$
Now
$$\mathrm{cov}(x_{k+1}, \tilde{z}_{k+1}) = E[(\tilde{x}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1})(\tilde{x}_{k+1}' H_{k+1} + v_{k+1}')] = E[\tilde{x}_{k+1} \tilde{x}_{k+1}'] H_{k+1} = \Sigma_{k+1|k} H_{k+1}$$
From earlier results, we have $\mathrm{cov}(\tilde{z}_{k+1}, \tilde{z}_{k+1}) = H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1} = \Omega_{k+1}$. The measurement-update (filtered estimate) is
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1}(z_{k+1} - H_{k+1}' \hat{x}_{k+1|k}) \qquad (6)$$
Define the uncorrelated input noise $\tilde{w}_k = w_k - \hat{w}_k = w_k - S_k R_k^{-1} v_k$ such that
$$\begin{bmatrix} \tilde{w}_k \\ v_k \end{bmatrix} \sim \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} Q_k - S_k R_k^{-1} S_k' & 0 \\ 0 & R_k \end{bmatrix} \right)$$
Then we have
$$x_{k+1} = F_k x_k + G_k \tilde{w}_k + G_k S_k R_k^{-1} v_k + \Gamma_k u_k = (F_k - G_k S_k R_k^{-1} H_k') x_k + G_k \tilde{w}_k + \Gamma_k u_k + G_k S_k R_k^{-1} z_k$$
using the fact $v_k = z_k - H_k' x_k$. Noting that $E[\tilde{w}_k v_k'] = 0$, the time-update equation becomes
$$\hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') \hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1} z_k \qquad (5)$$
Error covariance for filtered estimates
The error covariance is
$\Sigma_{k|k} = E[(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})']$
From (6) we have
$$(x_{k+1} - \hat{x}_{k+1|k+1}) + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} \tilde{z}_{k+1} = x_{k+1} - \hat{x}_{k+1|k}$$
By the orthogonality principle, $x_{k+1} - \hat{x}_{k+1|k+1}$ is orthogonal to $\tilde{z}_{k+1}$. Therefore,
$$\Sigma_{k+1|k+1} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k} = \Sigma_{k+1|k}$$
or
$$\Sigma_{k+1|k+1} = \Sigma_{k+1|k} - \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k}$$
Lastly, we obtain the time-update error covariance by subtracting (5) from (1):
$$x_{k+1} - \hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k')(x_k - \hat{x}_{k|k}) + G_k \tilde{w}_k$$
Using the orthogonality of $\tilde{w}_k$ and $x_k - \hat{x}_{k|k}$, we obtain
$$\Sigma_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') \Sigma_{k|k} (F_k - G_k S_k R_k^{-1} H_k')' + G_k (Q_k - S_k R_k^{-1} S_k') G_k'$$
Summary
Measurement update
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1}(z_{k+1} - H_{k+1}' \hat{x}_{k+1|k})$$
$$\Sigma_{k+1|k+1} = \Sigma_{k+1|k} - \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1} H_{k+1}' \Sigma_{k+1|k}$$
$$\Omega_{k+1} = H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1}$$
Time update
$$\hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') \hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1} z_k$$
$$\Sigma_{k+1|k} = (F_k - G_k S_k R_k^{-1} H_k') \Sigma_{k|k} (F_k - G_k S_k R_k^{-1} H_k')' + G_k (Q_k - S_k R_k^{-1} S_k') G_k'$$
Time update with Sk=0
$$\hat{x}_{k+1|k} = F_k \hat{x}_{k|k} + \Gamma_k u_k$$
$$\Sigma_{k+1|k} = F_k \Sigma_{k|k} F_k' + G_k Q_k G_k'$$
Combined update with Sk=0 for filtered state:
$$\hat{x}_{k+1|k+1} = F_k \hat{x}_{k|k} + \Gamma_k u_k + L_{k+1}(z_{k+1} - H_{k+1}' F_k \hat{x}_{k|k} - H_{k+1}' \Gamma_k u_k)$$
$$L_{k+1} = \Sigma_{k+1|k} H_{k+1} \Omega_{k+1}^{-1}, \qquad \Omega_{k+1} = H_{k+1}' \Sigma_{k+1|k} H_{k+1} + R_{k+1}$$
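A minimal numpy sketch of the summary recursions above (time-invariant model, $S_k = 0$, $u_k = 0$, hypothetical matrices), iterating the measurement and time updates against a simulated system:

```python
import numpy as np

# Sketch of the S_k = 0 recursions with hypothetical constant F, G, H', Q, R.
rng = np.random.default_rng(4)
F = np.array([[1.0, 1.0], [0.0, 1.0]])      # F_k
G = np.eye(2)                                # G_k
Hp = np.array([[1.0, 0.0]])                  # H'_k (measurement matrix)
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

x_hat = np.zeros(2)                          # one-step predictor \hat{x}_{0|-1}
Sigma = np.eye(2)                            # \Sigma_{0|-1}
x_true = np.array([0.0, 1.0])

for k in range(50):
    z = Hp @ x_true + rng.multivariate_normal(np.zeros(1), R)    # z_k from (2)

    # measurement update
    Omega = Hp @ Sigma @ Hp.T + R
    K = Sigma @ Hp.T @ np.linalg.inv(Omega)
    x_filt = x_hat + K @ (z - Hp @ x_hat)
    Sigma_filt = Sigma - K @ Hp @ Sigma

    # time update (S_k = 0)
    x_hat = F @ x_filt
    Sigma = F @ Sigma_filt @ F.T + G @ Q @ G.T

    # propagate the true state via (1), with u_k = 0
    x_true = F @ x_true + G @ rng.multivariate_normal(np.zeros(2), Q)

print(x_true, x_hat)                         # predictor tracks the true state
```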
Wednesday, August 10, 2016
Innovations sequence
Definition
Suppose {zk} is a sequence of jointly Gaussian random elements. The innovations process {˜zk} is such that ˜zk consists of that part of zk containing new information not carried in zk−1,zk−2,….
$\tilde{z}_k = z_k - E[z_k|z_0,\dots,z_{k-1}] = z_k - E[z_k|Z_{k-1}]$, with $\tilde{z}_0 = z_0 - E[z_0]$.
Properties
Properties
- $\tilde{z}_k$ is independent of $z_0,\dots,z_{k-1}$ by definition
- (1) implies $E[\tilde{z}_k' \tilde{z}_l] = 0$ for $l \neq k$
- $E[z_k|Z_{k-1}]$ is a linear combination of $z_0,\dots,z_{k-1}$
- The sequence $\{\tilde{z}_k\}$ can be obtained from $\{z_k\}$ by a causal linear operation.
- The sequence $\{z_k\}$ can be reconstructed from $\{\tilde{z}_k\}$ by a causal linear operation.
- (4) and (5) imply $E[z_k|Z_{k-1}] = E[z_k|\tilde{Z}_{k-1}]$, or more generally $E[w|Z_{k-1}] = E[w|\tilde{Z}_{k-1}]$ for jointly Gaussian $w, \{z_k\}$
- For zero-mean Gaussian $\tilde{x}_k$, $\tilde{z}_k$, we have $E[x_k|Z_{k-1}] = E[x_k|\tilde{Z}_{k-1}] = E[x_k|\tilde{z}_0] + \dots + E[x_k|\tilde{z}_{k-1}]$
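A small numpy sketch of the causal-linear-operation property (hypothetical covariance matrix, not from the source): for a zero-mean jointly Gaussian sequence, the innovations are obtained by a lower-triangular transformation, and their covariance comes out diagonal.

```python
import numpy as np

# Innovations z~_k = z_k - E[z_k | z_0,...,z_{k-1}] via sequential linear projection.
rng = np.random.default_rng(5)
n = 5
B = rng.standard_normal((n, n))
C = B @ B.T                                  # a valid covariance for (z_0,...,z_{n-1})

L = np.zeros((n, n))                         # causal map: z_tilde = (I - L) z
for k in range(1, n):
    L[k, :k] = np.linalg.solve(C[:k, :k], C[:k, k])   # projection coefficients

T = np.eye(n) - L
C_tilde = T @ C @ T.T                        # covariance of the innovations
print(np.round(C_tilde, 6))                  # off-diagonal entries are ~0
```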
Friday, August 05, 2016
Properties of the exponential family distributions
Given exponential family P={pθ(x)|θ∈Θ}, where
$p_\theta(x) = h(x)\exp\left(q^T(\theta)\,T(x) - b(\theta)\right) I_{\mathrm{supp}}(x)$, with $Z = \exp(-b(\theta))$
Regular family (gives you completeness)
Conditions for regularity,
- support of $p_\theta(x)$ independent of $\theta$
- finite partition function: $Z(\theta) < \infty, \forall\theta$
- interior of the parameter space is solid: $\mathring{\Theta} \neq \emptyset$
- interior of the natural parameter space is solid: $\mathring{Q} \neq \emptyset$
- the statistic vector function and the constant function are linearly independent, i.e. $[1, T_1(x),\dots,T_K(x)]$ linearly independent (gives you a minimal statistic)
- twice differentiable $p_\theta(x)$
Curved family (only know statistic is minimal)
An exponential family where the dimension of the vector parameter θ=(θ1,…,θr) is less than the dimension of the natural statistic T(x) is called a curved family.
Identifiability of parameter vector θ.
When statistic is minimal, then it is a matter of ensuring q:Θ↦Q defines a 1-1 mapping from desired parameter space to natural parameter space.
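A standard example of a curved family (not from the post): $\mathcal{N}(\mu, \mu^2)$, $\mu > 0$, has a one-dimensional parameter but a two-dimensional natural statistic,
$$f(x|\mu) = \frac{1}{\sqrt{2\pi}\,\mu} \exp\!\left( \frac{1}{\mu} x - \frac{1}{2\mu^{2}} x^{2} - \frac{1}{2} \right), \qquad T(x) = (x,\; x^{2}), \qquad q(\mu) = \left( \tfrac{1}{\mu},\; -\tfrac{1}{2\mu^{2}} \right),$$
so the natural parameter traces a curve in $\mathbb{R}^2$ rather than filling an open set.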
Thursday, July 21, 2016
Invariance and carry over properties of MLE
Review: Asymptotic properties of MLE
The MLE of the parameter α=g(θ), where the PDF p(x;θ) is parameterized by θ, is given by
ˆα=g(ˆθ) where ˆθ is the MLE of θ.
Consistency (in class) is defined as the weak convergence of the sequence of estimates to the true parameter as N gets large.
If g(θ) is continuous in θ, the convergence properties (esp. convergence in prob.) carry over, i.e. the consistency of the estimator g(ˆθ)
However, unbiasedness of the estimator g(ˆθ) depends on the convexity of g and does not carry over from ˆθ (by Jensen's inequality, a strictly convex g generally introduces bias).
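A quick numerical illustration of invariance (hypothetical exponential data, not from the source): the MLE of the rate is the reciprocal of the sample mean, so by invariance the MLE of the distribution mean $g(\theta) = 1/\theta$ is just the sample mean, which happens to be unbiased while the rate estimate is biased for finite N.

```python
import numpy as np

# Exponential(theta) data with true rate theta = 0.5 (mean 2.0).
rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=50)

rate_ml = 1.0 / x.mean()        # \hat{theta}_ML (biased for finite N)
mean_ml = 1.0 / rate_ml         # g(\hat{theta}) = MLE of the mean, equals x.mean()
print(rate_ml, mean_ml)
```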
Other properties of MLE
- Asymptotically efficient (attains CRLB as N→∞)
- Asymptotically Gaussian (asymptotically normality)
- Asymptotically Unbiased
- Consistent (weakly and strongly)
- If an efficient estimator exists, the ML method will produce it.
- Unlike the MVU estimator, MLE can be biased
- Note: the CRLB applies to unbiased estimators, so when the estimator is biased it may have variance smaller than $I^{-1}(\theta)$
Thursday, July 14, 2016
Properties of a regular family of parameterized distribution
A family of parameterized distributions defined by
$\mathcal{P} = \{p_\theta(y) \mid \theta \in \Theta \subset \mathbb{R}^P\}$
is regular if it satisfies the following conditions:
- Support of pθ(y) does not depend on θ for all θ∈Θ
- $\frac{\partial}{\partial\theta} p_\theta(y)$ exists
- (Optional) $\frac{\partial^2}{\partial\theta^2} p_\theta(y)$ exists
Note that $\frac{\partial}{\partial\theta} \ln p_\theta(y) = \frac{1}{p_\theta(y)} \frac{\partial}{\partial\theta} p_\theta(y) \quad (4)$
Define the score function (log := natural log)
Sθ(y):=∇θlogpθ(y)
Note also
$E_\theta\{1\} = 1 = \int_{\mathcal{Y}} p_\theta(y)\,dy$
As a result of the above we have (Kay's definition of regular) that the score has zero mean:
$$0 = \frac{\partial}{\partial\theta} E_\theta\{1\} = \frac{\partial}{\partial\theta} \int p_\theta(y)\,dy \overset{(1)}{=} \int \frac{\partial}{\partial\theta} p_\theta(y)\,dy \overset{(2),(4)}{=} \int p_\theta(y) \frac{\partial}{\partial\theta} \log p_\theta(y)\,dy = E_\theta\{S_\theta(y)\}$$
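A one-line numeric check of the zero-mean score property (hypothetical Gaussian family, not from the source): for $\mathcal{N}(\theta, 1)$ the score is $S_\theta(y) = y - \theta$, and its sample mean under $p_\theta$ is close to zero.

```python
import numpy as np

# Score of the N(theta, 1) family: S_theta(y) = y - theta; E_theta[S] = 0.
rng = np.random.default_rng(7)
theta = 1.3
y = rng.normal(loc=theta, scale=1.0, size=200_000)
print((y - theta).mean())        # close to 0
```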
Friday, July 01, 2016
Spectral Theorem for Diagonalizable Matrices
It occurs to me that most presentations of the spectral theorem concern only an orthonormal basis. This is a more general result from Meyer.
Theorem
A matrix $A \in \mathbb{R}^{n \times n}$ with spectrum $\sigma(A) = \{\lambda_1,\dots,\lambda_k\}$ is diagonalizable if and only if there exist matrices $\{G_1,\dots,G_k\}$ such that $A = \lambda_1 G_1 + \dots + \lambda_k G_k$, where the $G_i$'s have the following properties. The expansion is known as the spectral decomposition of A, and the $G_i$'s are called the spectral projectors associated with A.
- Gi is the projector onto N(A−λiI) along R(A−λiI).
- GiGj=0 whenever i≠j
- $G_1 + \dots + G_k = I$
Note that being a projector Gi is idempotent.
- Gi=G2i
And since $N(G_i) = R(A - \lambda_i I)$ and $R(G_i) = N(A - \lambda_i I)$, we have the following equivalent complementary subspace decompositions:
- R(A−λiI)⊕N(A−λiI)
- R(Gi)⊕N(A−λiI)
- R(A−λiI)⊕N(Gi)
- R(Gi)⊕N(Gi)
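A small numpy sketch of the decomposition (hypothetical matrix, not from Meyer's text), assuming distinct eigenvalues so each projector is a single outer product of a right and left eigenvector; for repeated eigenvalues the corresponding outer products would be summed.

```python
import numpy as np

# Spectral projectors G_i of a diagonalizable matrix with distinct eigenvalues.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])                   # diagonalizable, eigenvalues 3 and 2
lam, V = np.linalg.eig(A)
W = np.linalg.inv(V)                         # rows of W are (normalized) left eigenvectors

G = [np.outer(V[:, i], W[i, :]) for i in range(len(lam))]
print(np.allclose(sum(l * g for l, g in zip(lam, G)), A))    # A = sum_i lambda_i G_i
print(np.allclose(G[0] @ G[1], 0), np.allclose(G[0] + G[1], np.eye(2)))
print(np.allclose(G[0] @ G[0], G[0]))                        # projectors are idempotent
```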
Friday, June 24, 2016
Monday, June 06, 2016
Majorization and Schur-convexity
Majorization
A real vector $b = (b_1,\dots,b_n)$ is said to majorize $a = (a_1,\dots,a_n)$, denoted $a \succ b$, if
- $\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i$, and
- $\sum_{i=k}^{n} a_{(i)} \le \sum_{i=k}^{n} b_{(i)}$, $k = 2,\dots,n$
where $a_{(1)} \le \dots \le a_{(n)}$, $b_{(1)} \le \dots \le b_{(n)}$ are $a$ and $b$ arranged in increasing order.
A function ϕ(a) symmetric in the coordinates of a=(a1,…,an) is said to be Schur-concave if a≻b implies ϕ(a)≥ϕ(b).
A function ϕ(a) is Schur-convex if −ϕ(a) is Schur-concave.
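A small sketch (hypothetical vectors, not from the source) that checks the two majorization conditions above and illustrates Schur-concavity with the entropy function:

```python
import numpy as np

def majorizes(b, a):
    """Return True if b majorizes a (a > b in the post's notation)."""
    a, b = np.sort(a), np.sort(b)                     # increasing order
    if not np.isclose(a.sum(), b.sum()):
        return False
    # partial sums of the k largest entries of b dominate those of a
    return bool(np.all(np.cumsum(a[::-1])[:-1] <= np.cumsum(b[::-1])[:-1] + 1e-12))

a = np.array([1/3, 1/3, 1/3])
b = np.array([0.5, 0.3, 0.2])
print(majorizes(b, a))                                # True: uniform vector is majorized

entropy = lambda p: -np.sum(p * np.log(p))            # symmetric and Schur-concave
print(entropy(a) >= entropy(b))                       # True
```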
Tuesday, May 03, 2016
Requirements for good parenting
- Keep my child's emotional love tank full - speak the five love languages.
- physical touch
- words of affirmation
- quality time
- gifts
- acts of service
- Use the most positive ways I can to control my child's behavior: requests, gentle physical manipulation, commands, punishment, and behavior modification.
- Lovingly discipline my child. Ask, "What does this child need?" and then go about it logically.
- Do my best to handle my own anger appropriately and not dump it on my child. Be kind but firm.
- Do my best to train my child to handle anger maturely - the goal is sixteen and a half years.
From the five love languages of children
Thursday, April 28, 2016
GCC auto-vectorization
auto-vectorization presentation
https://gcc.gnu.org/projects/tree-ssa/vectorization.html
Great article on GCC auto vectorization detection
http://locklessinc.com/articles/vectorize/
https://software.intel.com/en-us/articles/comparison-of-gcc-481-and-icc-140-update-1-auto-vectorization-capabilities
http://stackoverflow.com/questions/30305830/understanding-gcc-4-9-2-auto-vectorization-output
Block encryption on Linux/Windows
Bitlocker on Windows
http://www.howtogeek.com/193013/how-to-create-an-encrypted-container-file-with-bitlocker-on-windows/
http://windows.microsoft.com/en-us/windows-vista/bitlocker-drive-encryption-overview
dm-crypt/cryptsetup on Linux
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
https://gitlab.com/cryptsetup/cryptsetup/
Monday, April 11, 2016
RAII (Resource acquisition is initialization) definition
https://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
Thursday, January 14, 2016
UL/DL network duality for SINR metric
See Boche and Schubert's "A General Duality Theory for Uplink and Downlink Beamforming"
Tuesday, January 12, 2016
Notes on Recursive Least Squares (RLS)
Method of Least Squares
- Assuming a multiple linear regression model, the method attempts to choose the tap weights to minimize the sum of error squares.
- When the error process is white and zero mean, the least-squares estimate is the best linear unbiased estimate (BLUE)
- When the error process is white Gaussian zero mean, the least-squares estimate achieves the Cramer-Rao lower bound (CRLB) for unbiased estimates, hence a minimum-variance unbiased estimate (MVUE)
Recursive Least Squares
- Allows one to update the tap weights as the input becomes available.
- Can incorporate additional constraints such as weighted error squares or a regularizing term, [commonly applied due to the ill-posed nature of the problem].
- The inversion of the correlation matrix is replaced by a simple scalar division.
- The initial correlation matrix provides a means to specify regularization.
- The fundamental difference between RLS and LMS:
- The step-size parameter μ in LMS is replaced by Φ−1(n), the inverse of the correlation matrix of the input u(n), which has the effect of whitening the inputs.
- The rate of convergence of RLS is invariant to the eigenvalue spread of the ensemble average input correlation matrix R
- The excess mean-square error converges to zero if a stationary environment is assumed and the exponential weighting factor is set to λ=1.
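A minimal RLS sketch (hypothetical regression data, not from the notes) showing the gain computed with a scalar division, the regularized initialization of the inverse correlation matrix, and the recursive weight update:

```python
import numpy as np

# Exponentially weighted RLS for a 3-tap linear regression model.
rng = np.random.default_rng(8)
M = 3                                         # number of taps
w_true = np.array([0.5, -1.0, 2.0])
lam = 0.99                                    # exponential weighting factor
delta = 100.0                                 # regularization via P(0) = delta * I

w = np.zeros(M)
P = delta * np.eye(M)                         # P(n) = Phi^{-1}(n)
for n in range(500):
    u = rng.standard_normal(M)                # input (tap) vector
    d = w_true @ u + 0.05 * rng.standard_normal()   # desired response

    k = P @ u / (lam + u @ P @ u)             # gain vector (scalar division, no inversion)
    e = d - w @ u                             # a priori error
    w = w + k * e                             # tap-weight update
    P = (P - np.outer(k, u) @ P) / lam        # recursive update of Phi^{-1}

print(w)                                      # close to w_true
```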