
Monday, December 12, 2016

Definition of a distribution function

A function $F:\mathbb{R}\to[0,1]$ satisfying the following properties is a distribution function.

  1. F is right continuous;
  2. F is monotone non-decreasing;
  3. F has limits at $\pm\infty$:
     $$F(\infty) := \lim_{x\to\infty} F(x) = 1, \qquad F(-\infty) := \lim_{x\to-\infty} F(x) = 0$$

Wednesday, October 19, 2016

VMWare disk activity reduction methods

See this forum link

and also vmware knowledge base

Also try disabling swap in Linux guest (sudo swapoff -a)

Friday, September 16, 2016

Enabling copy/paste in vmware 12 player

1) sudo apt-get autoremove open-vm-tools
2) Install VMware Tools by following the usual method (Virtual Machine --> Reinstall VMWare Tools)
3) Reboot the VM
4) sudo apt-get install open-vm-tools-desktop
5) Reboot the VM, after the reboot copy/paste and drag/drop will work!

Monday, September 12, 2016

Properties of MMSE and MAP estimator (Bayesian)

The MMSE estimator is the posterior mean $E(x|y)$ of $x$ given the observation $y$.
  1. The estimator is unbiased.
  2. The covariance is reduced compared to the a priori information.
  3. Commutes over affine transformation.
  4. Additivity property for independent data sets.
  5. Linear in the Gaussian case.
  6. The estimator error is orthogonal to the space spanned by all Y-measurable functions (affine functions being a subset)
The MAP estimator $\arg\max_\theta p(\theta|x)$ given observation $x$:
  1. Jointly Gaussian case, MAP = MMSE (the posterior is Gaussian, hence the pdf is unimodal and symmetric, so mean = mode = median)
  2. Does not commute over nonlinear transformations (the invariance property does not hold, unlike ML)
  3. Commutes over linear transformations.
MAP tends to ML when
  • Prior is uninformative
  • Large amount of information in data compared to prior

Gaussian linear model

Let the observed samples take on the model
$$x = H\theta + w$$
with prior $\theta \sim \mathcal{N}(\mu_\theta, C_\theta)$ and noise vector $w \sim \mathcal{N}(0, C_w)$ independent of $\theta$. Then the posterior is Gaussian with mean
$$E(\theta|x) = \mu_\theta + C_\theta H^T (H C_\theta H^T + C_w)^{-1}(x - H\mu_\theta)$$
and covariance
$$C_{\theta|x} = C_\theta - C_\theta H^T (H C_\theta H^T + C_w)^{-1} H C_\theta$$
Contrary to the classical Gaussian linear model, $H$ does not need to be full rank.
In alternative form,
$$E(\theta|x) = \mu_\theta + (C_\theta^{-1} + H^T C_w^{-1} H)^{-1} H^T C_w^{-1}(x - H\mu_\theta)$$
and
$$C_{\theta|x} = (C_\theta^{-1} + H^T C_w^{-1} H)^{-1}$$
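A minimal numerical sketch (NumPy, with hypothetical dimensions and randomly drawn $H$, $C_\theta$, $C_w$; not part of the original note) that evaluates both forms of the posterior and checks that they agree:

```python
import numpy as np

# Sketch: posterior mean/covariance of the Bayesian Gaussian linear model x = H theta + w.
rng = np.random.default_rng(0)
n, p = 5, 3                        # observations, parameters (made-up sizes)
H = rng.standard_normal((n, p))    # H need not be full rank in the Bayesian model
mu_theta = np.zeros(p)
C_theta = np.eye(p)                # prior covariance
C_w = 0.5 * np.eye(n)              # noise covariance
theta = rng.multivariate_normal(mu_theta, C_theta)
x = H @ theta + rng.multivariate_normal(np.zeros(n), C_w)

# Form 1: E(theta|x) = mu + C H^T (H C H^T + Cw)^-1 (x - H mu)
S = H @ C_theta @ H.T + C_w
K = C_theta @ H.T @ np.linalg.inv(S)
mean1 = mu_theta + K @ (x - H @ mu_theta)
cov1 = C_theta - K @ H @ C_theta

# Form 2 (information form): (C^-1 + H^T Cw^-1 H)^-1
J = np.linalg.inv(C_theta) + H.T @ np.linalg.inv(C_w) @ H
cov2 = np.linalg.inv(J)
mean2 = mu_theta + cov2 @ H.T @ np.linalg.inv(C_w) @ (x - H @ mu_theta)

assert np.allclose(mean1, mean2) and np.allclose(cov1, cov2)
```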

LMMSE estimator $\hat{E}[X|Y]$
  1. A function of first and second order statistics only: $\hat{E}[X|Y] = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$ (the inverse can be replaced with a pseudo-inverse if necessary)
  2. Jointly Gaussian case, $\hat{E}[X|Y] = E[X|Y]$
  3. Error orthogonal to the subspace spanned by Y
  4. Additivity property $\hat{E}[X|Y_1,\dots,Y_k] = \sum_{j=1}^{k}\hat{E}[X|Y_j] - (k-1)\mu_x$

Properties of the exponential family of distributions

From Dasgupta (see link)

One parameter Exponential family

Given the family of distributions $\{P_\theta, \theta \in \Theta \subseteq \mathbb{R}\}$, the pdf of which has the form
$$f(x|\theta) = h(x)\,e^{\eta(\theta)T(x) - \psi(\theta)}$$
If $\eta(\theta)$ is a 1-1 function of $\theta$ we can drop $\theta$ in the discussion. Thus the family of distributions $\{P_\eta, \eta \in \Xi \subseteq \mathbb{R}\}$ is in canonical form,
$$f(x|\eta) = h(x)\,e^{\eta T(x) - \psi(\eta)}$$
and define the set
$$\mathcal{T} = \{\eta : e^{\psi(\eta)} < \infty\}$$
$\eta$ is the natural parameter, and $\mathcal{T}$ the natural parameter space.
The family is called the canonical one-parameter Exponential family.
[Brown] The family is called full if $\Xi = \mathcal{T}$, and regular if $\mathcal{T}$ is open.
[Brown] Let $K$ be the convex support of the measure $\nu$.
The family is minimal if $\dim \Xi = \dim K = k$.
It is nonsingular if $\mathrm{Var}_\eta(T(X)) > 0$ for all $\eta \in \mathcal{T}^{\circ}$, the interior of $\mathcal{T}$.

Theorem 1. $\psi(\eta)$ is a convex function on $\mathcal{T}$.
Theorem 2. $\psi(\eta)$ is a cumulant generating function for any $\eta \in \mathcal{T}$.
Note: the 1st cumulant is the expectation; the 2nd and 3rd are central moments (the 2nd being the variance); 4th and higher order cumulants are neither moments nor central moments.
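As a quick illustration of Theorem 2 (an added example, not part of the original note), take the Poisson($\lambda$) family in canonical form:
$$f(x|\eta) = \frac{1}{x!}\,e^{\eta x - e^{\eta}}, \qquad T(x) = x,\quad \eta = \log\lambda,\quad \psi(\eta) = e^{\eta}$$
$$\psi'(\eta) = e^{\eta} = \lambda = E[X], \qquad \psi''(\eta) = e^{\eta} = \lambda = \mathrm{Var}(X)$$
Differentiating the cumulant generating function $\psi$ recovers the familiar Poisson mean and variance.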
There are more properties...

Multi-parameter Exponential family

Given the family of distributions $\{P_\theta, \theta \in \Theta \subseteq \mathbb{R}^k\}$, the pdf of which has the form
$$f(x|\theta) = h(x)\,e^{\sum_{i=1}^{k}\eta_i(\theta)T_i(x) - \psi(\theta)}$$
is the k-parameter Exponential family.
When we reparametrize using $\eta_i = \eta_i(\theta)$, we have the k-parameter canonical family.
The assumption here is that the dimension of $\Theta$ and the dimension of the image of $\Theta$ under the map $\theta \mapsto (\eta_1(\theta),\dots,\eta_k(\theta))$ are both equal to $k$.
The canonical form is
$$f(x|\eta) = h(x)\,e^{\sum_{i=1}^{k}\eta_i T_i(x) - \psi(\eta)}$$

Theorem 7. Given a sample having a distribution $P_\eta, \eta \in \mathcal{T}$, in the canonical k-parameter Exponential family, with $\mathcal{T} = \{\eta \in \mathbb{R}^k : e^{\psi(\eta)} < \infty\}$,
the partial derivatives of $\psi(\eta)$ of any order exist for any $\eta \in \mathcal{T}^{\circ}$.

Definition. The family is full rank if at every $\eta \in \mathcal{T}$ the covariance matrix $I(\eta) = \left[\frac{\partial^2}{\partial\eta_i\partial\eta_j}\psi(\eta)\right] \succ 0$ is nonsingular.
Definition/Theorem. If the family is nonsingular, then the matrix $I(\eta)$ is called the Fisher information matrix at $\eta$ (for the natural parameter).
Proof. For the canonical exponential family, we have $L(x;\eta) = \log p_\eta(x) = \langle\eta, T(x)\rangle - \psi(\eta) + \log h(x)$, so $\nabla_\eta L(x;\eta) = T(x) - \nabla_\eta\psi(\eta)$ and $\nabla^2_{\eta\eta^T} L(x;\eta) = -\nabla^2_{\eta\eta^T}\psi(\eta)$ is constant (in $x$) for fixed $\eta$, so
$$I(\eta) = \nabla^2_{\eta\eta^T}\psi(\eta)$$

Sufficiency and Completeness

Theorem 8. Suppose a family of distributions $\mathcal{F} = \{P_\theta, \theta \in \Theta\}$ belongs to a k-parameter Exponential family and that the "true" parameter space $\Theta$ has a nonempty interior; then the family $\mathcal{F}$ is complete.

Theorem 9. (Basu's Theorem for the Exponential Family) In any k-parameter Exponential family $\mathcal{F}$, with a parameter space $\Theta$ that has a nonempty interior, the natural sufficient statistic of the family $T(X)$ and any ancillary statistic $S(X)$ are independently distributed under each $\theta \in \Theta$.

MLE of exponential family

Recall, $L(x;\theta) = \log p_\theta(x) = \langle\theta, T(x)\rangle - \psi(\theta) + \log h(x)$. The solution of the MLE satisfies
$$S(\theta) = \nabla_\theta L(x;\theta)\big|_{\theta = \hat\theta_{ML}} = 0 \;\Rightarrow\; T(x) = E_{\hat\theta_{ML}}[T(X)]$$
where $\nabla_\theta\psi(\theta) = E_\theta[T(X)]$.

The second derivative gives us
$$\nabla^2_{\theta\theta^T} L(x;\theta) = -I(\theta) = -\mathrm{Cov}_\theta[T(X)]$$
The right hand side is negative definite for a full rank family. Therefore the log likelihood function is strictly concave in $\theta$.
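A small numerical sketch (an added example assuming a Bernoulli sample, not from the original note) that solves the score equation $T(\bar{x}) = \psi'(\eta)$ by Newton's method, exploiting the concavity noted above:

```python
import numpy as np

# Bernoulli in canonical form: psi(eta) = log(1 + e^eta), T(x) = x.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1000)          # iid Bernoulli(0.3) sample
t_bar = x.mean()                             # average of the sufficient statistic

sigmoid = lambda e: 1.0 / (1.0 + np.exp(-e))
eta = 0.0                                    # initial guess for the natural parameter
for _ in range(25):
    score = t_bar - sigmoid(eta)                       # L'(eta) = T_bar - psi'(eta)
    hessian = -sigmoid(eta) * (1.0 - sigmoid(eta))     # L''(eta) = -psi''(eta) < 0
    eta -= score / hessian                             # Newton step for a concave objective

# The solution matches the closed form eta_hat = logit(mean(x)).
assert np.isclose(eta, np.log(t_bar / (1.0 - t_bar)))
```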

Existence of conjugate prior

For likelihood functions within the exponential family, a conjugate prior can be found within the exponential family. The marginalization $p(x) = \int p(x|\theta)\,p(\theta)\,d\theta$ is also tractable.

From Casella-Berger.  

Note that the parameter space is the "natural" parameter space.
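As a hedged sketch of the standard construction (the hyperparameters $\tau$ and $n_0$ below are introduced only for illustration and are not from the original note): for a canonical likelihood $p(x|\eta) = h(x)e^{\eta T(x) - \psi(\eta)}$, a conjugate prior can be taken of the form
$$p(\eta\,|\,\tau, n_0) \propto \exp\left(\eta\,\tau - n_0\,\psi(\eta)\right)$$
After observing $x_1,\dots,x_n$, the posterior has the same form with $\tau \to \tau + \sum_i T(x_i)$ and $n_0 \to n_0 + n$, and the marginal $p(x)$ reduces to a ratio of the family's normalizing constants.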



Tuesday, September 06, 2016

Local convergence for exponential mixture family

From Redner, Walker 1984

Theorem 5.2. Suppose that the Fisher information matrix $I(\Phi)$ is positive definite at the true parameter $\Phi^*$ and that $\Phi^* = (\alpha_1^*,\dots,\alpha_m^*,\phi_1^*,\dots,\phi_m^*)$ is such that $\alpha_i^* > 0$ for $i = 1,\dots,m$. For $\Phi^{(0)} \in \Omega$, denote by $\{\Phi^{(j)}\}_{j=0,1,2,\dots}$ the sequence in $\Omega$ generated by the EM iteration. Then with probability 1, whenever $N$ is sufficiently large, the unique strongly consistent solution $\Phi^N = (\alpha_1^N,\dots,\alpha_m^N,\phi_1^N,\dots,\phi_m^N)$ of the likelihood equations is well defined, and there is a certain norm on $\Omega$ in which $\{\Phi^{(j)}\}_{j=0,1,2,\dots}$ converges linearly to $\Phi^N$ whenever $\Phi^{(0)}$ is sufficiently near $\Phi^N$, i.e. there is a constant $0 \le \lambda < 1$ for which
$$\|\Phi^{(j+1)} - \Phi^N\| \le \lambda\,\|\Phi^{(j)} - \Phi^N\|, \qquad j = 0,1,2,\dots$$
whenever $\Phi^{(0)}$ is sufficiently near $\Phi^N$.

Differentiability of jump functions

Let
$$j_n(x) = \begin{cases} 0 & \text{if } x < x_n,\\ \theta_n & \text{if } x = x_n,\\ 1 & \text{if } x > x_n,\end{cases}$$
for some $0 \le \theta_n \le 1$; then the jump function is defined as
$$J(x) = \sum_{n=1}^{\infty}\alpha_n\, j_n(x), \qquad \text{with } \sum_{n=1}^{\infty}\alpha_n < \infty.$$
Theorem. If $J$ is the jump function, then $J'(x)$ exists and vanishes almost everywhere (it is non-zero only on a set of measure zero: $E = \{x : J'(x) \neq 0\}$, $m(E) = 0$).

Typically, a probability distribution function $F$ is defined as a nondecreasing, right continuous function with $F(-\infty) = 0$, $F(\infty) = 1$.

Monday, August 29, 2016

Properties of Linear and Matrix Operators

Define the adjoint $A^*$ of operator $A$ such that
$$\langle y, Ax\rangle = \langle A^*y, x\rangle$$
We have the properties

  • $N(A) = N(A^*A)$ and $R(A^*) = R(A^*A)$
  • $N(A^*) = N(AA^*)$ and $R(A) = R(AA^*)$
And noting that $\dim R(A) = \dim R(A^*)$, we have
  • $\mathrm{rank}(A^*A) = \mathrm{rank}(AA^*) = \mathrm{rank}(A) = \mathrm{rank}(A^*)$

For matrix operators, the dimension of the column space is equal to the dimension of the row space
  • column space: $\dim(R(A)) = r$
  • row space: $\dim(R(A^H)) = r$
  • Nullspace: $\dim(N(A)) = n - r$
  • Left nullspace: $\dim(N(A^H)) = m - r$
Characterization of matrix AB
For matrices A and B such that AB exists
  1. $N(B) \subseteq N(AB)$
  2. $R(AB) \subseteq R(A)$
  3. $N(A^*) \subseteq N((AB)^*)$
  4. $R((AB)^*) \subseteq R(B^*)$
From 2 and 4,
$$\mathrm{rank}(AB) \le \mathrm{rank}(A), \qquad \mathrm{rank}(AB) \le \mathrm{rank}(B)$$
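A small NumPy check (illustrative only; the matrices and sizes below are made up) of the subspace dimensions and of the rank inequality implied by properties 2 and 4:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # m x n matrix of rank r
B = rng.standard_normal((n, 5))

rank = np.linalg.matrix_rank
assert rank(A) == rank(A.conj().T) == r            # column space dim = row space dim
assert rank(A @ A.conj().T) == rank(A.conj().T @ A) == r
# dim N(A) = n - r and dim N(A^H) = m - r, via rank-nullity
assert (n - rank(A)) == n - r and (m - rank(A.conj().T)) == m - r
assert rank(A @ B) <= min(rank(A), rank(B))        # consequence of properties 2 and 4
```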

Thursday, August 25, 2016

Topology and Continuity concepts

Let S be a subset of a metric space M

  • S is closed if it contains all its limits.
  • S is open if for each pS there exists an r>0 such that the open ball B(p,r) is entirely contained in S
  • The complement of an open set is closed and vice versa.
The topology of M is the collection T of all open subsets of M.

T has the following properties
  • It is closed under arbitrary union of open sets
  • It is closed under finite intersections
  • $\emptyset$, M are open sets.
Corollary
  • arbitrary intersection of closed sets is closed
  • finite union of closed sets is closed
  • $\emptyset$, M are closed sets.
A metric space M is complete if each Cauchy sequence in M converges to a limit in M.  
  • Rn is complete
Every compact set is closed and bounded

Continuity of function f:MN
  • The pre-image of each open set in N is open in M 
  • Preserves convergent sequences under the transformation, i.e.
    $f(\lim x_n) = \lim f(x_n)$ for every convergent sequence $\{x_n\}$

Wednesday, August 17, 2016

Continuous mapping theorem

Continuous mapping theorem on Wiki


If $g$ is continuous (at least on a set that the limit $X$ belongs to with probability 1), then
$$\text{(i) } X_n \xrightarrow{d} X \Rightarrow g(X_n) \xrightarrow{d} g(X), \qquad \text{(ii) } X_n \xrightarrow{p} X \Rightarrow g(X_n) \xrightarrow{p} g(X), \qquad \text{(iii) } X_n \xrightarrow{a.s.} X \Rightarrow g(X_n) \xrightarrow{a.s.} g(X)$$
where (i) is convergence in distribution, (ii) in probability and (iii) almost sure convergence.


Friday, August 12, 2016

Kalman filter

Define the system
$$x_{k+1} = F_k x_k + G_k w_k + \Gamma_k u_k \qquad (1)$$
$$z_k = H_k x_k + v_k \qquad (2)$$
where $\{u_k\}$ is known, $x_0 \sim (\bar{x}_0, P_0)$, and $\{w_k\}, \{v_k\}$ are random sequences with
$$\begin{bmatrix} w_k \\ v_k \end{bmatrix} \sim \left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} Q_k & S_k \\ S_k^T & R_k \end{bmatrix}\right)$$
with $[w_k^T\; v_k^T]^T$ independent of the other vectors indexed by $l \neq k$ and of $x_0$.

One step predictor estimate

First we seek a recursive equation for $\hat{x}_{k|k-1} = E[x_k|Z_{k-1}] = E[x_k|\tilde{Z}_{k-1}]$. Define $\tilde{x}_k = x_k - \hat{x}_{k|k-1}$, and note that $\{\tilde{x}_k\}$ is not an innovations sequence. Because of the independence of the innovations we have
$$E[x_{k+1}|\tilde{Z}_k] = E[x_{k+1}|\tilde{z}_k] + E[x_{k+1}|\tilde{Z}_{k-1}] - \bar{x}_{k+1}$$
where $\bar{x}_k = E[x_k]$. Recall
$$E[x_{k+1}|\tilde{z}_k] = \bar{x}_{k+1} + \mathrm{cov}(x_{k+1},\tilde{z}_k)\,\mathrm{cov}^{-1}(\tilde{z}_k,\tilde{z}_k)\,\tilde{z}_k$$
Define the error covariance matrix $\Sigma_{k|k-1} = E[\tilde{x}_k\tilde{x}_k^T]$. Then
$$\begin{aligned}
\mathrm{cov}(x_{k+1},\tilde{z}_k) &= \mathrm{cov}(F_k x_k + G_k w_k + \Gamma_k u_k,\; H_k\tilde{x}_k + v_k)\\
&= E[(F_k x_k + G_k w_k - F_k\bar{x}_k)(\tilde{x}_k^T H_k^T + v_k^T)]\\
&= E[F_k x_k\tilde{x}_k^T H_k^T] + G_k S_k\\
&= F_k\left[E(\hat{x}_{k|k-1}\tilde{x}_k^T) + E(\tilde{x}_k\tilde{x}_k^T)\right]H_k^T + G_k S_k\\
&= F_k\Sigma_{k|k-1}H_k^T + G_k S_k
\end{aligned}$$
Observe that $\hat{z}_{k|k-1} = H_k\hat{x}_{k|k-1}$, and subtracting from (2) gives $\tilde{z}_k = H_k\tilde{x}_k + v_k$. Also note that $E[\hat{x}_{k|k-1}\tilde{x}_k^T] = 0$. Next
$$\mathrm{cov}(\tilde{z}_k,\tilde{z}_k) = \mathrm{cov}(H_k\tilde{x}_k + v_k,\; H_k\tilde{x}_k + v_k) = H_k\Sigma_{k|k-1}H_k^T + R_k = \Omega_k$$
We also have
$$E[x_{k+1}|\tilde{Z}_{k-1}] = E[F_k x_k + G_k w_k + \Gamma_k u_k\,|\,\tilde{Z}_{k-1}] = F_k E[x_k|\tilde{Z}_{k-1}] + \Gamma_k u_k = F_k\hat{x}_{k|k-1} + \Gamma_k u_k$$
Collecting all terms above, the recursion becomes
$$\hat{x}_{k+1|k} = F_k\hat{x}_{k|k-1} + \Gamma_k u_k + K_k(z_k - H_k\hat{x}_{k|k-1}) \qquad (9)$$
with $K_k = (F_k\Sigma_{k|k-1}H_k^T + G_k S_k)\,\Omega_k^{-1}$.

The recursion of the error covariance is developed next. From (1), (9), using the identity $\tilde{x}_{k+1} = x_{k+1} - \hat{x}_{k+1|k}$ and expanding $z_k$ using (2),
$$\tilde{x}_{k+1} = (F_k - K_k H_k)\tilde{x}_k + G_k w_k - K_k v_k$$
Since $\tilde{x}_k$ and $[w_k^T\; v_k^T]^T$ are independent and zero mean, we get
$$E[\tilde{x}_{k+1}\tilde{x}_{k+1}^T] = (F_k - K_k H_k)\,E(\tilde{x}_k\tilde{x}_k^T)\,(F_k - K_k H_k)^T + \begin{bmatrix} G_k & -K_k \end{bmatrix}\begin{bmatrix} Q_k & S_k \\ S_k^T & R_k \end{bmatrix}\begin{bmatrix} G_k^T \\ -K_k^T \end{bmatrix}$$
or
$$\Sigma_{k+1|k} = (F_k - K_k H_k)\Sigma_{k|k-1}(F_k - K_k H_k)^T + G_k Q_k G_k^T + K_k R_k K_k^T - G_k S_k K_k^T - K_k S_k^T G_k^T$$
Filtered estimates

Defined in terms of $\hat{x}_{k+1|k}$ and $z_{k+1}$,
$$\hat{x}_{k+1|k+1} = E[x_{k+1}|\tilde{Z}_{k+1}] = E[x_{k+1}|\tilde{z}_{k+1}] + E[x_{k+1}|\tilde{Z}_k] - \bar{x}_{k+1} = \bar{x}_{k+1} + \mathrm{cov}(x_{k+1},\tilde{z}_{k+1})\,\mathrm{cov}^{-1}(\tilde{z}_{k+1},\tilde{z}_{k+1})\,\tilde{z}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1}$$
Now
$$\mathrm{cov}(x_{k+1},\tilde{z}_{k+1}) = E[(\tilde{x}_{k+1} + \hat{x}_{k+1|k} - \bar{x}_{k+1})(\tilde{x}_{k+1}^T H_{k+1}^T + v_{k+1}^T)] = E[\tilde{x}_{k+1}\tilde{x}_{k+1}^T]H_{k+1}^T = \Sigma_{k+1|k}H_{k+1}^T$$
From earlier results, we have $\mathrm{cov}(\tilde{z}_{k+1},\tilde{z}_{k+1}) = H_{k+1}\Sigma_{k+1|k}H_{k+1}^T + R_{k+1} = \Omega_{k+1}$. The measurement-update (filtered estimate) is
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \Sigma_{k+1|k}H_{k+1}^T\Omega_{k+1}^{-1}(z_{k+1} - H_{k+1}\hat{x}_{k+1|k}) \qquad (6)$$
Define the uncorrelated input noise $\tilde{w}_k = w_k - \hat{w}_k = w_k - S_k R_k^{-1}v_k$ such that
$$\begin{bmatrix} \tilde{w}_k \\ v_k \end{bmatrix} \sim \left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} Q_k - S_k R_k^{-1}S_k^T & 0 \\ 0 & R_k \end{bmatrix}\right)$$
Then we have
$$x_{k+1} = F_k x_k + G_k\tilde{w}_k + G_k S_k R_k^{-1}v_k + \Gamma_k u_k = (F_k - G_k S_k R_k^{-1}H_k)x_k + G_k\tilde{w}_k + \Gamma_k u_k + G_k S_k R_k^{-1}z_k$$
using the fact $v_k = z_k - H_k x_k$. Noting that $E[\tilde{w}_k v_k^T] = 0$, the time update equation becomes
$$\hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1}H_k)\hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1}z_k \qquad (5)$$
Error covariance for filtered estimates
The error covariance is
$$\Sigma_{k|k} = E[(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^T]$$
From (6) we have
$$(x_{k+1} - \hat{x}_{k+1|k+1}) + \Sigma_{k+1|k}H_{k+1}^T\Omega_{k+1}^{-1}\tilde{z}_{k+1} = x_{k+1} - \hat{x}_{k+1|k}$$
By the orthogonality principle, $x_{k+1} - \hat{x}_{k+1|k+1}$ is orthogonal to $\tilde{z}_{k+1}$. Therefore,
$$\Sigma_{k+1|k+1} + \Sigma_{k+1|k}H_{k+1}^T\Omega_{k+1}^{-1}H_{k+1}\Sigma_{k+1|k} = \Sigma_{k+1|k}$$
or
$$\Sigma_{k+1|k+1} = \Sigma_{k+1|k} - \Sigma_{k+1|k}H_{k+1}^T\Omega_{k+1}^{-1}H_{k+1}\Sigma_{k+1|k}$$
Lastly, we obtain the time-update error covariance. Subtracting (5) from (1),
$$x_{k+1} - \hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1}H_k)(x_k - \hat{x}_{k|k}) + G_k\tilde{w}_k$$
and using the orthogonality of $\tilde{w}_k$ and $x_k - \hat{x}_{k|k}$, we obtain
$$\Sigma_{k+1|k} = (F_k - G_k S_k R_k^{-1}H_k)\Sigma_{k|k}(F_k - G_k S_k R_k^{-1}H_k)^T + G_k(Q_k - S_k R_k^{-1}S_k^T)G_k^T$$
Summary

Measurement update
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + \Sigma_{k+1|k}H_{k+1}^T\Omega_{k+1}^{-1}(z_{k+1} - H_{k+1}\hat{x}_{k+1|k})$$
$$\Sigma_{k+1|k+1} = \Sigma_{k+1|k} - \Sigma_{k+1|k}H_{k+1}^T\Omega_{k+1}^{-1}H_{k+1}\Sigma_{k+1|k}$$
$$\Omega_{k+1} = H_{k+1}\Sigma_{k+1|k}H_{k+1}^T + R_{k+1}$$
Time update
$$\hat{x}_{k+1|k} = (F_k - G_k S_k R_k^{-1}H_k)\hat{x}_{k|k} + \Gamma_k u_k + G_k S_k R_k^{-1}z_k$$
$$\Sigma_{k+1|k} = (F_k - G_k S_k R_k^{-1}H_k)\Sigma_{k|k}(F_k - G_k S_k R_k^{-1}H_k)^T + G_k(Q_k - S_k R_k^{-1}S_k^T)G_k^T$$
Time update with $S_k = 0$
$$\hat{x}_{k+1|k} = F_k\hat{x}_{k|k} + \Gamma_k u_k, \qquad \Sigma_{k+1|k} = F_k\Sigma_{k|k}F_k^T + G_k Q_k G_k^T$$
Combined update with $S_k = 0$ for the filtered state:
$$\hat{x}_{k+1|k+1} = F_k\hat{x}_{k|k} + \Gamma_k u_k + L_{k+1}\left(z_{k+1} - H_{k+1}F_k\hat{x}_{k|k} - H_{k+1}\Gamma_k u_k\right)$$
$$L_{k+1} = \Sigma_{k+1|k}H_{k+1}^T\Omega_{k+1}^{-1}, \qquad \Omega_{k+1} = H_{k+1}\Sigma_{k+1|k}H_{k+1}^T + R_{k+1}$$
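A minimal scalar sketch of the updates above (assuming $S_k = 0$, a time-invariant scalar system, $u_k = 0$, and made-up numbers; an added illustration, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
F, G, H, Gamma = 0.95, 1.0, 1.0, 0.0   # scalar system matrices (assumed values)
Q, R = 0.1, 0.5                        # process and measurement noise variances
x = 0.0                                # true state
x_hat, Sigma = 0.0, 1.0                # \hat{x}_{0|-1} and \Sigma_{0|-1}

for k in range(100):
    z = H * x + np.sqrt(R) * rng.standard_normal()      # measurement (2)
    # measurement update: \hat{x}_{k|k}, \Sigma_{k|k}
    Omega = H * Sigma * H + R
    L = Sigma * H / Omega
    x_filt = x_hat + L * (z - H * x_hat)
    Sigma_filt = Sigma - Sigma * H / Omega * H * Sigma
    # time update with S_k = 0: \hat{x}_{k+1|k}, \Sigma_{k+1|k}
    x_hat = F * x_filt + Gamma * 0.0                     # u_k = 0 here
    Sigma = F * Sigma_filt * F + G * Q * G
    # propagate the true state (1)
    x = F * x + G * np.sqrt(Q) * rng.standard_normal()
```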

Wednesday, August 10, 2016

Innovations sequence

Definition 
Suppose $\{z_k\}$ is a sequence of jointly Gaussian random elements. The innovations process $\{\tilde{z}_k\}$ is such that $\tilde{z}_k$ consists of that part of $z_k$ containing new information not carried in $z_{k-1}, z_{k-2}, \dots$:
$$\tilde{z}_k = z_k - E[z_k|z_0,\dots,z_{k-1}] = z_k - E[z_k|Z_{k-1}]$$
with $\tilde{z}_0 = z_0 - E[z_0]$.

Properties

  1. $\tilde{z}_k$ independent of $z_0,\dots,z_{k-1}$ by definition
  2. (1) implies $E[\tilde{z}_k\tilde{z}_l^T] = 0,\ l \neq k$
  3. $E[z_k|Z_{k-1}]$ is a linear combination of $z_0,\dots,z_{k-1}$
  4. The sequence $\{\tilde{z}_k\}$ can be obtained from $\{z_k\}$ by a causal linear operation.
  5. The sequence $\{z_k\}$ can be reconstructed from $\{\tilde{z}_k\}$ by a causal linear operation (see the sketch after this list).
  6. (4) and (5) imply $E[z_k|Z_{k-1}] = E[z_k|\tilde{Z}_{k-1}]$ or, more generally, $E[w|Z_{k-1}] = E[w|\tilde{Z}_{k-1}]$ for jointly Gaussian $w, \{z_k\}$
  7. For zero mean Gaussian $x_k, \tilde{z}_k$, we have $E[x_k|Z_{k-1}] = E[x_k|\tilde{Z}_{k-1}] = E[x_k|\tilde{z}_0] + \dots + E[x_k|\tilde{z}_{k-1}]$
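A small NumPy sketch (an added illustration, not from the original post) of properties 2, 4 and 5: for a zero-mean jointly Gaussian vector, the normalized innovations are obtained by a causal lower-triangular map, here the inverse Cholesky factor of the covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Rz = A @ A.T + n * np.eye(n)          # a valid covariance for z = (z_0, ..., z_{n-1})
L = np.linalg.cholesky(Rz)            # causal (lower triangular) factor, Rz = L L^T
z = L @ rng.standard_normal(n)        # one sample of the Gaussian vector z

z_tilde = np.linalg.solve(L, z)       # normalized innovations: z_tilde = L^{-1} z (property 4)
z_back = L @ z_tilde                  # causal reconstruction of z (property 5)
assert np.allclose(z_back, z)
# Cov(z_tilde) = L^{-1} Rz L^{-T} = I, so the innovations are uncorrelated (property 2)
```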



Friday, August 05, 2016

Properties of the exponential family distributions

Given an exponential family $\mathcal{P} = \{p_\theta(x)\,|\,\theta \in \Theta\}$, where
$$p_\theta(x) = h(x)\exp\left(q^T(\theta)T(x) - b(\theta)\right)\mathbb{I}_{\mathrm{supp}}(x), \qquad Z = \exp(b(\theta))$$
Regular family (gives you completeness)
Conditions for regularity:

  1. the support of $p_\theta(x)$ is independent of $\theta$
  2. finite partition function, $Z(\theta) < \infty,\ \forall\theta$
  3. the interior of the parameter space is solid, $\mathring{\Theta} \neq \emptyset$
  4. the interior of the natural parameter space is solid, $\mathring{Q} \neq \emptyset$
  5. the statistic vector function and the constant function are linearly independent, i.e. $[1, T_1(x),\dots,T_K(x)]$ linearly independent (gives you a minimal statistic)
  6. $p_\theta(x)$ is twice differentiable

Curved family (we only know that the statistic is minimal)
An exponential family where the dimension of the vector parameter $\theta = (\theta_1,\dots,\theta_r)$ is less than the dimension of the natural statistic $T(x)$ is called a curved family.
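A classic example (added here for illustration; not in the original note) is the $\mathcal{N}(\theta, \theta^2)$ family with $\theta > 0$:
$$p_\theta(x) \propto \frac{1}{\theta}\exp\left(\frac{x}{\theta} - \frac{x^2}{2\theta^2}\right), \qquad T(x) = (x, x^2), \quad q(\theta) = \left(\tfrac{1}{\theta}, -\tfrac{1}{2\theta^2}\right)$$
The natural statistic is two-dimensional while the parameter is one-dimensional, and the natural parameters trace out a curve in $\mathbb{R}^2$.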

Identifiability of the parameter vector $\theta$:
When the statistic is minimal, it is a matter of ensuring that $q: \Theta \to Q$ defines a 1-1 mapping from the desired parameter space to the natural parameter space.

Thursday, July 21, 2016

Invariance and carry over properties of MLE

Review: Asymptotic properties of MLE
  • Asymptotically efficient (attains the CRLB as $N \to \infty$)
  • Asymptotically Gaussian (asymptotic normality)
  • Asymptotically Unbiased
  • Consistent (weakly and strongly)
First, the invariance property of MLE

The MLE of the parameter $\alpha = g(\theta)$, where the PDF $p(x;\theta)$ is parameterized by $\theta$, is given by
$\hat{\alpha} = g(\hat{\theta})$, where $\hat{\theta}$ is the MLE of $\theta$.
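For instance (a standard textbook illustration, not from the original note), with i.i.d. Gaussian data the MLE of the standard deviation follows from the MLE of the variance by invariance:
$$\hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})^2 \quad\Rightarrow\quad \hat{\sigma} = g(\hat{\sigma}^2) = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})^2}$$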

Consistency (in class) is defined as the weak convergence of the sequence of estimates to the true parameter as N gets large.

If $g(\theta)$ is continuous in $\theta$, the convergence properties (esp. convergence in probability) carry over, i.e. the consistency of the estimator $g(\hat{\theta})$.

However, unbiasedness does not carry over from $\hat{\theta}$: whether $g(\hat{\theta})$ is biased depends on the shape (e.g. convexity) of $g$.

Other properties of MLE
  • If an efficient estimator exists, the ML method will produce it.
  • Unlike the MVU estimator, MLE can be biased
  • Note: the CRLB applies to unbiased estimators, so when an estimator is biased, it is possible for it to have variance smaller than $I^{-1}(\theta)$

Thursday, July 14, 2016

Properties of a regular family of parameterized distribution

A family of parameterized distributions defined by
$$\mathcal{P} = \{p_\theta(y)\,|\,\theta \in \Theta \subseteq \mathbb{R}^P\}$$
is regular if it satisfies the following conditions
  1. The support of $p_\theta(y)$ does not depend on $\theta$ for all $\theta \in \Theta$
  2. $\frac{\partial}{\partial\theta}p_\theta(y)$ exists
  3. Optional: $\frac{\partial^2}{\partial\theta^2}p_\theta(y)$ exists
Note
$$\frac{\partial}{\partial\theta}\ln p_\theta(y) = \frac{1}{p_\theta(y)}\frac{\partial}{\partial\theta}p_\theta(y) \qquad (4)$$
Define the score function (log := natural log)
$$S_\theta(y) := \frac{\partial}{\partial\theta}\log p_\theta(y)$$
Note also
$$E_\theta\{1\} = 1 = \int_{\mathcal{Y}} p_\theta(y)\,dy$$
As a result of the above we have (Kay's definition of regular)
$$0 = \frac{\partial}{\partial\theta}E_\theta\{1\} = \frac{\partial}{\partial\theta}\int p_\theta(y)\,dy \overset{1}{=} \int\frac{\partial}{\partial\theta}p_\theta(y)\,dy \overset{2,4}{=} \int p_\theta(y)\frac{\partial}{\partial\theta}\log p_\theta(y)\,dy = E_\theta\{S_\theta(y)\}$$

Friday, July 01, 2016

Spectral Theorem for Diagonalizable Matrices

It occurs to me that most presentations of the spectral theorem only concern an orthonormal basis. This is a more general result from Meyer.

Theorem
A matrix $A \in \mathbb{R}^{n\times n}$ with spectrum $\sigma(A) = \{\lambda_1,\dots,\lambda_k\}$ is diagonalizable if and only if there exist matrices $\{G_1,\dots,G_k\}$ such that $A = \lambda_1 G_1 + \dots + \lambda_k G_k$, where the $G_i$'s have the following properties
  • $G_i$ is the projector onto $N(A - \lambda_i I)$ along $R(A - \lambda_i I)$
  • $G_iG_j = 0$ whenever $i \neq j$
  • $G_1 + \dots + G_k = I$
The expansion is known as the spectral decomposition of A, and the Gi's are called the spectral projectors associated with A.

Note that, being a projector, $G_i$ is idempotent:
  • $G_i = G_i^2$
And since $N(G_i) = R(A - \lambda_i I)$ and $R(G_i) = N(A - \lambda_i I)$, we have the following equivalent complementary subspace decompositions
  • $R(A - \lambda_i I) \oplus N(A - \lambda_i I)$
  • $R(G_i) \oplus N(A - \lambda_i I)$
  • $R(A - \lambda_i I) \oplus N(G_i)$
  • $R(G_i) \oplus N(G_i)$
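A small NumPy sketch (an added illustration with a made-up 2x2 matrix having distinct eigenvalues) that builds the spectral projectors from right and left eigenvectors and checks the properties listed above:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])               # diagonalizable, eigenvalues 5 and 2
lam, V = np.linalg.eig(A)                # columns of V are right eigenvectors
W = np.linalg.inv(V)                     # rows of W are the left eigenvectors

# For distinct eigenvalues, G_i = v_i w_i^T (outer product of right/left eigenvectors)
G = [np.outer(V[:, i], W[i, :]) for i in range(len(lam))]

assert np.allclose(sum(l * Gi for l, Gi in zip(lam, G)), A)   # A = sum lambda_i G_i
assert np.allclose(sum(G), np.eye(2))                         # G_1 + ... + G_k = I
assert np.allclose(G[0] @ G[1], 0)                            # G_i G_j = 0 for i != j
assert all(np.allclose(Gi @ Gi, Gi) for Gi in G)              # idempotent projectors
```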

Friday, June 24, 2016

Conference submission deadlines 2016

ICASSP 2017 - 2016/09/12
WCNC 2017 - 2016/09/30

Monday, June 06, 2016

Majorization and Schur-convexity

Majorization 

A real vector $b = (b_1,\dots,b_n)$ is said to majorize $a = (a_1,\dots,a_n)$, denoted $a \prec b$, if

  1. $\sum_{i=1}^{n}a_i = \sum_{i=1}^{n}b_i$, and
  2. $\sum_{i=k}^{n}a_{(i)} \le \sum_{i=k}^{n}b_{(i)}$, $k = 2,\dots,n$
where $a_{(1)} \le \dots \le a_{(n)}$ and $b_{(1)} \le \dots \le b_{(n)}$ are $a$ and $b$ arranged in increasing order.

A function $\phi(a)$ symmetric in the coordinates of $a = (a_1,\dots,a_n)$ is said to be Schur-concave if $a \prec b$ implies $\phi(a) \ge \phi(b)$.

A function $\phi(a)$ is Schur-convex if $-\phi(a)$ is Schur-concave.
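A small Python utility (an illustrative sketch, not from the original note) that checks the majorization relation using the equivalent partial-sum test in decreasing order:

```python
import numpy as np

def majorizes(b, a, tol=1e-12):
    """Return True if b majorizes a (i.e. a "precedes" b)."""
    a, b = np.sort(a)[::-1], np.sort(b)[::-1]      # decreasing order
    if abs(a.sum() - b.sum()) > tol:
        return False                               # condition 1: equal totals
    # condition 2: partial sums of the largest entries of b dominate those of a
    return bool(np.all(np.cumsum(b) >= np.cumsum(a) - tol))

a = np.array([1/3, 1/3, 1/3])
b = np.array([0.5, 0.3, 0.2])
print(majorizes(b, a))   # True: the uniform vector is majorized by any probability vector
print(majorizes(a, b))   # False
```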

Tuesday, May 03, 2016

Requirements for good parenting


  1. Keep my child's emotional love tank full - speak the five love languages.
    • physical touch
    • words of affirmation
    • quality time
    • gifts
    • acts of service
  2. Use the most positive ways I can to control my child's behavior: requests, gentle physical manipulation, commands, punishment, and behavior modification.
  3. Lovingly discipline my child.  Ask, "What does this child need?" and then go about it logically.
  4. Do my best to handle my own anger appropriately and not dump it on my child.  Be kind but firm.
  5. Do my best to train my child to handle anger maturely - the goal is sixteen and a half years. 
From The Five Love Languages of Children.

Monday, April 11, 2016

RAII (Resource acquisition is initialization) definition

https://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization

Thursday, January 14, 2016

UL/DL network duality for SINR metric

See Boche and Schubert's "A General Duality Theory for Uplink and Downlink Beamforming"

Tuesday, January 12, 2016

Notes on Recursive Least Squares (RLS)


Method of Least Squares
  • Assuming a multiple linear regression model, the method attempts to choose the tap weights to minimize the sum of error squares.
  • When the error process is white and zero mean, the least-squares estimate is the best linear unbiased estimate (BLUE)
  • When the error process is white Gaussian zero mean, the least-squares estimate achieves the Cramer-Rao lower bound (CRLB) for unbiased estimates, hence a minimum-variance unbiased estimate (MVUE)
Recursive Least Squares
  • Allows one to update the tap weights as the input becomes available.
  • Can incorporate additional constraints such as weighted error squares or a regularizing term, [commonly applied due to the ill-posed nature of the problem].
  • The inversion of the correlation matrix is replaced by a simple scalar division.
  • The initial correlation matrix provides a means to specify regularization.
  • The fundamental difference between RLS and LMS: 
    • The step-size parameter $\mu$ in LMS is replaced by $\Phi^{-1}(n)$, the inverse of the correlation matrix of the input $u(n)$, which has the effect of whitening the inputs (see the sketch after this list).
  • The rate of convergence of RLS is invariant to the eigenvalue spread of the ensemble average input correlation matrix R
  • The excess mean-square error converges to zero if a stationary environment is assumed and the exponential weighting factor is set to $\lambda = 1$.
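A minimal sketch of the exponentially weighted RLS recursion (an added illustration; the channel h, forgetting factor lam, and initialization delta below are made-up values, not taken from the source notes). $P(n)$ plays the role of $\Phi^{-1}(n)$ and is initialized as $(1/\delta)I$, which is how the initial correlation matrix specifies the regularization:

```python
import numpy as np

def rls(u, d, order=4, lam=0.99, delta=1e-2):
    w = np.zeros(order)                    # tap weights
    P = np.eye(order) / delta              # inverse correlation matrix estimate
    for n in range(order - 1, len(u)):
        x = u[n - order + 1:n + 1][::-1]   # regressor [u(n), u(n-1), ..., u(n-order+1)]
        k = P @ x / (lam + x @ P @ x)      # gain: the matrix inversion reduces to a scalar division
        e = d[n] - w @ x                   # a priori error
        w = w + k * e                      # tap-weight update
        P = (P - np.outer(k, x @ P)) / lam # update of the inverse correlation matrix
    return w

# identify an FIR channel from noisy observations
rng = np.random.default_rng(0)
h = np.array([0.5, -0.4, 0.3, 0.1])
u = rng.standard_normal(2000)
d = np.convolve(u, h, mode="full")[:len(u)] + 0.01 * rng.standard_normal(len(u))
print(rls(u, d))   # should be close to h
```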