
Friday, January 30, 2015

Conditional Expectation

Definition.  λ is absolutely continuous (AC) with respect to μ, written λ ≪ μ, if μ(A) = 0 implies λ(A) = 0.

Theorem 2 (Radon-Nikodym Theorem)  Let (Ω,B,P) be a probability space.  Suppose ν is a positive bounded measure and ν ≪ P.  Then there exists an integrable random variable X ∈ B, such that
ν(E) = ∫E X dP for all E ∈ B.  X is a.s. unique (P) and is written
X = dν/dP.  We also write dν = X dP.

Definition of Conditional Expectation

Suppose X ∈ L1(Ω,B,P) and let G ⊂ B be a sub-σ-field.  Then there exists a random variable E(X|G), called the conditional expectation of X with respect to G, such that

  1. E(X|G) is G-measurable and integrable.
  2. For all G ∈ G we have ∫G X dP = ∫G E(X|G) dP.
Notes.
  1. Definition of conditional probability: Given (Ω,B,P), a probability space, with G a sub-σ-field of B, define P(A|G) = E(1A|G), A ∈ B.  Thus P(A|G) is a random variable such that
    1. P(A|G) is G-measurable and integrable.
    2. P(A|G) satisfies ∫G P(A|G) dP = P(A ∩ G), G ∈ G.
  2. Conditioning on random variables: Suppose {Xt, t ∈ T} is a family of random variables defined on (Ω,B) and indexed by some index set T.  Define G := σ(Xt, t ∈ T), the σ-field generated by the process {Xt, t ∈ T}.  Then define E(X|Xt, t ∈ T) = E(X|G).
Note (1) continues the duality of probability and expectation but seems to place expectation in a somewhat more basic position, since conditional probability is defined in terms of conditional expectation.

Note (2) saves us from having to make separate definitions for E(X|X1), E(X|X1,X2), etc.

Countable partitions  Let {Λn, n ≥ 1} be a partition of Ω, so that Λi ∩ Λj = ∅ for i ≠ j, and ∪n Λn = Ω.  Define
G = σ(Λn, n ≥ 1), so that
G = {∪i∈J Λi : J ⊂ {1,2,…}}.  For X ∈ L1(P), define
EΛn(X) = ∫ X P(dω|Λn) = ∫Λn X dP / P(Λn) if P(Λn) > 0, and EΛn(X) = 18 (an arbitrary value, since Λn is P-null) if P(Λn) = 0.  We claim

  1. E(X|G) = ∑n≥1 EΛn(X) 1Λn a.s., and for any A ∈ B
  2. P(A|G) = ∑n≥1 P(A|Λn) 1Λn a.s.
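The partition formula in claim (1) can be checked directly on a finite sample space.  Below is a minimal sketch; the uniform six-point space, the even/odd partition, and X(ω) = ω² are assumptions chosen purely for illustration:

```python
from fractions import Fraction

# Finite sample space with uniform probability (illustrative choice).
omega_space = range(6)
P = {w: Fraction(1, 6) for w in omega_space}
X = lambda w: w * w  # an example random variable

# Partition: Lambda_1 = evens, Lambda_2 = odds; G = sigma(Lambda_1, Lambda_2).
partition = [[w for w in omega_space if w % 2 == 0],
             [w for w in omega_space if w % 2 == 1]]

def cond_exp(w):
    # E(X|G)(w) = E_{Lambda_n}(X) = int_{Lambda_n} X dP / P(Lambda_n),
    # computed on the block Lambda_n that contains w.
    block = next(b for b in partition if w in b)
    p_block = sum(P[v] for v in block)
    return sum(X(v) * P[v] for v in block) / p_block

# Defining property (2): int_G X dP == int_G E(X|G) dP for each generator G.
for block in partition:
    assert sum(X(v) * P[v] for v in block) == sum(cond_exp(v) * P[v] for v in block)
```

Since cond_exp is constant on each block and measurable with respect to the partition, both defining properties of E(X|G) hold.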



Product Spaces, Transition Kernels and Fubini's Theorem

Lemma 1 (Sectioning sets).  Sections of measurable sets are measurable: if A ∈ B1×B2, then for all ω1 ∈ Ω1,
Aω1 := {ω2 : (ω1, ω2) ∈ A} ∈ B2.
Corollary 1.  Sections of measurable functions are measurable.  That is, if
X : (Ω1×Ω2, B1×B2) → (S, S), then the section Xω1(ω2) := X(ω1, ω2) is B2/S measurable.

A function
K(ω1, A2) : Ω1 × B2 → [0,1]
is a transition (probability) kernel if it satisfies the following:
  1. for each ω1, K(ω1, ·) is a probability measure on B2, and
  2. for each A2 ∈ B2, K(·, A2) is B1/B([0,1]) measurable.
Transition kernels are used to define discrete time Markov processes where K(ω1,A2) represents the conditional probability that, starting from ω1, the next movement of the system results in a state in A2.

Theorem 1.  Let P1 be a probability measure on B1, and suppose
K : Ω1 × B2 → [0,1] is a transition kernel.  Then K and P1 uniquely determine a probability measure on B1×B2 via the formula
P(A1×A2) = ∫A1 K(ω1, A2) P1(dω1), for all A1×A2 in the class of measurable rectangles.
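On a finite state space a transition kernel is just a row-stochastic matrix, and the integral in Theorem 1 becomes a finite sum.  A sketch, where the matrix K and the initial measure P1 are made-up illustrations:

```python
# A transition kernel on the finite space Omega_1 = Omega_2 = {0, 1, 2}:
# K[i][j] = K(i, {j}); each row is a probability measure (property 1),
# and measurability (property 2) is automatic on a finite space.
K = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
]
P1 = [0.2, 0.3, 0.5]  # an assumed initial probability measure on B_1

def kernel(w1, A2):
    # K(w1, A2) for a set A2, given as a collection of states.
    return sum(K[w1][j] for j in A2)

def P(A1, A2):
    # P(A1 x A2) = int_{A1} K(w1, A2) P1(dw1), here a finite sum.
    return sum(kernel(w1, A2) * P1[w1] for w1 in A1)

for row in K:                          # property 1: rows are probability measures
    assert abs(sum(row) - 1.0) < 1e-12
assert abs(P([0, 1, 2], [0, 1, 2]) - 1.0) < 1e-12  # total mass is 1
```

Read as a Markov chain, P([i], [j]) is the probability of starting in state i (under P1) and moving to state j in one step.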

Theorem 2 (Marginalization)  Let P1 be a probability measure on (Ω1,B1) and suppose K : Ω1 × B2 → [0,1] is a transition kernel.  Define P on (Ω1×Ω2, B1×B2) by
P(A1×A2) = ∫A1 K(ω1, A2) P1(dω1).  Assume
X : (Ω1×Ω2, B1×B2) → (R, B(R)) and furthermore suppose X is integrable.  Then
Y(ω1) = ∫Ω2 K(ω1, dω2) Xω1(ω2) has the properties

  1. Y is well defined.
  2. Y ∈ B1
  3. Y ∈ L1(P1), and furthermore
∫Ω1×Ω2 X dP = ∫Ω1 Y(ω1) P1(dω1) = ∫Ω1 [∫Ω2 K(ω1, dω2) Xω1(ω2)] P1(dω1).
Theorem 3 (Fubini Theorem)  Let P = P1×P2 be a product measure.  If X is B1×B2 measurable and is either non-negative or integrable with respect to P, then
∫Ω1×Ω2 X dP = ∫Ω1 [∫Ω2 Xω1(ω2) P2(dω2)] P1(dω1) = ∫Ω2 [∫Ω1 Xω2(ω1) P1(dω1)] P2(dω2).
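For a finite product measure, Fubini's theorem reduces to swapping the order of two finite sums, which can be verified directly.  A small sketch; the weights and the function X are made up for illustration:

```python
# Fubini for a finite product measure: the two iterated sums agree.
P1 = [0.25, 0.75]      # probability on Omega_1 = {0, 1}
P2 = [0.1, 0.2, 0.7]   # probability on Omega_2 = {0, 1, 2}
X = lambda w1, w2: (w1 + 1) * (w2 - 1) ** 2  # an integrable (finite) function

# int over Omega_2 first, then Omega_1:
lhs = sum(P1[w1] * sum(X(w1, w2) * P2[w2] for w2 in range(3)) for w1 in range(2))
# int over Omega_1 first, then Omega_2:
rhs = sum(P2[w2] * sum(X(w1, w2) * P1[w1] for w1 in range(2)) for w2 in range(3))

assert abs(lhs - rhs) < 1e-12
```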

Thursday, January 29, 2015

Clarification of Expectation

Let X be a random variable on the probability space (Ω,B,P).  Recall that the distribution of X is the measure
F := P∘X−1 on (R, B(R)) defined by
F(A) = P∘X−1(A) = P[X ∈ A].
The distribution function of X is
F(x) := F((−∞, x]) = P[X ≤ x].  Note that the letter "F" is overloaded in two ways.

An application of the Transformation Theorem allows us to compute the abstract integral
E(X) = ∫Ω X dP as
E(X) = ∫R x F(dx), which is an integral on R.

More precisely,
E(X) = ∫Ω X(ω) P(dω) = ∫R x F(dx).
Also, given a measurable function g, the expectation of g(X) is
E(g(X)) = ∫Ω g(X(ω)) P(dω) = ∫R g(x) F(dx).
Instead of computing expectations on the abstract space Ω, one can always compute them on R using F, the distribution of X.
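The identity E(g(X)) = ∫Ω g(X(ω)) P(dω) = ∫R g(x) F(dx) is easy to verify on a discrete space.  A toy sketch, where two coin tosses, X = number of heads, and g(x) = x² are all illustrative assumptions:

```python
from fractions import Fraction
from collections import Counter

# E(g(X)) computed on the abstract space Omega vs. on R via the distribution F.
omega_space = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in omega_space}
X = lambda w: w.count("H")   # number of heads
g = lambda x: x * x

# On Omega: E(g(X)) = int_Omega g(X(w)) P(dw)
e_abstract = sum(g(X(w)) * P[w] for w in omega_space)

# On R: push P forward to F = P o X^{-1}, then E(g(X)) = int_R g(x) F(dx)
F = Counter()
for w in omega_space:
    F[X(w)] += P[w]
e_distribution = sum(g(x) * F[x] for x in F)

assert e_abstract == e_distribution
```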

Random variables and Inverse maps

A random variable is a real valued function with domain Ω which has an extra property called measurability that allows us to make probability statements about the random variable.

Suppose Ω and Ω′ are two sets.  Often Ω′ = R.  Suppose
X : Ω → Ω′.  Then X determines an inverse map (a set valued function)
X−1 : P(Ω′) → P(Ω) defined by
X−1(A) = {ω ∈ Ω : X(ω) ∈ A} for A ⊂ Ω′.

X−1 preserves complementation, unions and intersections.
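These preservation properties are easy to verify mechanically on finite sets; a small sketch, where the sets and the map are arbitrary illustrations:

```python
# Check that the inverse map X^{-1} preserves set operations, on finite sets.
Omega = {0, 1, 2, 3}
Omega_prime = {"a", "b"}
X = {0: "a", 1: "a", 2: "b", 3: "b"}  # an arbitrary map Omega -> Omega'

def preimage(A):
    # X^{-1}(A) = {omega in Omega : X(omega) in A}
    return {w for w in Omega if X[w] in A}

A, B = {"a"}, {"b"}
assert preimage(Omega_prime - A) == Omega - preimage(A)  # complements
assert preimage(A | B) == preimage(A) | preimage(B)      # unions
assert preimage(A & B) == preimage(A) & preimage(B)      # intersections
```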

Wednesday, January 28, 2015

Convergence Concepts

Implications of convergence

Almost sure convergence and Lp convergence each imply convergence in probability; the converse implications fail, and a.s. convergence and Lp convergence do not imply each other, as the examples below show.

1. Almost Sure Convergence

Examples of statements that hold almost surely (a.s.)

  • Let X, X′ be two random variables.  Then X = X′ a.s. means P[X = X′] = 1; that is, there exists an event N ∈ B, such that P(N) = 0 and if ω ∈ Nc, then X(ω) = X′(ω).
  • If {Xn} is a sequence of random variables, then limn→∞ Xn exists a.s. means there exists an event N ∈ B, such that P(N) = 0 and if ω ∈ Nc then limn→∞ Xn(ω) exists.  It also means that for a.a. ω, lim supn→∞ Xn(ω) = lim infn→∞ Xn(ω).  We will write limn→∞ Xn = X a.s. or Xn →a.s. X.
  • If {Xn} is a sequence of random variables, then ∑n Xn converges a.s. means there exists an event N ∈ B, such that P(N) = 0, and ω ∈ Nc implies ∑n Xn(ω) converges.

2. Convergence in Probability

Suppose Xn, n ≥ 1 and X are random variables.  Then Xn converges in probability (i.p.) to X, written Xn →P X, if for any ϵ > 0, limn→∞ P[|Xn − X| > ϵ] = 0.
Almost sure convergence of {Xn} demands that for a.e. ω, Xn(ω) − X(ω) gets small and stays small.  Convergence i.p. is weaker and merely requires that the probability of the difference Xn(ω) − X(ω) being non-trivial becomes small.

It is possible for a sequence to converge in probability but not almost surely.
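A standard example lives on ([0,1], B([0,1]), λ): indicator functions of intervals of width 1/k that sweep across [0,1] as k grows.  The widths tend to 0, giving convergence i.p. to 0, yet every ω is covered infinitely often, so Xn(ω) converges for no ω.  A sketch of this construction, where the particular enumeration scheme is an assumption:

```python
def block(n):
    # Enumerate the intervals [j/k, (j+1)/k) for k = 1, 2, ... and j = 0, ..., k-1;
    # X_n is the indicator of the n-th interval (n = 0, 1, 2, ...).
    k = 1
    while n >= k:
        n -= k
        k += 1
    return (n / k, (n + 1) / k), 1.0 / k  # (interval, its length = P[X_n = 1])

# P[|X_n| > eps] equals the interval length 1/k -> 0: convergence i.p. to 0.
assert block(999)[1] < 0.05

# But any fixed omega is covered exactly once per sweep, hence infinitely often:
omega = 0.3
hits = [n for n in range(1000) if block(n)[0][0] <= omega < block(n)[0][1]]
assert len(hits) > 40  # X_n(omega) = 1 infinitely often: no a.s. convergence
```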

Theorem 1 (Convergence a.s. implies convergence i.p.)  Suppose that Xn, n ≥ 1 and X are random variables on a probability space (Ω,B,P).  If Xn → X a.s., then Xn →P X.
Proof.  If Xn → X a.s., then for any ϵ > 0,
0 = P([|Xn − X| > ϵ] i.o.) = P(lim supn→∞ [|Xn − X| > ϵ]) = limN→∞ P(∪n≥N [|Xn − X| > ϵ]) ≥ lim supn→∞ P[|Xn − X| > ϵ].

3. Lp Convergence

Recall the notation X ∈ Lp, which means E(|X|^p) < ∞.  For random variables X, Y ∈ Lp, we define the Lp metric for p ≥ 1 by
d(X,Y) = (E|X − Y|^p)^{1/p}.  This metric is norm induced because
‖X‖p := (E|X|^p)^{1/p} is a norm on the space Lp.

A sequence {Xn} of random variables converges in Lp to X, written
Xn →Lp X, if
E(|Xn − X|^p) → 0 as n → ∞.

Facts about Lp convergence.
  1. Lp convergence implies convergence in probability: For p > 0, if Xn →Lp X then Xn →P X.  This follows readily from Chebychev's inequality: P[|Xn − X| ≥ ϵ] ≤ E(|Xn − X|^p)/ϵ^p → 0.
  2. Convergence in probability does not imply Lp convergence.  What can go wrong is that the nth function in the sequence can be huge on a very small set.
    Example.  Let the probability space be ([0,1], B([0,1]), λ), where λ is Lebesgue measure, and define
    Xn = 2^n 1(0,1/n).  Then
    P[|Xn| > ϵ] = P((0,1/n)) = 1/n → 0, but
    E(|Xn|^p) = 2^{np} · (1/n) → ∞.
  3. Lp convergence does not imply almost sure convergence.
    Example.  Consider the functions {Xn} defined on ([0,1], B([0,1]), λ), where λ is Lebesgue measure:
    X1 = 1[0,1/2], X2 = 1[1/2,1], X3 = 1[0,1/3], X4 = 1[1/3,2/3], X5 = 1[2/3,1], X6 = 1[0,1/4], and so on.  Note that for any p > 0,
    E(|X1|^p) = E(|X2|^p) = 1/2, E(|X3|^p) = E(|X4|^p) = E(|X5|^p) = 1/3, E(|X6|^p) = 1/4, …, so E(|Xn|^p) → 0 and
    Xn →Lp 0.
    Observe, however, that {Xn} does not converge almost surely to 0: for every ω ∈ [0,1], Xn(ω) = 1 for infinitely many n.
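The spike example in fact (2) can also be checked by Monte Carlo, with uniform samples on [0,1] standing in for Lebesgue measure; the sample size and seed are arbitrary choices:

```python
import random

random.seed(0)

def simulate(n, num_samples=200_000):
    # X_n = 2^n on (0, 1/n) and 0 elsewhere, under Lebesgue measure on [0,1].
    us = [random.random() for _ in range(num_samples)]
    xs = [2.0 ** n if 0 < u < 1.0 / n else 0.0 for u in us]
    prob_big = sum(1 for x in xs if x > 0.5) / num_samples  # estimates P[|X_n| > 1/2]
    mean = sum(xs) / num_samples                            # estimates E(|X_n|)
    return prob_big, mean

p10, m10 = simulate(10)
assert abs(p10 - 0.1) < 0.01   # P[|X_10| > 1/2] = 1/10: a small probability...
assert m10 > 50                # ...but E(|X_10|) = 2^10 / 10 = 102.4: a huge moment
```

The spike is rare, so the sequence converges i.p., but its height 2^n grows so fast that the L1 norm explodes.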

Limits and Integrals

Under certain circumstances we are allowed to interchange expectation with limits.

Theorem 1 (Monotone Convergence Theorem (MCT))  If
0 ≤ Xn ↑ X, then
0 ≤ E(Xn) ↑ E(X).
Corollary 1 (Series Version of MCT)  If Xn, n ≥ 1, are non-negative random variables, then
E(∑n≥1 Xn) = ∑n≥1 E(Xn),
so that the expectation and infinite sum can be interchanged.

Theorem 2 (Fatou Lemma)  If Xn ≥ 0, then
E(lim infn→∞ Xn) ≤ lim infn→∞ E(Xn).
More generally, if there exists Z ∈ L1 and Xn ≥ Z, then
E(lim infn→∞ Xn) ≤ lim infn→∞ E(Xn).
Corollary 2 (More Fatou)  If Xn ≤ Z where Z ∈ L1, then
E(lim supn→∞ Xn) ≥ lim supn→∞ E(Xn).
Theorem 3 (Dominated Convergence Theorem (DCT))  If
Xn → X and there exists a dominating random variable Z ∈ L1 such that
|Xn| ≤ Z, then
E(Xn) → E(X) and E|Xn − X| → 0.
An example of what can happen when interchanging limits and integrals without the domination condition: something very nasty happens on a small set, and the degree of nastiness overpowers the degree of smallness.

Let
(Ω, B, P) = ([0,1], B([0,1]), λ), with λ the Lebesgue measure.  Define
Xn = n^2 1(0,1/n).  For any ω ∈ [0,1],
1(0,1/n)(ω) → 0, so
Xn → 0.  However,
E(Xn) = n^2 · (1/n) = n → ∞, so
E(lim infn→∞ Xn) = 0 ≤ lim infn→∞ E(Xn) = ∞, while
E(lim supn→∞ Xn) = 0 < ∞ = lim supn→∞ E(Xn), so the conclusion of Corollary 2 fails without the domination condition.

Tuesday, January 27, 2015

Zero-One Laws

There are several common zero-one laws which identify the possible range of certain random variables as trivial (almost surely constant).  There are also several zero-one laws which provide the basis for proofs of almost sure convergence.

Proposition 1. Borel-Cantelli Lemma Let \{A_n\} be any events (not necessarily independent).

If \sum_n{P(A_n)}<\infty, then
P([A_n \; i.o.])=P(\limsup_{n\rightarrow \infty} A_n) = 0.
Proposition 2. Borel Zero-One Law If \{A_n\} is a sequence of independent events, then

 \begin{equation*} P([A_n \; i.o.])= \begin{cases} 0, \quad & \text{iff } \sum_n P(A_n) < \infty \\ 1, \quad & \text{iff } \sum_n P(A_n) = \infty \end{cases} \end{equation*}
Definition.  An almost trivial \sigma-field is a \sigma-field all of whose events have probability 0 or 1.

Theorem 3. Kolmogorov Zero-One Law If \{X_n\} are independent random variables with tail \sigma-field \mathcal{T}, then \Lambda\in \mathcal{T} implies P(\Lambda)=0 or 1 so that the tail \sigma-field is almost trivial.

Lemma 4. Almost trivial \sigma-fields  Let \mathcal{G} be an almost trivial \sigma-field and let X be a random variable measurable with respect to \mathcal{G}.  Then there exists c such that P[X=c] = 1.

Corollary 5.  Let \{X_n\} be independent random variables.  Then the following are true.

(a) The event
[\sum_n X_n \;converges] has probability 0 or 1.

(b) The random variables \text{lim sup}_{n\rightarrow \infty}X_n and \text{lim inf}_{n\rightarrow \infty}X_n are constant with probability 1.

(c) The event
\{\omega: S_n(\omega)/n \rightarrow 0 \}, where S_n = X_1 + \cdots + X_n, has probability 0 or 1.

Monday, January 26, 2015

Inequalities

Another valuable book for anyone in computer science who ever wants to bound any quantity (so, everyone!) is: The Cauchy-Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities by Michael Steele.
An encyclopedic book on the topic is A Dictionary of Inequalities. While this is not a book for reading cover-to-cover, it is good to have it at your disposal. See also the supplement of the book.
Moreover, Wikipedia has an excellent list of inequalities.
For specific topics, you may consult:

Friday, January 23, 2015

Dynkin's theorem

First we define a structure named \lambda-system.

A class of subsets \mathcal{L} of \Omega is called a \lambda-system if it satisfies the following postulates

1.  \Omega\in \mathcal{L}
2. A\in\mathcal{L} \Rightarrow A^c \in \mathcal{L}
3. n\neq m, A_nA_m = \emptyset, A_n \in \mathcal{L} \Rightarrow \cup_n A_n \in \mathcal{L}

It is clear that a \sigma-field is always a \lambda-system.

Next a \pi-system is a class of sets closed under finite intersections.

Dynkin's theorem

a) If \mathcal{P} is a \pi-system and \mathcal{L} is a \lambda-system such that \mathcal{P}\subset\mathcal{L}, then \sigma(\mathcal{P})\subset \mathcal{L}.

b) If \mathcal{P} is a \pi-system,

\sigma(\mathcal{P})=\mathcal{L}(\mathcal{P})

that is, the minimal \sigma-field over \mathcal{P} equals the minimal \lambda-system over \mathcal{P}.

Wednesday, January 21, 2015

An example of Stein's paradox

Charles Stein showed in 1956 that a nonlinear, biased estimator of a multivariate mean can have a lower MSE than the ML estimator.

Suppose we are given a sample of N measurements of X\sim\mathcal{N}(\mu,\sigma^2 I_p) with unknown parameter vector \mu of length p.

The James-Stein estimator is given by

\begin{equation*} \hat{\mu}_{JS}=\left (1-\frac{(p-2)\frac{\sigma^2}{N}}{\|\bar{x} \|^2}\right ) \bar{x} \end{equation*}
where \bar{x} is the sample mean.

This estimator dominates the MLE everywhere in terms of MSE.  For all \mu\in\mathbb{R}^p,

\begin{equation*} \mathbb{E}_\mu \| \hat{\mu}_{JS}-\mu\|^2 < \mathbb{E}_\mu \| \hat{\mu}_{MLE}-\mu\|^2 \end{equation*}
This makes the MLE inadmissible for p\ge3!
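A quick Monte Carlo experiment makes the dominance visible.  This sketch assumes N = 1, \sigma^2 = 1, p = 10, and a particular \mu (the dominance holds uniformly, so the choice does not matter); it also uses the positive-part variant of the shrinkage factor, a common refinement that never increases the risk:

```python
import random

random.seed(1)
p = 10
mu = [1.0] * p   # an arbitrary true mean vector
trials = 5000

def sq_norm(v):
    return sum(x * x for x in v)

mse_mle = mse_js = 0.0
for _ in range(trials):
    x = [random.gauss(m, 1.0) for m in mu]           # one observation: N = 1, sigma^2 = 1
    shrink = max(0.0, 1.0 - (p - 2) / sq_norm(x))    # positive-part James-Stein factor
    js = [shrink * xi for xi in x]
    mse_mle += sq_norm([a - b for a, b in zip(x, mu)]) / trials
    mse_js += sq_norm([a - b for a, b in zip(js, mu)]) / trials

assert mse_js < mse_mle   # James-Stein beats the MLE in average squared error
```

With these settings the MLE's risk is p = 10, while shrinking toward the origin cuts the average squared error substantially.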

Wednesday, January 14, 2015

The need for measure theory

The problem with measure arises when one needs to decompose a body into a (possibly uncountable) number of components and reassemble it after some action on those components.  Even when one restricts attention to just finite partitions, one still runs into trouble.  The most striking example is the Banach-Tarski paradox, which shows that a unit ball B in three dimensions can be disassembled into a finite number of pieces and reassembled to form two disjoint copies of the ball B.

Such pathological sets almost never come up in practical applications of mathematics.  Because of this, the standard solution to the problem of measure has been to abandon the goal of measuring every subset E of \mathbb{R}^d and instead to settle for only measuring a certain subclass of
non-pathological subsets of \mathbb{R}^d, referred to as the measurable sets.

The most fundamental properties of a measure are
  1. finite or countable additivity
  2. translation invariance
  3. rotation invariance
The concept of Jordan measure (closely related to that of the Riemann and Darboux integrals) is sufficient for undergraduate-level analysis.

However, the types of sets that arise in analysis, and in particular those sets that arise as limits of other sets, require an extended concept of measurability (Lebesgue measurability).

Lebesgue theory is viewed as a completion of the Jordan-Darboux-Riemann theory.  It keeps almost all of the desirable properties of Jordan measure, but with the crucial additional property that many features of the Lebesgue theory are preserved under limits.

Probability space concepts

A field is a non-empty class of subsets of \Omega closed under finite union and complements.  A synonym for field is algebra.

From de Morgan's laws a field is also closed under finite intersection.

A \sigma-field \mathcal{B} is a non-empty class of subsets of \Omega closed under countable union, countable intersection and complements.  A synonym for \sigma-field is \sigma-algebra.

In probability theory, the event space is a \sigma-field.  This allows us enough flexibility in constructing new events from old ones (closure), but not so much flexibility that we have trouble assigning probabilities to the elements of the \sigma-field.

For the reals, we start with sets to which we know how to assign probabilities.

Suppose \Omega=\mathbb{R} and let

\mathcal{C}=\{(a,b] : -\infty \leq a \leq b < \infty \}

The Borel \sigma-field is defined as

\mathcal{B}(\mathbb{R}) \equiv \sigma(\mathcal{C})

Also one can show that

\mathcal{B}(\mathbb{R}) = \sigma(\text{open sets in } \mathbb{R})

Wednesday, January 07, 2015

Russell's Paradox

Shortly after the turn of the 20th century, Bertrand Russell demonstrated a hole in the mathematical logic of the set theory of the time.  Can a set be a member of itself?  Consider the set

R = \{S \mid S \notin S\}

The set R contains all sets that do not have themselves as members.

However, is R a member of itself?

If R \in R, then by the definition of R it is a set that does not have itself as a member, so R \notin R.

But if R \notin R, then R satisfies the defining condition of R, so R \in R.  Either way we arrive at a contradiction.

At that point in time, it created a huge stir in the mathematical community, since much of mathematics was built upon the foundation of sets.