\[ \newcommand{\bs}{\boldsymbol} \newcommand{\bsX}{\boldsymbol{X}} \newcommand{\bf}{\mathbf} \newcommand{\msc}{\mathscr} \newcommand{\mca}{\mathcal} \newcommand{\T}{\text{T}} \newcommand{\rme}{\mathrm{e}} \newcommand{\rmi}{\mathrm{i}} \newcommand{\rmj}{\mathrm{j}} \newcommand{\rmd}{\mathrm{d}} \newcommand{\rmm}{\mathrm{m}} \newcommand{\rmb}{\mathrm{b}} \newcommand{\and}{\land} \newcommand{\or}{\lor} \newcommand{\exist}{\exists} \newcommand{\sube}{\subseteq} \newcommand{\lr}[3]{\left#1 #2 \right#3} \newcommand{\intfy}{\int_{-\infty}^{+\infty}} \newcommand{\sumfy}[1]{\sum_{#1=-\infty}^{+\infty}} \newcommand{\vt}{\vartheta} \newcommand{\ve}{\varepsilon} \newcommand{\vp}{\varphi} \newcommand{\Var}{\text{Var}} \newcommand{\Cov}{\text{Cov}} \newcommand{\edef}{\xlongequal{def}} \newcommand{\prob}{\text{P}} \newcommand{\Exp}{\text{E}} \newcommand{\t}[1]{\text#1} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\versionofnewcommand}{\text{260125}} \]

\(\mathrm{II.\ 1.}\) Basic Measure Theory

In measure-theoretic probability, measurability will be much more important to us than measure itself. This differs from courses that aim straight for the Dominated Convergence Theorem, such as real analysis.

1. Measurable spaces; \(\sigma\)-algebras; \(\pi\)-systems; \(d\)-systems

Algebra; \(\sigma\)-algebra; Measurable Space; \(\sigma(\msc{C})\)

Definition:

Let \(S\) be a set. A collection \(\Sigma_0\) of subsets of \(S\) is called an algebra on \(S\), if the following three conditions hold:

\[ \begin{aligned} S&\in\Sigma_0\\ F \in \Sigma_0 &\Rightarrow F^{c} \xlongequal{def} S \setminus F \in \Sigma_0\\ F,G \in \Sigma_0 &\Rightarrow F \cup G \in \Sigma_0 \end{aligned} \]

That is, \(\Sigma_0\) contains the whole set, is closed under complementation, and is closed under finite unions.

Corollary:

An algebra contains the empty set and is closed under finite intersections and set differences:

\[ \begin{aligned} &\varnothing=S^c\in\Sigma_0\\ &F\cap G=(F^c\cup G^c)^c\in\Sigma_0\\ &F\setminus G=F\cap G^c\in\Sigma_0 \end{aligned} \]

In conclusion, an algebra on \(S\) is a family of subsets of \(S\) that is stable under finitely many set operations.

Definition:

A collection \(\Sigma\) of subsets of \(S\) is called a \(\sigma\)-algebra on \(S\) if \(\Sigma\) is an algebra on \(S\) such that, whenever \(F_n\in\Sigma\ (n\in\N)\),

\[ \bigcup_n F_n\in\Sigma \]

Corollary:

\[ \bigcap_n F_n=(\ {\bigcup_n {F_n}^c}\ )^c\in \Sigma \]

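Thus a \(\sigma\)-algebra on \(S\) is a family of subsets of \(S\) that is stable under countably many set operations. Every \(\sigma\)-algebra is an algebra, but not every algebra is a \(\sigma\)-algebra: for example, on \(S=\N\) the finite subsets together with their complements form an algebra, yet the set of even numbers \(\bigcup_n\{2n\}\), a countable union of its members, is neither finite nor cofinite.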

Proposition:

Let \(S\) be a set. The power set \(\msc{P}(S)\) of \(S\) is always a \(\sigma\)-algebra on \(S\). And \(\msc{P}(S)\) is the largest \(\sigma\)-algebra on \(S\).

Definition:

A pair \((S,\Sigma)\), where \(S\) is a set and \(\Sigma\) is a \(\sigma\)-algebra on \(S\), is called a measurable space. An element of \(\Sigma\) is called a \(\Sigma\)-measurable subset of \(S\).

Let \(\msc{C}\) be a class of subsets of \(S\). Then \(\sigma(\msc{C})\), the \(\sigma\)-algebra generated by \(\msc{C}\), is the smallest \(\sigma\)-algebra \(\Sigma\) on \(S\) such that \(\msc{C}\sube\Sigma\). It is the intersection of all \(\sigma\)-algebras on \(S\) that contain \(\msc{C}\) (this family is non-empty, since it contains \(\msc{P}(S)\), and an intersection of \(\sigma\)-algebras is again a \(\sigma\)-algebra):

\[ \sigma(\msc{C})=\bigcap\lr\{{\msc{A}\ |\ \msc{A}\ \text{is a }\sigma\text{-algebra on }S\text{ and }\msc{C}\sube \msc{A}}\} \]

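Heuristically, we can build \(\sigma({\msc{C}})\) by the following rules:

1. Take the complement of every set in \(\msc{C}\);

2. Take the countable union of every combination of sets we have so far;

3. Repeat, until no new sets can be created.

When \(S\) is finite, this procedure terminates and produces exactly \(\sigma(\msc{C})\). For infinite \(S\) it is only a heuristic: in general the iteration must be continued transfinitely (through \(\omega_1\) stages), which is why the intersection above is used as the definition.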
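On a finite set the three rules can be carried out literally. The Python sketch below is a minimal illustration under that assumption (finite \(S\)); the helper name `generate_sigma_algebra` is hypothetical. It closes a class under complements and pairwise unions until nothing new appears, which on a finite set already accounts for all countable unions.

```python
from itertools import combinations


def generate_sigma_algebra(S, C):
    """Return sigma(C) on a FINITE set S as a set of frozensets.

    Repeatedly adds complements and pairwise unions (iterating pairwise unions
    yields all finite unions) until the family stops growing. On a finite set
    every countable union is already a finite union, so the result is sigma(C).
    """
    S = frozenset(S)
    family = {frozenset(), S} | {frozenset(c) for c in C}
    while True:
        current = list(family)
        new_sets = {S - A for A in current}                       # rule 1: complements
        new_sets |= {A | B for A, B in combinations(current, 2)}  # rule 2: unions
        if new_sets <= family:                                    # rule 3: stop when nothing is new
            return family
        family |= new_sets


sigma = generate_sigma_algebra({1, 2, 3}, [{1}])
print(sorted(sorted(A) for A in sigma))  # [[], [1], [1, 2, 3], [2, 3]]
```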

Proposition:

Let \(\msc{C}\) be a collection of subsets of a set \(S\). If \(\Sigma\) is any \(\sigma\)-algebra on \(S\) such that \(\msc{C}\sube\Sigma\), then \(\sigma(\msc{C})\sube\Sigma\).

Proof: \(\Sigma\) is one of the \(\sigma\)-algebras appearing in the intersection that defines \(\sigma(\msc{C})\), so \(\sigma(\msc{C})\sube\Sigma\).

Borel \(\sigma\)-algebras

Definition:

Let \(S\) be a topological space. Then \(\msc{B}(S)\), the Borel \(\sigma\)-algebra on \(S\), is the \(\sigma\)-algebra generated by the family of open subsets of \(S\).

We commonly write \(\msc{B}\) for \(\msc{B}(\R)\). It is the most important of all \(\sigma\)-algebras, since it serves as the bridge between topology and measure theory. Open sets are the basic objects of topology, but the collection of open subsets of \(\R\) is not a \(\sigma\)-algebra: the complement of an open set is closed, and since \(\R\) is connected, the only subsets of \(\R\) that are both open and closed are \(\varnothing\) and \(\R\). For instance, \((0,1)\) is open, while its complement \((-\infty,0]\cup[1,+\infty)\) is not. Thus, in order to discuss measures on \(\R\), we need the Borel \(\sigma\)-algebra.

\(\msc{B}(\R)\) contains most of the commonly used subsets of \(\R\), including:

1. All open sets;

2. All closed sets;

3. All single points;

4. All countable sets;

5. All countable unions of closed sets and all countable intersections of open sets.

Example:

Consider the collection

\[ \pi(\R)\edef \{(-\infty,x]:x\in\R\} \]

Proposition:

\[ \msc{B}=\sigma(\pi(\R)) \]

Proof: Firstly, we prove that \(\sigma(\pi(\R))\sube\msc{B}\). By the previous proposition, it suffices to prove \(\pi(\R)\sube\msc{B}\). For any \(A=(-\infty,x]\in \pi(\R)\), its complement \(A^c=(x,+\infty)\) is open, hence \(A^c\in\msc{B}\), because \(\msc{B}\) is the \(\sigma\)-algebra generated by the open subsets of \(\R\). Since \(\msc{B}\) is closed under complementation, \(A=(A^c)^c\in\msc{B}\). Thus \(\pi(\R)\sube\msc{B}\).

Secondly, we prove \(\msc{B}\sube\sigma(\pi(\R))\). Every open subset of \(\R\) is a countable union of open intervals \((a,b)\) with rational endpoints, and \((a,b)=(-\infty,b)\cap(-\infty,a]^c\). So it suffices to show that every open ray \((-\infty,x)\) lies in \(\sigma(\pi(\R))\), which is easy: \((-\infty,x)=\displaystyle\bigcup_{n=1}^{\infty}(-\infty,x-\frac{1}{n}]\). Hence every open set lies in \(\sigma(\pi(\R))\), and therefore \(\msc{B}\sube\sigma(\pi(\R))\).

\(\pi\)-systems and \(d\)-systems

Definition:

Let \(S\) be a set. A collection \(\msc{I}\) of subsets of \(S\) is called a \(\pi\)-system if \(\msc{I}\) is stable under finite intersections:

\[ \text{whenever }A,B\in \msc{I},\ \text{we have }A\cap B\in \msc{I} \]

Definition:

A collection \(\msc{D}\) of subsets of \(S\) is called a \(d\)-system if:

1. \(S\in\msc{D}\),

2. if \(A,B\in\msc{D}\) and \(A\sube B\), then \(B\setminus A\in\msc{D}\),

3. if \(A_n\in\msc{D}\) and \(A_n\uparrow A\), then \(A\in\msc{D}\).

Recall that \(A_n\uparrow A\) means that \(A_n\sube A_{n+1}\ (n\in\N)\) and \(\bigcup_n A_n=A\).

Proposition (Equivalent Definition of \(\sigma\)-algebra):

A collection \(\Sigma\) of subsets of \(S\) is a \(\sigma\)-algebra if and only if \(\Sigma\) is both a \(\pi\)-system and a \(d\)-system.

Proof: The "only if" part is easy: a \(\sigma\)-algebra is closed under finite intersections, under differences \(B\setminus A=B\cap A^c\), and under countable (in particular increasing) unions. For the "if" part, suppose that \(\Sigma\) is both a \(\pi\)-system and a \(d\)-system. Then \(S\in\Sigma\), and \(\Sigma\) is closed under complementation, because \(A^c=S\setminus A\in\Sigma\) for all \(A\in\Sigma\) by rule 2 of a \(d\)-system. Therefore \(\Sigma\) is closed under finite unions: \(A\cup B=(A^c\cap B^c)^c\in\Sigma\), by the \(\pi\)-system property. Thus \(\Sigma\) is an algebra on \(S\). Now let \(A_n\in\Sigma\ (n\in\N)\). Then \(G_n\edef A_1\cup\cdots\cup A_n\in\Sigma\) and \(G_n\uparrow \bigcup_k A_k\), so \(\bigcup_k A_k\in\Sigma\) by rule 3 of a \(d\)-system. Hence \(\Sigma\) is a \(\sigma\)-algebra.
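For small finite examples, the two definitions and the proposition above can be checked by brute force. This Python sketch assumes \(S\) is finite (so rule 3 of a \(d\)-system holds automatically) and uses hypothetical helper names. The example collection is a \(d\)-system but not a \(\pi\)-system, and accordingly not a \(\sigma\)-algebra, so the \(\pi\)-system condition in the proposition cannot be dropped.

```python
from itertools import product


def is_pi_system(I):
    """Closed under pairwise (hence all finite) intersections."""
    return all(A & B in I for A, B in product(I, repeat=2))


def is_d_system(S, D):
    """Rules 1 and 2 of a d-system on a FINITE set S.

    Rule 3 holds automatically here: an increasing sequence of subsets of a
    finite set is eventually constant, so its union is one of its members.
    """
    return S in D and all(B - A in D for A, B in product(D, repeat=2) if A <= B)


def is_sigma_algebra(S, F):
    """On a finite set: contains S, closed under complements and pairwise unions."""
    return (S in F
            and all(S - A in F for A in F)
            and all(A | B in F for A, B in product(F, repeat=2)))


S = frozenset({1, 2, 3, 4})
# A d-system that is not a pi-system: {1,2} & {1,3} = {1} is missing.
D = {frozenset(x) for x in [(), (1, 2, 3, 4), (1, 2), (3, 4), (1, 3), (2, 4)]}
print(is_d_system(S, D), is_pi_system(D), is_sigma_algebra(S, D))  # True False False
```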

Definition:

If \(\msc{C}\) is a class of subsets of \(S\), we define \(d(\msc{C})\) to be the intersection of all \(d\)-systems that contain \(\msc{C}\). Since an intersection of \(d\)-systems is again a \(d\)-system, \(d(\msc{C})\) is a \(d\)-system, and it is the smallest \(d\)-system containing \(\msc{C}\).

Proposition:

\[ d(\msc{C})\sube \sigma(\msc{C}) \]

Proof: \(d(\msc{C})\) is contained in every \(d\)-system that contains \(\msc{C}\). Obviously \(\msc{C}\sube\sigma(\msc{C})\), and every \(\sigma\)-algebra is also a \(d\)-system, so \(d(\msc{C})\sube\sigma(\msc{C})\).

Lemma (Dynkin):

If \(\msc{I}\) is a \(\pi\)-system on \(S\), then

\[ d(\msc{I})=\sigma(\msc{I}) \]

Proof: We want to prove that \(d(\msc{I})\sube\sigma(\msc{I})\) and \(d(\msc{I})\supseteq\sigma(\msc{I})\). The first inclusion is the previous proposition. For the second, it suffices to show that \(d(\msc{I})\) is a \(\sigma\)-algebra, because it then contains \(\sigma(\msc{I})\), the smallest \(\sigma\)-algebra containing \(\msc{I}\). Since \(d(\msc{I})\) is already a \(d\)-system, by the equivalent definition of \(\sigma\)-algebra we need only prove that \(d(\msc{I})\) is a \(\pi\)-system. We divide the proof into two steps.

Step 1: Let us introduce two auxiliary families of sets:

\[ \begin{aligned} &\msc{D}_1=\{ B\in d(\msc{I})\ |\ \forall C\in {\msc{I}},\ B\cap C\in d({\msc{I}}) \}\\ &\msc{D}_2=\{ A\in d(\msc{I})\ |\ \forall B\in d({\msc{I}}),\ A\cap B\in d({\msc{I}}) \} \end{aligned} \]

It is easy to see that \(\msc{I}\sube \msc{D}_1\): if \(B,C\in\msc{I}\), then \(B\cap C\in\msc{I}\sube d(\msc{I})\) because \(\msc{I}\) is a \(\pi\)-system. Next we show that \(\msc{D}_1\) inherits the \(d\)-system structure from \(d(\msc{I})\), by checking the three rules in the definition of a \(d\)-system:

1. \(S\in\msc{D}_1\), since \(S\cap C=C\in\msc{I}\sube d(\msc{I})\) for every \(C\in\msc{I}\).

2. If \(B_1,B_2\in\msc{D}_1\) and \(B_2\sube B_1\), then, for \(C\in\msc{I}\), \((B_1\setminus B_2)\cap C=(B_1\cap C)\setminus(B_2\cap C)\). The right-hand side is the difference of two nested sets in \(d(\msc{I})\), so it lies in \(d(\msc{I})\), and therefore \(B_1\setminus B_2\in\msc{D}_1\).

3. If \(B_n\in \msc{D}_1\ (n\in\N)\) and \(B_n\uparrow B\), then, for \(C\in\msc{I}\), \((B_n\cap C)\uparrow (B\cap C)\), so that \(B\cap C\in d(\msc{I})\) and \(B\in \msc{D}_1\).

Thus \(\msc{D}_1\) is a \(d\)-system on \(S\) that contains \(\msc{I}\), so \(d(\msc{I})\sube \msc{D}_1\). By the definition of \(\msc{D}_1\), also \(\msc{D}_1\sube d(\msc{I})\). Hence:

\[ \msc{D}_1=d(\msc{I}) \]

which means that \(A\cap C\in d(\msc{I})\) for all \(A\in d(\msc{I})\) and all \(C\in\msc{I}\).

Step 2: We want to prove \(d(\msc{I})\sube\msc{D}_2\), since this says precisely that \(d(\msc{I})\) is a \(\pi\)-system. Therefore we need only prove that \(\msc{D}_2\) is a \(d\)-system containing \(\msc{I}\). The result of Step 1 shows that \(\msc{D}_2\) contains \(\msc{I}\): for all \(B\in\msc{I}\) and all \(C\in d(\msc{I})\), we have \(B\cap C\in d(\msc{I})\); this is why Step 1 is needed as a prerequisite. The proof that \(\msc{D}_2\) is a \(d\)-system is the same as for \(\msc{D}_1\), with \(C\) now ranging over \(d(\msc{I})\). Thus we have:

\[ d(\msc{I})\sube\msc{D}_2,\quad \text{and actually,}\ d(\msc{I})=\msc{D}_2 \]

Corollary:

Any \(d\)-system that contains a \(\pi\)-system contains the \(\sigma\)-algebra generated by that \(\pi\)-system.
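Dynkin's Lemma can be observed on a small finite example. In this Python sketch (finite \(S\) assumed, helper names hypothetical), \(d(\msc{C})\) is computed by closing under rules 1 and 2 only. For \(\msc{C}=\{\{1,2\},\{1,3\}\}\), which is not a \(\pi\)-system, \(d(\msc{C})\) is strictly smaller than \(\sigma(\msc{C})\); adding the intersection \(\{1\}\) gives a \(\pi\)-system \(\msc{I}\), and then \(d(\msc{I})=\sigma(\msc{I})\).

```python
from itertools import product


def generated_d_system(S, C):
    """d(C) on a FINITE set S: close {S} and C under differences B - A with A <= B.

    Increasing unions add nothing new on a finite set, so rules 1 and 2 suffice.
    """
    D = {S} | set(C)
    while True:
        new_sets = {B - A for A, B in product(D, repeat=2) if A <= B}
        if new_sets <= D:
            return D
        D |= new_sets


def generated_sigma_algebra(S, C):
    """sigma(C) on a FINITE set S: close under complements and pairwise unions."""
    F = {frozenset(), S} | set(C)
    while True:
        new_sets = {S - A for A in F} | {A | B for A, B in product(F, repeat=2)}
        if new_sets <= F:
            return F
        F |= new_sets


S = frozenset({1, 2, 3, 4})
C = [frozenset({1, 2}), frozenset({1, 3})]   # not a pi-system
I = C + [frozenset({1})]                     # adding {1} makes it a pi-system

print(len(generated_d_system(S, C)), len(generated_sigma_algebra(S, C)))  # 6 16
print(generated_d_system(S, I) == generated_sigma_algebra(S, I))          # True
```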

2. Measurable Functions

Measurable Functions, \(\mathrm{m}(\Sigma_1/\Sigma_2)\)

Definition:

Suppose that \((S_1,\Sigma_1)\) and \((S_2,\Sigma_2)\) are measurable spaces, and that \(h\) is a map:

\[ h: S_1\to S_2 \]

Then, \(h\) is called \(\Sigma_1/\Sigma_2\)-measurable (or simply measurable) if, for any measurable set \(A\in\Sigma_2\), the inverse image \(h^{-1}(A)\) is in \(\Sigma_1\). That is,

\[ h^{-1}:\Sigma_2\to\Sigma_1 \]

And we write: \(h\in \mathrm{m}(\Sigma_1/\Sigma_2)\). This definition is analogous to that of continuity, where a map between topological spaces is continuous if the inverse image of every open set is open.

Proposition:

The map \(h^{-1}\) preserves all set operations:

\[ \begin{aligned} h^{-1}\lr({\bigcap_\alpha A_\alpha}) &=\bigcap_\alpha{h^{-1}(A_\alpha)}\\ h^{-1}(A^c) &=(h^{-1}(A))^c\\ &\text{etc.} \end{aligned} \]
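These identities can be spot-checked for a concrete map between finite sets. In this Python sketch, the map \(h(s)=s \bmod 3\) from \(\{0,\dots,9\}\) to \(\{0,1,2\}\) and the helper `preimage` are illustrative choices, not part of the text.

```python
def preimage(h, S1, A):
    """h^{-1}(A) = {s in S1 : h(s) in A}."""
    return frozenset(s for s in S1 if h(s) in A)


def h(s):
    """An illustrative map from S1 = {0,...,9} to S2 = {0, 1, 2}."""
    return s % 3


S1 = frozenset(range(10))
S2 = frozenset({0, 1, 2})
A, B = frozenset({0, 1}), frozenset({1, 2})

print(preimage(h, S1, A & B) == preimage(h, S1, A) & preimage(h, S1, B))  # True (intersection)
print(preimage(h, S1, A | B) == preimage(h, S1, A) | preimage(h, S1, B))  # True (union)
print(preimage(h, S1, S2 - A) == S1 - preimage(h, S1, A))                 # True (complement)
```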

Proposition:

Suppose that \((S_1,\Sigma_1)\) and \((S_2,\Sigma_2)\) are measurable spaces, \(h:S_1\to S_2\), and \(\msc{C}\sube\Sigma_2\) is a family of subsets of \(S_2\). Then \(h\in \mathrm{m}(\Sigma_1/\Sigma_2)\) if:

\[ \sigma(\msc{C})=\Sigma_2\quad \text{and}\quad h^{-1}:\msc{C}\to\Sigma_1 \]

Proof: Let \(\msc{A}\) be the class of elements \(A\) in \(\Sigma_2\) such that \(h^{-1}(A)\in\Sigma_1\). We want to prove \(\msc{A}\supseteq\Sigma_2=\sigma(\msc{C})\), and for this it suffices to prove that \(\msc{A}\) contains \(\msc{C}\) and that \(\msc{A}\) is a \(\sigma\)-algebra on \(S_2\). Firstly, by the condition \(h^{-1}:\msc{C}\to\Sigma_1\), we have \(\msc{C}\sube\msc{A}\). Moreover:

1. \(S_2\in\msc{A}\), because \(h^{-1}(S_2)=S_1\in \Sigma_1\).

2. \(\forall A\in\msc{A}\), \(h^{-1}(A)\in\Sigma_1\Rightarrow (h^{-1}(A))^c\in\Sigma_1\Rightarrow h^{-1}(A^c) \in \Sigma_1\Rightarrow A^c\in\msc{A}\).

3. If \(A_n\in\msc{A}\ (n\in\N)\), then \(h^{-1}(A_n)\in\Sigma_1\Rightarrow \bigcup_n h^{-1}(A_n) \in \Sigma_1 \Rightarrow h^{-1}\lr({\bigcup_n A_n})\in\Sigma_1\Rightarrow \bigcup_n A_n\in\msc{A}\).

Hence \(\msc{A}\) is a \(\sigma\)-algebra containing \(\msc{C}\), so \(\Sigma_2=\sigma(\msc{C})\sube\msc{A}\), i.e. \(h\in\mathrm{m}(\Sigma_1/\Sigma_2)\).

This proposition illustrates the strategy of "generation", one of the most important proof techniques in measure theory. Instead of proving a statement directly for every set in a large \(\sigma\)-algebra, we typically use a three-step strategy:

1. Verification: Prove that the statement holds for a simple generating family of sets.

2. Extension: Prove that the family of sets satisfying the statement is itself a \(\sigma\)-algebra (or a \(d\)-system, a monotone class, etc.).

3. Conclusion: Since this family contains the generating family, it must contain the entire \(\sigma\)-algebra generated by that family (when it is only known to be a \(d\)-system, this requires the generating family to be a \(\pi\)-system, by Dynkin's Lemma).

Proposition (Composition Lemma):

If \((S_1,\Sigma_1)\), \((S_2,\Sigma_2)\) and \((S_3,\Sigma_3)\) are measurable spaces, and if \(h_1\) is measurable from \((S_1,\Sigma_1)\) to \((S_2,\Sigma_2)\) and \(h_2\) is measurable from \((S_2,\Sigma_2)\) to \((S_3,\Sigma_3)\), then \(h_2\circ h_1\) is measurable from \((S_1,\Sigma_1)\) to \((S_3,\Sigma_3)\), because \((h_2\circ h_1)^{-1}(A)=h_1^{-1}(h_2^{-1}(A))\in\Sigma_1\) for every \(A\in\Sigma_3\).

\(\R\)-valued Functions; \(\rmm\Sigma\); \((\rmm\Sigma)^+\); \(\rmb\Sigma\)

Definitions:

Let \((S,\Sigma)\) be a measurable space. A function \(h: S\to \R\) is called \(\Sigma\)-measurable, and we write \(h\in\rmm\Sigma\), if \(h^{-1}:\msc{B}\to\Sigma\), that is, if \(h\in\rmm(\Sigma/\msc{B})\), where \(\msc{B}=\msc{B}(\R)\) is the Borel \(\sigma\)-algebra on \(\R\). We write \((\rmm\Sigma)^+\) for the class of non-negative elements of \(\rmm\Sigma\), and \(\rmb\Sigma\) for the class of bounded \(\Sigma\)-measurable functions on \(S\).

Extended Definition:

It is convenient to extend these definitions to the extended real numbers \(\overline\R=\R\cup\{-\infty,+\infty\}=[-\infty,+\infty]\), because, for instance, the lim sup of a sequence of real-valued functions may be infinite. The extended definition reads: \(h:S\to\overline{\R}\) is called \(\Sigma\)-measurable if \(h^{-1}:\msc{B}(\overline{\R})\to\Sigma\).

Proposition:

Let \((S,\Sigma)\) be a measurable space. A function \(h: S\to\R\) is \(\Sigma\)-measurable if and only if:

\[ \{h\leq c\}\edef\{s\in S\ |\ h(s)\leq c \}\in\Sigma\quad (\forall c\in\R). \]

Proof: "Only if": \((-\infty,c]\in\msc{B}\), so \(\{h\leq c\}=h^{-1}((-\infty,c])\in\Sigma\). "If": let \(\msc{C}=\pi(\R)\sube \msc{B}\); then \(\msc{B}=\sigma(\pi(\R))=\sigma(\msc{C})\). Since \(\{h\leq c\}=\{s\in S\ |\ h(s)\in(-\infty,c]\}=h^{-1}((-\infty,c])\), the given condition says exactly that \(h^{-1}(A)\in\Sigma\) for every \(A\in\msc{C}\), that is, \(h^{-1}:\msc{C}\to\Sigma\). By the previous proposition on generating families, \(h\in\rmm\Sigma\).
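On a finite measurable space, the criterion \(\{h\leq c\}\in\Sigma\) can be tested directly. The Python sketch below (finite \(S\) assumed; the helper `is_measurable` is hypothetical) uses the \(\sigma\)-algebra \(\{\varnothing,\{1,2\},\{3,4\},S\}\): a function is measurable exactly when it is constant on each of the atoms \(\{1,2\}\) and \(\{3,4\}\).

```python
def is_measurable(h, S, sigma):
    """Check h: S -> R (a dict) against a sigma-algebra `sigma` on a FINITE set S,
    using the criterion {h <= c} in sigma for every real c.

    Only the finitely many values c in the range of h need checking: for any
    real c, {h <= c} equals {h <= c'} where c' is the largest value of h not
    exceeding c, or is the empty set (always in sigma) if there is none.
    """
    return all(frozenset(s for s in S if h[s] <= c) in sigma
               for c in set(h.values()))


S = frozenset({1, 2, 3, 4})
sigma = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), S}
h = {1: 0.0, 2: 0.0, 3: 5.0, 4: 5.0}   # constant on the atoms {1,2} and {3,4}
g = {1: 0.0, 2: 1.0, 3: 5.0, 4: 5.0}   # separates 1 from 2
print(is_measurable(h, S, sigma), is_measurable(g, S, sigma))  # True False
```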

Obviously, similar results apply in which \(\{ h\leq c \}\) is replaced by \(\{ h>c \}\), \(\{ h\geq c \}\) etc.

Lemma:

Sums and products of measurable \(\R\)-valued functions are measurable: in other words, \(\rmm\Sigma\) is an algebra over \(\R\). Thus if \(\lambda\in\R\) and \(h,h_1,h_2\in\rmm\Sigma\), then:

\[ h_1+h_2\in \rmm\Sigma,\quad h_1h_2\in\rmm\Sigma,\quad \lambda h\in\rmm\Sigma. \]

Proof (for sums): Let \(c\in\R\). For \(s\in S\), by the density of \(\Q\) in \(\R\), \(h_1(s)+h_2(s) > c\) if and only if for some rational \(q\) we have:

\[ h_1(s)>q>c-h_2(s) \]

In other words,

\[ \{ h_1+h_2> c \}=\bigcup_{q\in\Q}(\{ h_1>q \}\cap\{h_2>c-q\}) \]

which is a countable union of elements of \(\Sigma\), hence an element of \(\Sigma\). By the \(\{h>c\}\) version of the previous proposition, \(h_1+h_2\in\rmm\Sigma\).

Lemma (measurability of infs, lim infs and lim sups of functions):

Let \(\{h_n\ |\ n\in\N\}\) be a sequence of elements of \(\rmm\Sigma\). Then

\[ (\text{i})\ \text{inf}\ h_n,\quad \text{(ii) lim inf }h_n,\quad \text{(iii) lim sup }h_n \]

are \(\Sigma\)-measurable into \((\overline{\R},\ \msc{B}(\overline{\R}))\). Nevertheless, we shall still write, for example, \(\text{inf }h_n\in\rmm\Sigma\). Further,

\[ \text{(iv) }\{s\ |\ \lim\ h_n(s)\ \text{exists in }\R \}\in\Sigma \]

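Proof: (i) \(\{ \inf h_n\geq c \}=\bigcap_n\{ h_n\geq c\}\in\Sigma\) for every \(c\in\R\). (ii) Since \(\sup_n h_n=-\inf_n(-h_n)\), suprema are measurable too, and \(\liminf h_n=\sup_n\lr({\inf_{k\geq n}h_k})\) is the supremum of functions that are measurable by (i). (iii) \(\limsup h_n=-\liminf(-h_n)\). (iv) The set in question is \(\{-\infty<\liminf h_n=\limsup h_n<\infty\}\), which is the inverse image of \(\{0\}\) under the measurable function \(\limsup h_n-\liminf h_n\) defined on the set \(\{\limsup h_n<\infty\}\cap\{\liminf h_n>-\infty\}\in\Sigma\).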

Definition (\(\sigma\)-algebra generated by a collection of functions on \(S\)):

If we have a collection \(\{ Y_\gamma\ |\ \gamma\in C \}\) of maps \(Y_\gamma:\Omega\to\R\), then

\[ \msc{Y}\edef\sigma(Y_\gamma\ |\ \gamma\in C) \]

is defined to be the smallest \(\sigma\)-algebra \(\msc{Y}\) on \(\Omega\) such that each map \(Y_\gamma\ (\gamma\in C)\) is \(\msc{Y}\)-measurable. And clearly:

\[ \sigma(Y_\gamma\ |\ \gamma\in C)=\sigma(\{\omega\in\Omega\ |\ Y_\gamma(\omega)\in B \}\ |\ \gamma\in C,\ B\in\msc{B} ) \]
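On a finite \(\Omega\), the \(\sigma\)-algebra generated by finitely many maps can be listed explicitly: it consists of the preimages \(Z^{-1}(B)\), where \(Z(\omega)=(Y_1(\omega),\dots,Y_n(\omega))\) is the joint map and \(B\) ranges over subsets of its finite range. (These preimages form a \(\sigma\)-algebra because \(Z^{-1}\) preserves set operations; each \(Y_i\) is measurable with respect to it; and each of them is a union of sets \(\{Y_1=v_1\}\cap\cdots\cap\{Y_n=v_n\}\), which any such \(\sigma\)-algebra must contain.) The Python sketch below is a minimal illustration under the finiteness assumption; the die example and the helper name are hypothetical.

```python
from itertools import chain, combinations


def sigma_of_maps(Omega, maps):
    """sigma(Y_1, ..., Y_n) for finitely many maps (dicts) on a FINITE set Omega.

    Returns the preimages Z^{-1}(B) of all subsets B of the range of the
    joint map Z(w) = (Y_1(w), ..., Y_n(w)).
    """
    Z = {w: tuple(Y[w] for Y in maps) for w in Omega}
    values = list(set(Z.values()))
    subsets = chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))
    return {frozenset(w for w in Omega if Z[w] in B) for B in subsets}


Omega = range(1, 7)                         # outcomes of a die
parity = {w: w % 2 for w in Omega}          # Y_1: odd (1) or even (0)
small = {w: int(w <= 3) for w in Omega}     # Y_2: indicator of {w <= 3}

print(len(sigma_of_maps(Omega, [parity])))         # 4
print(len(sigma_of_maps(Omega, [parity, small])))  # 16
```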

Definition (Borel functions):

A function \(h\) from a topological space \(S\) to \(\R\) is called Borel if \(h\) is \(\msc{B}(S)\)-measurable. The most important case is when \(S\) itself is \(\R\).

Proposition:

If \(S\) is topological and \(h:S\to\R\) is continuous, then \(h\) is Borel.

Proof: Note that the measurable spaces involved are \((S,\msc{B}(S))\) and \((\R,\msc{B})\). Let \(\msc{C}\) be the family of open subsets of \(\R\). Since \(h\) is continuous, for every open set \(U\in\msc{C}\) the inverse image \(h^{-1}(U)\) is an open subset of \(S\), so \(h^{-1}(U)\in\msc{B}(S)\); that is, \(h^{-1}:\msc{C}\to\msc{B}(S)\). By definition \(\sigma(\msc{C})=\msc{B}\), so \(h\) is \(\msc{B}(S)\)-measurable by the proposition on generating families proved before.

3. Monotone-Class Theorems

The following elementary Monotone-Class Theorem allows us to deduce results about general measurable functions from results about indicators of elements of \(\pi\)-systems.

Definition (supplementary):

Let \(S\) be a set and \(A\) be a subset of \(S\) (i.e., \(A \subseteq S\)). The indicator function of the set \(A\), denoted by \(1_A\) (or sometimes \(\mathbf{1}_A\), \(I_A\), or \(\chi_A\)), is a function from \(S\) to the set \(\{0, 1\}\), defined as follows:

\[ 1_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} \]

In simpler terms, for any element \(x\) in the universal set \(S\), the indicator function \(1_A(x)\) outputs \(1\) if \(x\) is an element of the specific subset \(A\), and \(0\) if \(x\) is not an element of \(A\).

Before stating the theorem, we fix a well-behaved class of functions. Let \(\msc{H}\) be a class of bounded functions from a set \(S\) into \(\R\) satisfying the following conditions:

(i) \(\msc{H}\) is a vector space over \(\R\);

(ii) the constant function \(1\) is an element of \(\msc{H}\);

(iii) if \(\{f_n\}\) is a sequence of non-negative functions in \(\msc{H}\) such that \(f_n\uparrow f\), where \(f\) is a bounded function on \(S\), then \(f\in\msc{H}\).

Theorem (Monotone-Class Theorem):

If \(\msc{H}\) contains the indicator of every set in a \(\pi\)-system \(\msc{I}\), then \(\msc{H}\) contains every bounded \(\sigma(\msc{I})\)-measurable function on \(S\):

\[ \begin{aligned} &\msc{I}\text{ is a }\pi\text{-system and } I_A\in\msc{H}\ \ \forall A\in \msc{I}\\ \Rightarrow\quad &\msc{H} \supseteq \{f:S\to\R\ | \ f \text{ is }\sigma(\msc{I})\text{-measurable and bounded} \} \end{aligned} \]

Proof: Let's put it step by step.

Step 1: Let \(\msc{D}\) be the class of subsets \(D\) of \(S\) s.t. \(I_D\in\msc{H}\).

Step 2: Prove that \(\msc{D}\) is a \(d\)-system on \(S\).

(i) \(I_S=1\in\msc{H}\Rightarrow S\in\msc{D}\).

(ii) For \(A,B\in \msc{D}\) with \(A\sube B\), we have \(I_{B\setminus A}=I_B-I_A\). Because \(\msc{H}\) is a vector space over \(\R\), it is closed under subtraction, so \(I_{B\setminus A}\in\msc{H}\). Thus \(B\setminus A\in\msc{D}\).

(iii) For \(A_n\in\msc{D}\), we have \(I_{A_n}\in\msc{H}\). If \(A_n\uparrow A\), then the non-negative functions \(I_{A_n}\) satisfy \(I_{A_n}\uparrow I_A\), and \(I_A\) is bounded, so \(I_{A}\in \msc{H}\) by condition (iii) \(\Rightarrow\) \(A\in\msc{D}\).

Step 3: Apply Dynkin's Lemma: the \(\pi\)-system \(\msc{I}\) satisfies \(\msc{I}\sube \msc{D}\) by the hypothesis of the theorem, and \(\msc{D}\) is a \(d\)-system on \(S\) \(\Rightarrow\) \(\msc{D}\supseteq d(\msc{I})=\sigma(\msc{I})\). This step shows that \(I_A\in\msc{H}\) for all \(A\in\sigma(\msc{I})\).

Step 4: Extension to non-negative bounded \(\sigma(\msc{I})\)-measurable functions: suppose that \(f\) is a \(\sigma(\msc{I})\)-measurable function such that \(\exist K\in \N\), \(\forall s\in S,\ 0\leq f(s)<K\). To approximate \(f\), partition its range into intervals of length \(2^{-n}\):

\[ D(n,i)=\{ s\ |\ i\cdot2^{-n}\leq f(s)< (i+1)\cdot2^{-n} \} \]

and define a sequence of functions:

\[ f_n(s)=\sum_{i=0}^{K\cdot2^n-1} i\cdot 2^{-n}\cdot I_{D(n,i)}(s) \]

Recall that

\[ \{s\in S\ |\ f(s)< c \}\in\sigma(\msc{I}), \ \forall c\in\R \Leftrightarrow f(s) \text{ is }\sigma(\msc{I})\text{-measurable} \Leftrightarrow \{s\in S\ |\ f(s)\geq c \}\in\sigma(\msc{I})\ \forall c\in\R \]

Therefore we have:

\[ \begin{aligned} &\{s\in S\ |\ f(s)< (i+1)\cdot2^{-n} \}\in\sigma(\msc{I})\\ \text{and }\ &\{s\in S\ |\ f(s)\geq i\cdot2^{-n}\} \in\sigma(\msc{I})\\ \Rightarrow&\ D(n,i)=\{s\in S\ |\ f(s)< (i+1)\cdot2^{-n} \}\ \cap\ \{s\in S\ |\ f(s)\geq i\cdot2^{-n}\}\in\sigma(\msc{I})\sube \msc{D}\\ \Rightarrow&\ I_{D(n,i)}\in\msc{H} \end{aligned} \]

\(f_n\) is a linear combination of the indicators \(I_{D(n,i)}\), so \(f_n\in\msc{H}\) because \(\msc{H}\) is a vector space. Moreover, \(f_n(s)\) is \(f(s)\) rounded down to the grid \(2^{-n}\Z\), so each \(f_n\) is non-negative, \(f_n\leq f_{n+1}\), and \(0\leq f-f_n<2^{-n}\); hence \(f_n\uparrow f\). Therefore \(f\in\msc{H}\) by condition (iii).
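The dyadic approximation of Step 4 can be evaluated numerically. This Python sketch computes \(f_n(s)\) literally as the sum over the indicators of \(D(n,i)\), for the illustrative function \(f(s)=\sin^2 s+\tfrac12\), which satisfies \(0\leq f<2\) (so \(K=2\)); the function and the helper name are assumptions made for the example. One can check that \(f_n(s)\) increases with \(n\), never exceeds \(f(s)\), and stays within \(2^{-n}\) of it.

```python
import math


def f_n(f, s, n, K):
    """Evaluate f_n(s) = sum_{i=0}^{K*2^n - 1} i * 2^{-n} * 1_{D(n,i)}(s)
    for a function f with 0 <= f(s) < K."""
    step = 2.0 ** -n
    total = 0.0
    for i in range(K * 2 ** n):
        in_D = i * step <= f(s) < (i + 1) * step   # indicator of D(n, i) at s
        total += i * step * in_D
    return total


def f(s):
    """An illustrative bounded function with 0 <= f < 2."""
    return math.sin(s) ** 2 + 0.5


for n in range(1, 5):
    print(n, f_n(f, 1.0, n, K=2), f(1.0))
```

Exactly one indicator is non-zero for each \(s\), so the sum simply rounds \(f(s)\) down to a multiple of \(2^{-n}\).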

Step 5: Extension to general bounded \(\sigma(\msc{I})\)-measurable functions: \(f\) can be decomposed into its positive part \(f^+=\max\{f,0\}\) and negative part \(f^-=\max\{-f,0\}\), with \(f=f^+-f^-\), where \(f^+\) and \(f^-\) are both non-negative bounded \(\sigma(\msc{I})\)-measurable functions. Thus \(f^{+},f^{-}\in\msc{H}\) by Step 4, and so is the linear combination \(f=f^+-f^-\).

We have thus proved that \(\msc{H}\) contains every bounded \(\sigma(\msc{I})\)-measurable function on \(S\), which is a cheerful conclusion! And we can say:

\[ \pi\text{-system is all you need} \]

To restate the theorem: if \(\msc{H}\) recognizes the events of a merely simple \(\pi\)-system \(\msc{I}\), in the sense that it contains all of their indicators, then \(\msc{H}\) contains every bounded \(\sigma(\msc{I})\)-measurable function.

4. Measures; the Uniqueness Lemma; a.e. \((\mu,\Sigma)\)

Set Functions; Additivity; \(\sigma\)-additivity

Definition:

Let \(S\) be a set and \(\Sigma_0\) an algebra on \(S\), and let:

\[ \mu_0 : \Sigma_0 \to [0,\infty] \]

be a non-negative set function. Then, \(\mu_0\) is called additive, if:

\[ \begin{aligned} \text{(i)}&\quad \mu_0(\varnothing)=0\\ \text{(ii)}&\quad \forall F, G \in \Sigma_0,\ F\cap G=\varnothing \Rightarrow \mu_0(F\cup G)=\mu_0(F)+\mu_0(G) \end{aligned} \]

We only consider non-negative set functions here, which is why \(\mu_0\) takes values in \([0,\infty]\).

Let \(\{ G_n \}\) be an arbitrary sequence of disjoint sets in \(\Sigma_0\) with union \(\displaystyle\bigcup_{n=1}^\infty G_n=G\in\Sigma_0\). The map \(\mu_0\) is called countably additive, or \(\sigma\)-additive if:

\[ \begin{aligned} \text{(i)}&\quad \mu_0(\varnothing)=0\\ \text{(ii)}&\quad \mu_0(G)=\sum_{n=1}^\infty \mu_0(G_n) \end{aligned} \]

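Note that \(\Sigma_0\) need not be a \(\sigma\)-algebra; this is why we must assume that the union \(G\) lies in \(\Sigma_0\). If \(\Sigma_0\) is a \(\sigma\)-algebra, this assumption holds automatically.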

Lemma:

Suppose that \(\mu_0\) is additive on \((S,\Sigma_0)\). Then \(\mu_0\) is \(\sigma\)-additive on \(\Sigma_0\) if and only if, for every sequence \(\{ F_n \}\) of sets in \(\Sigma_0\) with \(F_n\uparrow F\in\Sigma_0\), we have \(\mu_0(F_n)\uparrow\mu_0(F)\). This continuity from below is a fundamental property of measures.

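Proof: Recall that \(F_n\uparrow F\) means

\[ F_n\sube F_{n+1},\ \forall n\in \N,\ \bigcup_{k=1}^{\infty} F_k=\lim_{n\to\infty}F_n=F \]

For the "\(\Rightarrow\)" part, write \(G_1=F_1,\ G_n=F_n\setminus F_{n-1}\ (n\geq 2)\). Then the sets \(G_n\in\Sigma_0\) are disjoint, \(F_n=G_1\cup\cdots\cup G_n\), and \(\bigcup_n G_n=F\). Thus, by additivity,

\[ \mu_0(F_n)=\mu_0(G_1\cup\cdots\cup G_n)=\sum_{k=1}^n \mu_0(G_k) \]

According to property (ii) of \(\sigma\)-additivity,

\[ \lim_{n\to\infty}\mu_0(F_n)=\lim_{n\to\infty}\sum_{k=1}^n \mu_0(G_k)=\mu_0(F) \]

For the "\(\Leftarrow\)" part, let \(\{G_n\}\) be disjoint sets in \(\Sigma_0\) with union \(G\in\Sigma_0\), and put \(F_n=G_1\cup\cdots\cup G_n\in\Sigma_0\). Then \(F_n\uparrow G\) and \(\mu_0(F_n)=\sum_{k=1}^n\mu_0(G_k)\) by additivity, so letting \(n\to\infty\) gives \(\mu_0(G)=\sum_{k=1}^\infty\mu_0(G_k)\).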

Lemma:

Suppose that \(\mu_0\) is additive on \(\Sigma_0\) and \(\mu_0(S)<\infty\). Then \(\mu_0\) is \(\sigma\)-additive on \(\Sigma_0\) if and only if, whenever \(F_n\in\Sigma_0\ (n\in\N)\) and \(F_n\downarrow\varnothing\), we have \(\mu_0(F_n)\downarrow 0\).

Proof: For the "\(\Rightarrow\)" part, write \(G_1=S\setminus F_1,\ G_n=F_{n-1}\setminus F_n\ (n\geq 2)\). Then the sets \(G_n\in\Sigma_0\) are disjoint, and

\[ \begin{aligned} &\bigcup_{k=1}^n G_k=S\setminus F_n\uparrow S\setminus \varnothing=S\\ \Rightarrow\quad & \lim_{n\to\infty}\sum_{k=1}^n \mu_0(G_k)=\mu_0(S)\quad\text{(by }\sigma\text{-additivity)} \end{aligned} \]

According to the definition of additivity,

\[ \mu_0(S)=\mu_0(G_1\cup\cdots \cup G_n\cup F_n)=\sum_{k=1}^{n}\mu_0(G_k)+\mu_0(F_n) \]

Therefore, taking limits on both sides and cancelling the finite quantity \(\mu_0(S)\), we have:

\[ \begin{aligned} & \mu_0(S)=\mu_0(S)+\lim_{n\to\infty}\mu_0(F_n)\\ \Rightarrow \quad & \lim_{n\to\infty}\mu_0(F_n)=0 \end{aligned} \]

For the "\(\Leftarrow\)" part, let \(F_n\uparrow F\) in \(\Sigma_0\). Then \(F\setminus F_n\downarrow\varnothing\), so \(\mu_0(F)-\mu_0(F_n)=\mu_0(F\setminus F_n)\downarrow 0\), and \(\mu_0\) is \(\sigma\)-additive by the previous lemma.

Measure Space; Finite and \(\sigma\)-finite Measures

Definition:

Let \((S,\Sigma)\) be a measurable space, so that \(\Sigma\) is a \(\sigma\)-algebra on \(S\). A map

\[ \mu: \Sigma\to [0,\infty] \]

is called a measure on \((S,\Sigma)\), if \(\mu\) is \(\sigma\)-additive. The triple \((S,\Sigma,\mu)\) is then called a measure space.

Definition:

Let \((S,\Sigma,\mu)\) be a measure space. Then \(\mu\), or indeed the measure space \((S,\Sigma,\mu)\) is called finite if:

\[ \mu(S)<\infty \]

And it is called \(\sigma\)-finite if there is a sequence \(\{ S_n \}\ (n\in \N)\) of elements of \(\Sigma\) s.t.

\[ \mu(S_n)<\infty,\ \forall n\in\N,\quad \text{and}\ \bigcup_n S_n=S \]

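In fact, most of the measures we meet in practice are \(\sigma\)-finite; for example, Lebesgue measure on \(\R\) is \(\sigma\)-finite but not finite. Measures that are not \(\sigma\)-finite can behave rather pathologically.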

Definition:

An element \(F\) of \(\Sigma\) is called \(\mu\)-null if \(\mu(F)=0\). By countable sub-additivity, a countable union of \(\mu\)-null sets is still \(\mu\)-null.

Definition:

A statement \(\msc{S}\) about points \(s\in S\) is said to hold \(\mu\)-almost everywhere, written a.e. \((\mu,\Sigma)\) or just a.e. \((\mu)\), if:

\[ F=\{ s\in S\ |\ \msc{S}(s)\text{ is false} \}\in\Sigma\quad \text{and}\quad \mu(F)=0 \]

The Uniqueness Lemma

Lemma (the Uniqueness Lemma):

Let \(\msc{I}\) be a \(\pi\)-system on a set \(S\), and let \(\Sigma=\sigma(\msc{I})\). Suppose that \(\mu_1\) and \(\mu_2\) are measures on \((S,\Sigma)\) s.t. \(\mu_1(S)=\mu_2(S)<\infty\) and \(\mu_1=\mu_2\) on \(\msc{I}\). Then:

\[ \mu_1=\mu_2\quad \text{on}\ \Sigma \]

That is, if two measures agree on a \(\pi\)-system, then they agree on the \(\sigma\)-algebra generated by that \(\pi\)-system.

Proof: Let \(\msc{D}\) be the class of sets \(D\in\Sigma\) such that \(\mu_1(D)=\mu_2(D)\). By hypothesis, \(\msc{I}\sube\msc{D}\). Let us check that \(\msc{D}\) is a \(d\)-system:

1. \(S\in \msc{D}\), because \(\mu_1(S)=\mu_2(S)\) by hypothesis.

2. If \(F,G\in \msc{D}\) and \(F\sube G\), then \(G\) is the disjoint union of \(F\) and \(G\setminus F\), and since all the measures involved are finite, \(\mu_1(G\setminus F)=\mu_1(G)-\mu_1(F)=\mu_2(G)-\mu_2(F)=\mu_2(G\setminus F)\). Thus \(G\setminus F\in \msc{D}\).

3. If \(\{ F_n \}\) is a sequence in \(\msc{D}\) such that \(F_n\uparrow F\), then by the continuity lemma proved above, \(\mu_1(F_n)\uparrow \mu_1(F)\) and \(\mu_2(F_n)\uparrow \mu_2(F)\). Hence:

\[ \mu_1(F)=\lim_{n\to\infty}\mu_1(F_n)=\lim_{n\to\infty}\mu_2(F_n)=\mu_2(F)\Rightarrow F\in\msc{D} \]

Therefore \(\msc{D}\) is a \(d\)-system containing the \(\pi\)-system \(\msc{I}\), and by Dynkin's Lemma \(\msc{D}\supseteq d(\msc{I})=\sigma(\msc{I})=\Sigma\Rightarrow \mu_1=\mu_2\ \text{on}\ \Sigma\).
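The \(\pi\)-system hypothesis in the Uniqueness Lemma cannot be dropped. In the following Python sketch (an illustrative finite example, with exact arithmetic via `fractions.Fraction`), two probability measures on \(S=\{1,2,3,4\}\) agree on \(S\) and on \(\msc{C}=\{\{1,2\},\{2,3\}\}\), which generates \(\msc{P}(S)\) but is not a \(\pi\)-system, yet they differ on \(\{2\}=\{1,2\}\cap\{2,3\}\in\sigma(\msc{C})\).

```python
from fractions import Fraction
from itertools import product


def generated_sigma_algebra(S, C):
    """sigma(C) on a FINITE set S: close under complements and pairwise unions."""
    F = {frozenset(), S} | set(C)
    while True:
        new_sets = {S - A for A in F} | {A | B for A, B in product(F, repeat=2)}
        if new_sets <= F:
            return F
        F |= new_sets


def measure(weights, A):
    """Measure of A for a measure given by point masses."""
    return sum((weights[s] for s in A), Fraction(0))


S = frozenset({1, 2, 3, 4})
mu1 = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}
mu2 = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}
C = [frozenset({1, 2}), frozenset({2, 3})]   # NOT a pi-system: {1,2} & {2,3} = {2}

agree_on_C = all(measure(mu1, A) == measure(mu2, A) for A in C + [S])
disagree = [sorted(A) for A in generated_sigma_algebra(S, C)
            if measure(mu1, A) != measure(mu2, A)]
print(agree_on_C)        # True
print([2] in disagree)   # True: mu1({2}) = 1/4 but mu2({2}) = 0
```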

5. Carathéodory's Theorem; Lebesgue Measure

The following result underpins the existence of every non-trivial probabilistic model. We shall see its use in the celebrated Daniell-Kolmogorov Theorem on the existence of stochastic processes. Its proof is non-trivial, however, and relies on the notion of outer measure, which we introduce first.

Carathéodory's Extension Theorem

Definition:

Let \(S\) be a set. A function \(\mu^*:\msc{P}(S)\to[0,\infty]\) is called an outer measure if:

(i) Non-negative: \(\mu^*(A)\geq 0,\ \forall A\sube S\);

(ii) Measure of the empty set is zero: \(\mu^*(\varnothing)=0\);

(iii) Monotone: \(A\sube B\sube S\Rightarrow\mu^*(A)\leq\mu^*(B)\);

(iv) Countably sub-additive: \(\mu^*\lr({\bigcup_{i=1}^\infty A_i})\leq\sum_{i=1}^\infty \mu^*(A_i)\) for every sequence \(\{A_i\}_{i=1}^\infty\) of subsets of \(S\).

Definition (Carathéodory's Criterion / Condition):

A set \(A\sube S\) is \(\mu^*\)-measurable, if:

\[ \mu^*(E)=\mu^*(E\cap A)+\mu^*(E\cap A^c),\quad \forall E\sube S \]

Intuitively, \(A\) has a well-behaved boundary: it splits every test set \(E\sube S\) into two pieces whose outer measures add up exactly, with no remainder.

Theorem (Carathéodory's Theorem):

Suppose \(\mu^*\) is an outer measure on \(S\). Let \(\msc{M}\) be the class of \(\mu^*\)-measurable sets. Then \(\msc{M}\) is a \(\sigma\)-algebra on \(S\), and the restriction of \(\mu^*\) to \(\msc{M}\) is a measure on \((S,\msc{M})\). The proof works directly from the definition of \(\mu^*\)-measurability, though it requires some care.
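For a small finite set, the class \(\msc{M}\) of \(\mu^*\)-measurable sets can be found by brute force over all test sets \(E\). The Python sketch below (hypothetical helper names; only feasible for a tiny \(S\)) compares two outer measures on \(S=\{1,2,3\}\): for counting measure every set is measurable, whereas for \(\mu^*(A)=1\) on non-empty \(A\) only \(\varnothing\) and \(S\) are, so \(\msc{M}\) can be very small.

```python
from itertools import chain, combinations


def power_set(S):
    """All subsets of a finite set S, as frozensets."""
    items = list(S)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def caratheodory_measurable(S, outer):
    """All A with outer(E) == outer(E & A) + outer(E - A) for every test set E.

    Brute force over the power set, so only feasible for a tiny finite S.
    Since E is a subset of S, E - A is the same set as E intersected with A^c.
    """
    P = power_set(S)
    return {A for A in P if all(outer(E) == outer(E & A) + outer(E - A) for E in P)}


def counting(A):
    """mu*(A) = |A|: an outer measure for which every set is measurable."""
    return len(A)


def zero_or_one(A):
    """mu*(A) = 1 for non-empty A: monotone and countably sub-additive."""
    return 1 if A else 0


S = {1, 2, 3}
print(len(caratheodory_measurable(S, counting)))                                # 8
print(caratheodory_measurable(S, zero_or_one) == {frozenset(), frozenset(S)})  # True
```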

Theorem (Carathéodory's Extension Theorem):

Let \(S\) be a set, let \(\Sigma_0\) be an algebra on \(S\) (not necessarily a \(\sigma\)-algebra), and let \(\Sigma=\sigma(\Sigma_0)\). If \(\mu_0\) is a countably additive map \(\mu_0:\Sigma_0\to [0,\infty]\), then there exists a measure \(\mu\) on \((S,\Sigma)\) such that:

\[ \mu=\mu_0\quad \text{on}\quad \Sigma_0 \]

Moreover, if \(\mu_0(S)<\infty\), then the extension is unique, by the Uniqueness Lemma: the algebra \(\Sigma_0\) is in particular a \(\pi\)-system.

Proof: The classical proof relies on the outer measure introduced above. We divide it into four steps.

Step 1: Define \(\mu^*\) and prove that it is an outer measure. Let:

\[ \mu^*(A)=\inf\left\{ \sum_{i=1}^\infty \mu_0(A_i)\ \Big|\ A\sube\bigcup_{i=1}^\infty A_i,\ A_i\in\Sigma_0 \right\} \]

Obviously \(\mu^*\geq 0\), \(\mu^*(\varnothing)=0\), and \(\mu^*(A)\leq\mu^*(B)\) whenever \(A\sube B\sube S\), since every cover of \(B\) is also a cover of \(A\). Therefore we only need to prove countable sub-additivity. Take an arbitrary sequence \(\{ F_i \}_{i=1}^\infty\) of subsets of \(S\); if \(\sum_i\mu^*(F_i)=\infty\) there is nothing to prove, so assume it is finite. For any \(\ve>0\) and each \(i\), we can find a cover of \(F_i\) by sets in \(\Sigma_0\):

\[ \begin{aligned} &\exist \{ G_{ij} \}_{j=1}^\infty,\ G_{ij}\in\Sigma_0\quad \text{s.t.}\quad F_i \sube \bigcup_{j=1}^\infty G_{ij}\\ &\text{and}\quad \sum_{j=1}^\infty \mu_0(G_{ij})<\mu^*(F_i)+\frac{\ve}{2^i} \end{aligned} \]

Since \(\{G_{ij}\}_{i,j\geq1}\) is a countable cover of \(\bigcup_i F_i\) by sets in \(\Sigma_0\), adding these up gives:

\[ \mu^*\lr({\bigcup_{i=1}^\infty F_i}) \leq \sum_{i=1}^\infty\sum_{j=1}^\infty\mu_0(G_{ij}) <\sum_{i=1}^\infty\mu^*(F_i)+\ve \]

Since \(\ve>0\) is arbitrary, \(\mu^*\) is countably sub-additive.

Step 2: Prove \(\mu^*=\mu_0\) on \(\Sigma_0\). Let \(A\in\Sigma_0\). Firstly, \(\mu^*(A)\leq\mu_0(A)\) is obvious, for we can choose \(\{ A_i\}_{i=1}^\infty=\{A,\varnothing,\varnothing,\cdots\}\). Now we prove \(\mu^*(A)\geq \mu_0(A)\), which also agrees with intuition, since \(\mu^*(A)\) is computed from covers of \(A\). We want to use the countable additivity of \(\mu_0\). Let \(\{A_i\}_{i=1}^\infty\) be any cover of \(A\) by sets in \(\Sigma_0\), and let:

\[ \begin{aligned} &B_i=A\cap A_i\in\Sigma_0\\ \Rightarrow\quad &\bigcup_{i=1}^\infty B_i=A\cap \lr({\bigcup_{i=1}^\infty A_i})=A \end{aligned} \]

In order to write \(\mu_0(A)\) as a sum via countable additivity, we use a disjointification technique:

\[ \begin{aligned} &C_1=B_1,\quad C_i=B_i\setminus\lr({\bigcup_{k=1}^{i-1}B_k}),\ i\geq2\\ \Rightarrow\quad &C_i\in\Sigma_0\text{ are disjoint},\quad \bigcup_{i=1}^\infty C_i=\bigcup_{i=1}^\infty B_i=A,\quad C_i\sube B_i\sube A_i\\ \Rightarrow\quad &\mu_0(A)=\mu_0\lr({\bigcup_{i=1}^\infty C_i})=\sum_{i=1}^\infty\mu_0(C_i) \leq\sum_{i=1}^\infty\mu_0 (B_i)\leq\sum_{i=1}^\infty \mu_0(A_i) \end{aligned} \]

where \(\{A_i\}\) is an arbitrary cover of \(A\) by sets in \(\Sigma_0\). Taking the infimum over all such covers,

\[ \mu_0(A)\leq \inf\left\{ \sum_{i=1}^\infty \mu_0(A_i)\ \Big|\ A\sube\bigcup_{i=1}^\infty A_i,\ A_i\in\Sigma_0 \right\}=\mu^*(A),\quad \forall A\in\Sigma_0 \]

Step 3: Prove that every \(A\in\Sigma_0\) is \(\mu^*\)-measurable; this is what will make \(\mu^*\) a measure on \(\Sigma=\sigma(\Sigma_0)\). Fix \(A\in\Sigma_0\) and an arbitrary test set \(E\sube S\). If \(\mu^*(E)=\infty\), the inequality below is trivial, so assume \(\mu^*(E)<\infty\). For any \(\ve>0\), choose a cover \(\{A_i\}_{i=1}^\infty\) of \(E\) with \(A_i\in\Sigma_0\ (\forall i)\) such that

\[ \sum_{i=1}^\infty\mu_0(A_i)<\mu^*(E)+\ve \]

Since \(\Sigma_0\) is an algebra, we can decompose each \(A_i\) into two disjoint parts in \(\Sigma_0\):

\[ \begin{aligned} A_i&=(A_i\cap A)\cup(A_i\cap A^c)\\ \Rightarrow \quad \mu_0(A_i)&=\mu_0(A_i\cap A)+\mu_0(A_i\cap A^c) \end{aligned} \]

Moreover, \(E\cap A\sube\lr({\bigcup_{i=1}^\infty A_i})\cap A=\bigcup_{i=1}^\infty (A_i\cap A)\), that is, \(\{ A_i\cap A\}_{i=1}^\infty\) is a cover of \(E\cap A\) by sets in \(\Sigma_0\). Thus

\[ \mu^*(E\cap A)\leq \sum_{i=1}^\infty \mu_0(A_i\cap A) \]

And by the same idea,

\[ \begin{aligned} &\mu^*(E\cap A^c)\leq \sum_{i=1}^\infty\mu_0(A_i\cap A^c)\\ \Rightarrow\quad &\mu^*(E\cap A)+\mu^*(E\cap A^c) \leq \sum_{i=1}^\infty \lr({\mu_0(A_i\cap A)+ \mu_0(A_i\cap A^c)})=\sum_{i=1}^\infty\mu_0(A_i)<\mu^*(E)+\ve \end{aligned} \]

Letting \(\ve\to0\) gives \(\mu^*(E\cap A)+\mu^*(E\cap A^c)\leq\mu^*(E)\). The reverse inequality follows from countable sub-additivity (applied to \(E\cap A,\ E\cap A^c,\ \varnothing,\varnothing,\cdots\)), hence

\[ \mu^*(E\cap A)+\mu^*(E\cap A^c)=\mu^*(E),\quad \forall E\sube S \]

which shows that every set in \(\Sigma_0\) is \(\mu^*\)-measurable.

Step 4: Since every set in \(\Sigma_0\) is \(\mu^*\)-measurable, \(\Sigma_0\sube \msc{M}\), where \(\msc{M}\) is the class of all \(\mu^*\)-measurable sets. Moreover, \(\msc{M}\) is a \(\sigma\)-algebra and \(\mu^*\) is a measure on \((S,\msc{M})\). Hence \(\Sigma=\sigma(\Sigma_0)\sube\msc{M}\), and \(\mu:=\mu^*\) restricted to \(\Sigma\) is a measure on \((S,\Sigma)\), with \(\mu=\mu_0\) on \(\Sigma_0\) by Step 2. For uniqueness, note that \(\Sigma_0\), being an algebra, is a \(\pi\)-system; so by the Uniqueness Lemma, when \(\mu_0(S)<\infty\), any measure on \(\Sigma\) that agrees with \(\mu_0\) on \(\Sigma_0\) agrees with \(\mu\) on \(\sigma(\Sigma_0)=\Sigma\).

In summary, the outer measure \(\mu^*\), restricted to \(\Sigma=\sigma(\Sigma_0)\), extends \(\mu_0\) from \(\Sigma_0\) to \(\Sigma\), and the extension is unique when \(\mu_0(S)<\infty\). This lets us extend a measure from a small and simple algebra to a much more complicated \(\sigma\)-algebra. The classical application of this powerful tool is the construction of Lebesgue measure, as shown below.

Lebesgue measure \(\mathrm{Leb}\)

Let \(S=(0,1]\). Let \(\Sigma_0\) be the family of finite disjoint unions of half-open intervals in \(S\):

\[ \Sigma_0=\{F\sube S\ |\ F=\bigcup_{k=1}^r (a_k,b_k],\ r\in\N,\ 0\leq a_1 \leq b_1<a_2\leq b_2<\cdots< a_r \leq b_r\leq 1\} \]

Then \(\Sigma_0\) is an algebra on \((0,1]\) and

\[ \Sigma\edef \sigma(\Sigma_0)=\msc{B}(0,1] \]

Let

\[ \mu_0(F)=\sum_{k\leq r}(b_k-a_k) \]

Then \(\mu_0\) is well-defined and additive on \(\Sigma_0\). In fact, \(\mu_0\) is \(\sigma\)-additive on \(\Sigma_0\), although the proof is not trivial; we may prove this later, after the elegant proof of the Daniell-Kolmogorov Theorem in Section 2. A countably additive \(\mu_0\) on an algebra \(\Sigma_0\) like this is called a premeasure. By Carathéodory's Extension Theorem (note that \(\mu_0((0,1])=1<\infty\)), there exists a unique measure \(\mu\) on \(((0,1],\msc{B}(0,1])\) extending \(\mu_0\) on \(\Sigma_0\). We call this unique measure Lebesgue measure on \(((0,1],\msc{B}(0,1])\), or loosely, on \((0,1]\). If we assign the set \(\{0\}\) Lebesgue measure \(0\), we can extend Lebesgue measure to \([0,1]\), and in a similar way to \(\R\).
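The premeasure \(\mu_0\) on \(\Sigma_0\) is easy to compute once a set is rewritten in the disjoint form used in the definition. A minimal Python sketch, assuming rational endpoints (via `fractions.Fraction`) and representing \((a,b]\) by the pair `(a, b)`; the helpers `normalize`, `mu0` and `complement` are hypothetical names of ours. Different representations of the same set give the same value, which is the well-definedness claim, and `complement` illustrates that \(\Sigma_0\) is closed under complements in \((0,1]\).

```python
from fractions import Fraction
from typing import List, Tuple

Interval = Tuple[Fraction, Fraction]  # the half-open interval (a, b], 0 <= a <= b <= 1


def normalize(intervals: List[Interval]) -> List[Interval]:
    """Rewrite a finite union of (a, b] as sorted disjoint (a_k, b_k] with b_k < a_{k+1}."""
    merged: List[Interval] = []
    for a, b in sorted(iv for iv in intervals if iv[0] < iv[1]):  # (a, a] is empty
        if merged and a <= merged[-1][1]:
            # overlaps or abuts the previous one: (a0, b0] ∪ (a, b] = (a0, max(b0, b)]
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


def mu0(intervals: List[Interval]) -> Fraction:
    """μ_0(F) = Σ (b_k - a_k), computed on the disjoint representation of F."""
    return sum((b - a for a, b in normalize(intervals)), Fraction(0))


def complement(intervals: List[Interval]) -> List[Interval]:
    """Complement in S = (0, 1]; again a finite disjoint union of (a, b]."""
    out: List[Interval] = []
    left = Fraction(0)
    for a, b in normalize(intervals):
        if left < a:
            out.append((left, a))
        left = b
    if left < 1:
        out.append((left, Fraction(1)))
    return out
```

For example, `[(0, 1/2), (1/4, 3/4)]` normalizes to the single interval \((0,3/4]\), so its \(\mu_0\)-value is \(3/4\) rather than the naive \(1/2+1/2\).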

6. Inner and Outer \(\mu\)-measures; Completion

Definition:

Let \((S,\Sigma,\mu)\) be a measure space. For \(G\sube S\), define the inner \(\mu\)-measure \(\mu_*(G)\) of \(G\) via:

\[ \mu_*(G)\edef\sup\{\mu(F)\ |\ F\in\Sigma,\ F\sube G \} \]

and the outer \(\mu\)-measure \(\mu^*(G)\) of \(G\) via:

\[ \mu^*(G)\edef \inf\{ \mu(H)\ |\ H\in\Sigma,\ H\supseteq G \} \]

If \(\mu_*(G)=\mu^*(G)\) (this characterization is meant for finite \(\mu\)), we say that \(G\) is \(\mu\)-measurable and write \(G\in \Sigma^\mu\). Then \(\Sigma^\mu\) is a \(\sigma\)-algebra, and we can extend \(\mu\) to a measure on \(\Sigma^\mu\), still denoted by \(\mu\), by setting

\[ \mu(G)\edef\mu_*(G)=\mu^*(G)\quad \text{for}\ G\ \text{in}\ \Sigma^\mu \]

Originally, in a measure space \((S,\Sigma,\mu)\), a set \(A\) is called measurable if \(A\in\Sigma\), because \(\mu\) is only defined on \(\Sigma\), i.e. \(\mu:\Sigma\to[0,\infty]\). However, some subsets of \(S\) cannot be obtained from sets in \(\Sigma\) by countable set operations, so they lie outside \(\Sigma\) and \(\mu\) assigns them no value; this is why we need an extended notion of measurability. Some sets, for example Vitali sets, are too irregular for any reasonable measure to be assigned to them. The completion deals with a milder situation: a set \(N\in\Sigma\) with \(\mu(N)=0\) may have subsets that are not in \(\Sigma\). (Real analysis cares more about such pathological cases.) Using the inner and outer \(\mu\)-measures, we call a set \(G\) measurable exactly when the two agree. In particular, every subset of a \(\mu\)-null set has outer measure \(0\), hence inner measure \(0\), and is now measurable with measure zero. The resulting triple \((S,\Sigma^\mu,\mu)\) is called the completion of \((S,\Sigma,\mu)\), and \(\Sigma^\mu\) is the smallest \(\sigma\)-algebra that extends \(\Sigma\) and contains every set of outer \(\mu\)-measure \(0\).
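On a finite toy space, inner and outer \(\mu\)-measures are a max and a min over \(\Sigma\), so the completion can be listed explicitly. This Python sketch assumes an example of our own: \(S=\{1,2,3,4\}\), \(\Sigma\) generated by the atoms \(\{1,2\}\) and \(\{3,4\}\), with \(\mu(\{1,2\})=1\) and \(\mu(\{3,4\})=0\). The names `MU`, `inner`, `outer` and `completion` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet

S = frozenset({1, 2, 3, 4})
# Hypothetical σ-algebra generated by the atoms {1, 2} and {3, 4}; {3, 4} is μ-null.
MU: Dict[FrozenSet[int], int] = {
    frozenset(): 0,
    frozenset({1, 2}): 1,
    frozenset({3, 4}): 0,
    S: 1,
}


def inner(G: FrozenSet[int]) -> int:
    """μ_*(G) = sup{μ(F) : F ∈ Σ, F ⊆ G}, a max since Σ is finite."""
    return max(m for F, m in MU.items() if F <= G)


def outer(G: FrozenSet[int]) -> int:
    """μ^*(G) = inf{μ(H) : H ∈ Σ, H ⊇ G}, a min since Σ is finite."""
    return min(m for H, m in MU.items() if H >= G)


def completion() -> Dict[FrozenSet[int], int]:
    """Σ^μ = {G : μ_*(G) = μ^*(G)}, mapped to the extended value μ(G)."""
    all_subsets = (frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r))
    return {G: outer(G) for G in all_subsets if inner(G) == outer(G)}
```

Here \(\Sigma\) has 4 sets while \(\Sigma^\mu\) has 8: the new ones differ from a set in \(\Sigma\) by a subset of the null set \(\{3,4\}\), such as \(\{3\}\), while \(\{1\}\) stays non-measurable (\(\mu_*=0<1=\mu^*\)).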

Lemma:

Suppose that \(\mu(S)=1\) and that \(G\) is a subset of \(S\) with \(\mu^*(G)=1\). Then, for \(F\in\Sigma\), \(\mu^*(G\cap F)=\mu(F)\). Moreover, \((G,\msc{G},\mu^*)\) is a measure space, where \(\msc{G}\) is the class of subsets of \(G\) of the form \(G\cap F\), where \(F\in\Sigma\).

Proof:

\(\mu^*(G\cap F)\leq \mu^*(F)=\mu(F)\) is immediate from monotonicity. It remains to prove \(\mu(F)\leq\mu^*(G\cap F)\). The difficulty is that only \(S\) and \(F\) are known to be in \(\Sigma\), while \(G\) may not be measurable, and the outer measure is only subadditive. So we must use subadditivity in a way that keeps the bound tight, relying as much as possible on exact identities involving \(S\) and \(F\):

\[ \begin{aligned} \mu(F)+\mu(F^c)&=\mu(S)\\ &=\mu^*(G)\\ &\leq\mu^*(G\cap F)+\mu^*(G\cap F^c)\\ &\leq\mu^*(G\cap F)+\mu^*(F^c)\\ &=\mu^*(G\cap F)+\mu(F^c) \end{aligned} \]

Since \(\mu(F^c)\leq\mu(S)=1<\infty\), we may subtract \(\mu(F^c)\) from both sides to get \(\mu(F)\leq\mu^*(G\cap F)\). Note that the term we subtract is a finite value of the additive measure \(\mu\); manipulating a term like \(-\mu^*(\cdot)\) would not be justified, because \(\mu^*\) is only subadditive.

Moreover, to prove that \((G,\msc{G},\mu^*)\) is a measure space, we need to verify that \(\msc{G}\) is a \(\sigma\)-algebra on \(G\) and that \(\mu^*\) is countably additive on \(\msc{G}\).

(\(\msc{G}\)-i) Taking \(F=S\in\Sigma\) gives \(G=G\cap S\in\msc{G}\).

(\(\msc{G}\)-ii) For \(F\in\Sigma\), the complement of \(G\cap F\) in \(G\) is \(G\setminus(G\cap F)=G\cap F^c\), and \(F^c\in\Sigma\), so \(G\cap F^c\in\msc{G}\).

(\(\msc{G}\)-iii) \(\forall \{F_i\}_{i=1}^\infty\sube\Sigma,\ \bigcup_{i=1}^\infty F_i\in\Sigma\Rightarrow \bigcup_{i=1}^\infty (G\cap F_i)=G\cap \lr({\bigcup_{i=1}^\infty F_i})\in\msc{G}\).

(\(\mu^*\)-additive) Let \(\{A_i\}_{i=1}^\infty\sube\msc{G}\) be pairwise disjoint, i.e. \(A_i\cap A_j=\varnothing\) for \(i\neq j\), and write \(A_i=G\cap F_i\) with \(F_i\in\Sigma\). By the first part of the lemma, applied to \(\bigcup_i F_i\in\Sigma\), we have

\[ \mu^*\lr({\bigcup_{i=1}^\infty A_i})=\mu^*\lr({G\cap\bigcup_{i=1}^\infty F_i})=\mu\lr({\bigcup_{i=1}^\infty F_i}) \]

Note that the \(F_i\) need not be disjoint. However, we can disjointify them: construct pairwise disjoint sets \(F^\prime_i\in\Sigma\) with \(\bigcup F^\prime_i=\bigcup F_i\):

\[ \begin{aligned} &F_1^\prime=F_1\\ &F_n^\prime=F_n\setminus \lr({\bigcup_{k=1}^{n-1} F_k})=F_n\cap \lr({\bigcup_{k=1}^{n-1} F_k})^c= F_n\cap\lr({\bigcap_{k=1}^{n-1} {F_k}^c}),\quad n\geq 2\\ \end{aligned} \]

Since \(\forall n,\ \forall i<n,\ (G\cap F_n)\cap (G\cap F_i)=A_n\cap A_i=\varnothing\), we have \(G\cap F_n\sube G\setminus F_i\), thus

\[ G\cap F_n\sube G\setminus \bigcup_{i=1}^{n-1}F_i= G\cap\lr({\bigcup_{i=1}^{n-1}F_i})^c= G\cap \lr({\bigcap_{k=1}^{n-1} {F_k}^c}) \]

For convenience, let

\[ B_n=\bigcap_{k=1}^{n-1} {F_k}^c,\quad n\geq2 \]

Now we have:

\[ \begin{aligned} G\cap F_n &\sube G\cap B_n\\ \Rightarrow\ G\cap F_n&=(G\cap F_n)\cap (G\cap B_n)\\ &=G\cap F_n\cap B_n\\ &=G\cap(F_n\cap B_n)\\ &=G\cap \lr({F_n\cap\lr({\bigcap_{k=1}^{n-1} {F_k}^c})})\\ &=G\cap F^\prime_n \end{aligned} \]

Finally,

\[ \begin{aligned} \mu^*\lr({\bigcup_{i=1}^\infty A_i})&=\mu\lr({\bigcup_{i=1}^\infty F_i}) =\mu\lr({\bigcup_{i=1}^\infty F_i^\prime})\\ &=\sum_{i=1}^\infty\mu(F_i^\prime)=\sum_{i=1}^\infty\mu^*(G\cap F^\prime_i)\\ &=\sum_{i=1}^\infty\mu^*(G\cap F_i)\\ &=\sum_{i=1}^\infty\mu^*(A_i) \end{aligned} \]

where \(\mu(F_i^\prime)=\mu^*(G\cap F_i^\prime)\) comes from the first part of the lemma. Therefore \(\mu^*\) is countably additive on \(\msc{G}\), and \((G,\msc{G},\mu^*)\) is a measure space.

Note that the condition \(\mu(S)=\mu^*(G)=1\) is stronger than needed: the proof only uses \(\mu^*(G)=\mu(S)<\infty\) (finiteness is what allows us to subtract \(\mu(F^c)\)). The "\(=1\)" reflects the intended application to probability.
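The lemma can be sanity-checked on a small example of our own: \(S=\{1,2,3,4\}\), \(\Sigma\) generated by \(\{1,2\}\) and \(\{3,4\}\), each of mass \(1/2\), and the non-measurable set \(G=\{1,3\}\), which has \(\mu^*(G)=1\). The Python sketch below (hypothetical helper names) checks \(\mu^*(G\cap F)=\mu(F)\) for every \(F\in\Sigma\), and also shows the conclusion failing for \(G=\{1\}\), where \(\mu^*(G)=1/2<\mu(S)\).

```python
from fractions import Fraction
from typing import Dict, FrozenSet

S = frozenset({1, 2, 3, 4})
HALF = Fraction(1, 2)
# Hypothetical probability space: Σ generated by the atoms {1, 2} and {3, 4}, mass 1/2 each.
MU: Dict[FrozenSet[int], Fraction] = {
    frozenset(): Fraction(0),
    frozenset({1, 2}): HALF,
    frozenset({3, 4}): HALF,
    S: Fraction(1),
}


def outer(G: FrozenSet[int]) -> Fraction:
    """Outer μ-measure: the smallest μ(H) over H ∈ Σ with H ⊇ G."""
    return min(m for H, m in MU.items() if H >= G)


def lemma_holds(G: FrozenSet[int]) -> bool:
    """Check μ^*(G ∩ F) = μ(F) for every F ∈ Σ."""
    return all(outer(G & F) == m for F, m in MU.items())
```

On the trace \(\sigma\)-algebra \(\msc{G}=\{\varnothing,\{1\},\{3\},\{1,3\}\}\), \(\mu^*\) is additive: \(\mu^*(\{1\})+\mu^*(\{3\})=1=\mu^*(\{1,3\})\).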

7. Definition of Integral

Let \((S,\Sigma,\mu)\) be a measure space. All of the following discussion takes place in this measure space.

Integrals of Non-Negative Functions

Definition:

If \(A\) is an element of \(\Sigma\), we define the notation \(\mu_0(I_A)\) to be the measure of set \(A\):

\[ \mu_0(I_A)\edef\mu(A)\leq\infty \]

An element \(f\) of \((\mathrm{m}\Sigma)^+\) is called a simple function, and we then write \(f\in\mathrm{SF}^+\), if \(f\) can be written as a finite sum of indicator functions with non-negative coefficients:

\[ f=\sum_{k=1}^m a_kI_{A_k} \]

where \(a_k\in[0,\infty]\) and \(A_k\in\Sigma\). We then define

\[ \mu_0(f)=\sum_{k=1}^m a_k\mu(A_k)\leq\infty \]

with the convention \(0\cdot\infty=0\); one checks that this value does not depend on the chosen representation of \(f\).

Definition:

For \(f\in(\mathrm{m}\Sigma)^+\), we define:

\[ \mu(f)\edef\sup\{ \mu_0(h)\ |\ h\in\mathrm{SF}^+,\ h\leq f \}\leq\infty \]

If \(f\) itself is simple, i.e. \(f\in\mathrm{SF}^+\), we have \(\mu(f)=\mu_0(f)\).
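To see why \(\mu_0(f)\) does not depend on the representation of \(f\), one can compare two representations on a finite measure space. In this Python sketch the point masses, the function \(f=2I_{\{a,b\}}+3I_{\{b,c\}}\), and all helper names are our own assumptions; coefficients are kept finite for simplicity, although the text allows \(a_k=\infty\).

```python
from fractions import Fraction
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical finite measure space: every subset is measurable, μ is given by point masses.
WEIGHTS: Dict[str, Fraction] = {
    "a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(0),
}
Representation = List[Tuple[Fraction, Set[str]]]  # [(a_k, A_k), ...] meaning Σ a_k I_{A_k}


def mu(A: Iterable[str]) -> Fraction:
    return sum((WEIGHTS[s] for s in A), Fraction(0))


def mu0(rep: Representation) -> Fraction:
    """μ_0(f) = Σ a_k μ(A_k), read off one particular representation."""
    return sum((a * mu(A) for a, A in rep), Fraction(0))


def evaluate(rep: Representation, s: str) -> Fraction:
    """The value f(s) of the simple function Σ a_k I_{A_k}."""
    return sum((a for a, A in rep if s in A), Fraction(0))


def canonical(rep: Representation) -> Representation:
    """Canonical form Σ_y y I_{f = y}: distinct values on disjoint level sets."""
    values = {evaluate(rep, s) for s in WEIGHTS}
    return [(y, {s for s in WEIGHTS if evaluate(rep, s) == y}) for y in values]
```

With overlapping sets the terms \(a_k\mu(A_k)\) double-count the point `b`, yet the total agrees with the canonical form, here \(21/8\).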

Definition:

For \(f\in \mathrm{m}\Sigma\), we write \(f=f^+-f^-\), where

\[ f^+(s)=\max(f(s),0),\quad f^-(s)=\max(-f(s),0) \]

Then, \(f^+,\ f^-\in(\mathrm{m}\Sigma)^+\), and \(|f|=f^+ + f^-\). Therefore, for \(f\in\mathrm{m}\Sigma\), we say that \(f\) is \(\mu\)-integrable, and write

\[ f \in \msc{L}^1(S,\Sigma,\mu) \]

if

\[ \mu(|f|)=\mu(f^+)+\mu(f^-)<\infty \]

And then, we define

\[ \int f\ \rmd\mu\edef\mu(f)\edef\mu(f^+)-\mu(f^-) \]

Note that for \(f\in\msc{L}^1(S,\Sigma,\mu)\),

\[ |\mu(f)|\leq\mu(|f|) \]

Some commonly used notations:

\[ \int_S f(s)\mu(\rmd s)=\int_S f\ \rmd\mu=\mu(f)\\ \int_A f(s)\mu(\rmd s)=\int_A f\ \rmd\mu=\mu(f|A)=\mu(fI_A) \]

For example, if

\[ A=\{ s\in S\ |\ f(s)\geq x \} \]

we may simply write

\[ \mu(f\ |\ f\geq x)=\mu(f|A) \]

Lemma (Linearity):

For \(\alpha,\beta\in\R\) and \(f,g\in\msc{L}^1(S,\Sigma,\mu)\),

\[ \alpha f+\beta g \in \msc{L}^1 (S,\Sigma,\mu)\\ \]

and

\[ \mu(\alpha f+\beta g)=\alpha\mu(f)+\beta\mu(g) \]

8. Convergence Theorems

Theorem (The Monotone-Convergence Theorem):

If \(\{f_n\}\) is a sequence of elements of \((\mathrm{m}\Sigma)^+\) such that \(f_n \uparrow f\), then

\[ \mu(f_n)\uparrow \mu(f) \leq \infty \Leftrightarrow \int_S f_n(s)\mu(\rmd s)\uparrow\int_S f(s)\mu(\rmd s) \]
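The supremum defining \(\mu(f)\) can be approached by an increasing sequence of simple functions, which is also how the Monotone-Convergence Theorem is typically applied. As an illustration of our own choosing, take \(f(x)=x\) on \(((0,1],\mathrm{Leb})\) and the dyadic staircase \(\alpha_n\), equal to \((i-1)2^{-n}\) on \(((i-1)2^{-n},i2^{-n}]\) and capped at \(n\). The Python sketch computes \(\mu_0(\alpha_n\circ f)\) exactly; the values \((2^n-1)/2^{n+1}\) increase to \(1/2=\int_0^1x\ \rmd x\). The function names are hypothetical.

```python
from fractions import Fraction


def staircase_value(n: int, x: Fraction) -> Fraction:
    """α_n(x): equals (i-1)/2^n on ((i-1)/2^n, i/2^n], 0 at x <= 0, capped at n."""
    if x <= 0:
        return Fraction(0)
    if x > n:
        return Fraction(n)
    step = Fraction(1, 2 ** n)
    i = -(-x // step)  # ceil(x / step): the smallest integer i with x <= i * step
    return (i - 1) * step


def staircase_integral(n: int) -> Fraction:
    """μ_0(α_n ∘ f) for f(x) = x on ((0, 1], Leb), as a finite sum over dyadic intervals."""
    step = Fraction(1, 2 ** n)
    # On ((i-1)step, i*step] the staircase equals (i-1)*step; that interval has length step.
    return sum(((i - 1) * step * step for i in range(1, 2 ** n + 1)), Fraction(0))
```

Since \(f\leq1\leq n\), the cap at \(n\) plays no role here; it matters only for unbounded \(f\).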

Lemma (The Fatou Lemma for functions):

\[ \mu( \underset{n\to\infty}{\lim \inf}\ f_n ) \leq \underset{n\to\infty}{\lim \inf}\ \mu ( f_n )\\ \text{where} \quad \underset{n\to\infty}{\lim \inf}\ f_n=\sup_{k\geq 1}\inf_{n\geq k}f_n \]

Proof: Let

\[ \displaystyle g_k=\inf_{n\geq k} f_n \]

Then we have

\[ \underset{n\to\infty}{\lim \inf} f_n=\uparrow\lim_{k\to\infty} g_k \\ \Leftrightarrow \quad \lim_{k\to\infty} g_k = \sup_{k\geq 1}g_k = \underset{n\to\infty}{\lim \inf} f_n \]

By the definition of \(g_k\), for \(n\geq k\),

\[ g_k\leq f_n \quad \Rightarrow \quad \mu(g_k)\leq\mu(f_n)\quad \Rightarrow\quad \mu(g_k)\leq\inf_{n\geq k}\mu(f_n) \]

Combining these results with the Monotone-Convergence Theorem:

\[ \mu\lr({\underset{n\to\infty}{\lim \inf} f_n})=\uparrow \lim_{k\to\infty}\mu(g_k) \leq\lim_{k\to\infty}\inf_{n\geq k}\mu(f_n)= \underset{n\to\infty}{\lim \inf}\ \mu ( f_n ) \]
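The inequality in Fatou's Lemma can be strict. A standard example, checked here in a small Python sketch of our own: on \(((0,1],\mathrm{Leb})\) let \(f_n=n\,I_{(0,1/n)}\). Then \(\mu(f_n)=1\) for all \(n\), but \(f_n(x)=0\) as soon as \(n\geq1/x\), so \(\lim\inf f_n=0\) and \(\mu(\lim\inf f_n)=0<1\). The helper `tail_inf` (a hypothetical name) only inspects a finite window of indices, which suffices because the tail is identically zero.

```python
from fractions import Fraction


def f(n: int, x: Fraction) -> int:
    """f_n = n * I_{(0, 1/n)} on S = (0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f(n: int) -> Fraction:
    """μ(f_n): f_n is simple, with value n on an interval of Lebesgue measure 1/n."""
    return n * Fraction(1, n)


def tail_inf(x: Fraction, k: int, window: int = 200) -> int:
    """min of f_n(x) over k <= n < k + window, a finite look at inf_{n >= k} f_n(x)."""
    return min(f(n, x) for n in range(k, k + window))
```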

Lemma (Reverse Fatou Lemma):

If \(\{ f_n \}\) is a sequence in \((\mathrm{m}\Sigma)^+\) such that for some \(g\) in \((\mathrm{m}\Sigma)^+\), we have \(f_n\leq g,\ \forall n\), and \(\mu(g) < \infty\), then:

\[ \mu(\underset{n\to\infty}{\lim\sup}\ f_n)\geq\underset{n\to\infty}{\lim\sup}\ \mu(f_n) \]

Theorem (The Dominated-Convergence Theorem):

Suppose that \(f_n,\ f\in\mathrm{m}\Sigma\), that \(f_n(s)\to f(s)\) for \(\mu\)-almost every \(s\in S\), and that the sequence \(\{ f_n \}\) is dominated by a non-negative integrable function \(g\in\msc{L}^1(S,\Sigma,\mu)^+:\)

\[ |f_n(s)| \leq g(s), \quad \forall s\in S,\ \forall n\in \N \]

where \(\mu(g)<\infty\). Then:

(i) \(f\) is \(\mu\)-integrable:

\[ f\in\msc{L}^1(S,\Sigma,\mu)\quad \Leftrightarrow\quad \mu(|f|)<\infty \]

(ii) \(f_n\) converges to \(f\) in \(\msc{L}^1(S,\Sigma,\mu)\):

\[ \lim_{n\to\infty}\int_S|f_n-f|\ \rmd\mu=0 \]

(iii) The integrals of \(f_n\) converge to the integral of \(f\), i.e. the limit and integral operations can be exchanged:

\[ \mu(f_n)\to\mu(f)\quad\Leftrightarrow\quad \lim_{n\to\infty}\int_S f_n\ \rmd\mu=\int_S f\ \rmd\mu=\int_S\lr({\lim_{n\to\infty}f_n})\rmd\mu \]

Proof:

(i) \(f_n(s)\to f(s)\) for \(\mu\)-almost every \(s\in S\), so \(|f(s)|=\lim_{n\to\infty}|f_n(s)|\leq g(s)\) for \(\mu\)-almost every \(s\in S\). Therefore,

\[ \int_S |f|\ \rmd\mu\leq\int_S g\ \rmd\mu<\infty \]

(ii) Let \(h_n = 2g - |f_n-f|\). Then

\[ \begin{aligned} &|f_n-f| \leq |f_n|+|f| \leq 2g\quad \text{a.e.}(\mu)\\ \Rightarrow\quad &h_n=2g-|f_n-f|\geq 0\ \text{a.e.}(\mu),\quad h_n\to2g\ \text{a.e.}(\mu) \end{aligned} \]

According to Fatou's Lemma, and using \(\mu(2g)<\infty\) to cancel it from both sides,

\[ \begin{aligned} \mu(2g)=\mu\lr({\underset{n\to\infty}{\lim \inf}\ h_n}) &\leq \underset{n\to\infty}{\lim \inf}\ \mu(h_n)=\underset{n\to\infty}{\lim \inf}\ \lr({\mu(2g)-\mu(|f_n-f|)})\\ &=\mu(2g)-\underset{n\to\infty}{\lim \sup}\ \mu(|f_n-f|)\\ \Rightarrow\quad 0\leq \underset{n\to\infty}{\lim \inf}\ \mu(|f_n-f|)\leq \underset{n\to\infty}{\lim \sup}\ \mu(|f_n-f|)&\leq 0\\ \Rightarrow\quad \lim_{n\to\infty}\mu(|f_n-f|)&= 0 \end{aligned} \]

(iii) By (ii) and the linearity of \(\mu\), we have:

\[ \lim_{n\to\infty}|\mu(f_n)-\mu(f)|=\lim_{n\to\infty}|\mu(f_n-f)|\leq\lim_{n\to\infty}\mu(|f_n-f|)=0 \]

It is worth noting that this theorem is central to many applications of measure theory.
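A numerical illustration of the theorem, under our own choices: \(f_n(x)=(1+x/n)^n\) on \([0,1]\) with Lebesgue measure, which converges to \(f(x)=\rme^x\) and is dominated by the constant \(g=\rme\). The Python sketch approximates integrals by the midpoint rule (a Riemann-sum stand-in, adequate for these continuous functions) and shows \(\mu(|f_n-f|)\) shrinking, as in (ii). By contrast, \(f_n=n\,I_{(0,1/n)}\), which has no integrable dominating function, has \(\mu(f_n)=1\) while its pointwise limit integrates to \(0\). All names below are hypothetical.

```python
import math
from typing import Callable


def midpoint_integral(h: Callable[[float], float], m: int = 20_000) -> float:
    """Midpoint-rule approximation of ∫_0^1 h(x) dx, a numerical stand-in for μ(h)."""
    width = 1.0 / m
    return width * sum(h((j + 0.5) * width) for j in range(m))


def f_n(n: int) -> Callable[[float], float]:
    """f_n(x) = (1 + x/n)^n; it increases to e^x and satisfies f_n <= e^x <= e on [0, 1]."""
    return lambda x: (1.0 + x / n) ** n


def l1_error(n: int) -> float:
    """Approximate μ(|f_n - f|) with f(x) = e^x."""
    fn = f_n(n)
    return midpoint_integral(lambda x: abs(fn(x) - math.exp(x)))
```

The error behaves roughly like \((\rme-2)/(2n)\), so it drops by about a factor of ten each time \(n\) grows tenfold.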

Lemma (Scheffé's Lemma):

Suppose that \(f_n,\ f\in\msc{L}^1(S,\Sigma,\mu)\) and that \(f_n\to f\) a.e.\((\mu)\). Then,

\[ \mu(|f_n-f|)\to 0\quad \Leftrightarrow \quad \mu(|f_n|)\to\mu(|f|) \]

9. The Radon-Nikodým Theorem

Definition:

Let \((S,\Sigma)\) be a measurable space. A function \(\nu : \Sigma \to \R\) is called a (finite) signed measure if (i) \(\nu(\varnothing)=0\), and (ii) \(\nu\) is countably additive. Unlike an ordinary measure, a signed measure may take negative values.

Theorem (Hahn Decomposition Theorem):

Let \(\nu\) be a signed measure on a measurable space \((S,\Sigma)\). Then there exists a partition \((P,N)\) of \(S\) into sets \(P,N\in\Sigma\) such that:

\[ \begin{aligned} \text{(i)} & \quad P\cup N=S,\quad P\cap N=\varnothing\\ \text{(ii)} & \quad \forall A\in\Sigma,\ A\sube P\ \Rightarrow\ \nu(A) \geq 0\\ \text{(iii)} & \quad \forall B\in\Sigma,\ B\sube N\ \Rightarrow\ \nu(B) \leq 0 \end{aligned} \]

We say that \(P\) is a positive set for \(\nu\) and \(N\) is a negative set for \(\nu\). The partition \((P,N)\) is called a Hahn decomposition, and it is unique up to \(\nu\)-null sets: if \((P_1,N_1)\) and \((P_2,N_2)\) are two Hahn decompositions, then every measurable subset of the symmetric difference \(P_1\triangle P_2\) has \(\nu\)-measure \(0\).
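On a finite set where every subset is measurable, a Hahn decomposition can be read off from the point masses: put the points of non-negative mass in \(P\) and the rest in \(N\). A Python sketch with hypothetical masses of our own; the point `c` has mass \(0\) and may go on either side, which illustrates uniqueness only up to null sets.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, Tuple

# Hypothetical signed measure on a finite set, given by (possibly negative) point masses.
NU: Dict[str, int] = {"a": 4, "b": -3, "c": 0, "d": 1, "e": -6}


def nu(A: Iterable[str]) -> int:
    return sum(NU[s] for s in A)


def hahn_decomposition(point_masses: Dict[str, int]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """P collects the points of non-negative mass, N the rest."""
    P = frozenset(s for s, w in point_masses.items() if w >= 0)
    return P, frozenset(point_masses) - P


def subsets(X: FrozenSet[str]) -> Iterator[FrozenSet[str]]:
    """All subsets of a finite set (all of them are measurable here)."""
    for r in range(len(X) + 1):
        for c in combinations(sorted(X), r):
            yield frozenset(c)
```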

Proposition:

Let \((S,\Sigma,\mu)\) be a measure space and \(f\in(\mathrm{m}\Sigma)^+\). Define a set function \(f\mu\) on \((S,\Sigma)\) by

\[ (f\mu)(F)\edef\mu(f\ |\ F)=\mu(f\cdot I_F),\quad \forall F\in\Sigma \]

By linearity and the Monotone-Convergence Theorem, \(f\mu\) is a measure on \((S,\Sigma)\). Note that

\[ \mu(F)=0 \quad \text{implies} \quad (f\mu)(F)=0 \]

Theorem (The Radon-Nikodým Theorem) :

Let \((S,\Sigma)\) be a measurable space, and let \(\mu\) and \(\lambda\) be \(\sigma\)-finite measures on \((S,\Sigma)\). Then the following statements are equivalent:

(i) for \(F\in\Sigma,\ \mu(F)=0\) implies that \(\lambda(F)=0\);

(ii) \(\lambda=f\mu\) for some \(f\in(\mathrm{m}\Sigma)^+\).

Definition:

If condition (i) holds, we say that \(\lambda\) is absolutely continuous with respect to \(\mu\), written \(\lambda\ll\mu\). The function \(f\) in (ii) is unique modulo \(\mu\)-null sets: we say that \(f\) is a version of the (Radon-Nikodým) density of \(\lambda\) relative to \(\mu\), and write:

\[ f=\frac{\rmd\lambda}{\rmd\mu}\quad \text{a.e.}(\mu) \]
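On a finite space with all subsets measurable, the Radon-Nikodým density is just a ratio of point masses. The Python sketch below uses hypothetical masses of our choosing, with \(\lambda\ll\mu\); on the \(\mu\)-null point `d` the value of the density is arbitrary, which is exactly what "unique modulo \(\mu\)-null sets" means.

```python
from fractions import Fraction
from typing import Dict, Iterable

# Hypothetical point masses on a finite set (every subset measurable); λ << μ.
MU: Dict[str, Fraction] = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
LAM: Dict[str, Fraction] = {"a": Fraction(1, 6), "b": Fraction(1, 2), "c": Fraction(1, 3), "d": Fraction(0)}


def absolutely_continuous(lam: Dict[str, Fraction], mu: Dict[str, Fraction]) -> bool:
    """λ << μ on a finite space: every μ-null point is λ-null."""
    return all(lam[s] == 0 for s in mu if mu[s] == 0)


def density(lam: Dict[str, Fraction], mu: Dict[str, Fraction],
            on_null: Fraction = Fraction(0)) -> Dict[str, Fraction]:
    """A version of dλ/dμ: λ({s})/μ({s}) where μ({s}) > 0, arbitrary on μ-null points."""
    return {s: (lam[s] / mu[s] if mu[s] != 0 else on_null) for s in mu}


def integral_over(A: Iterable[str], f: Dict[str, Fraction], mu: Dict[str, Fraction]) -> Fraction:
    """∫_A f dμ = Σ_{s ∈ A} f(s) μ({s})."""
    return sum((f[s] * mu[s] for s in A), Fraction(0))
```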

Proof of Radon-Nikodým Theorem: (i) \(\Leftarrow\) (ii): Our condition (ii) can be translated as: \(\exist f\in (\mathrm{m}\Sigma)^+\) such that:

\[ \lambda(A)=\int_A f\ \rmd\mu,\quad \forall A \in \Sigma \]

Let \(F\in\Sigma\) with \(\mu(F)=0\). The integral is defined through simple functions: every \(h\in\mathrm{SF}^+\) with \(h\leq fI_F\) vanishes off \(F\), so \(h\leq\infty\cdot I_F\) and \(\mu_0(h)\leq\infty\cdot\mu(F)=0\) (with \(0\cdot\infty=0\)). Taking the supremum over such \(h\),

\[ \lambda(F)=\int_F f\ \rmd\mu= 0 \]

It would not be rigorous to argue merely that "\(\lambda(F)=0\) because \(\mu(\rmd s)=0\) for every \(\rmd s\) in \(F\)"; the argument has to go through the definition of the integral.

(i) \(\Rightarrow\) (ii): This direction is harder; we proceed in three steps.

Step 1: Constructing \(f\) when the measures are finite. We first treat the case where \(\lambda\) and \(\mu\) are finite; Step 3 reduces the \(\sigma\)-finite case to this one. For any \(r\in\Q\), define

\[ \nu_r(A) = \lambda(A)-r\mu(A),\quad \forall A\in \Sigma \]

Clearly \(\nu_r\) is a finite signed measure. By the Hahn Decomposition Theorem, for each \(\nu_r\) there is a partition \((S_r^+,S_r^-)\) of \(S\), unique up to null sets, such that \(\nu_r(A)\geq 0\) for every measurable \(A\sube S_r^+\) and \(\nu_r(A)\leq 0\) for every measurable \(A\sube S_r^-\). For \(r\leq0\) we have \(\nu_r\geq0\), so we may choose \(S_r^+=S\). Define \(f : S \to [0,\infty]\) by

\[ f(s)=\sup\{ r \in \Q \ | \ s \in S_r^+ \} \]

Then \(f\geq0\), and \(f\) is measurable because \(\{f>a\}=\bigcup_{r\in\Q,\ r>a}S_r^+\in\Sigma\) for every \(a\in\R\). Heuristically, \(f(s)\) is the largest \(r\) with "\(\lambda(\rmd s)\geq r\mu(\rmd s)\)" near \(s\), so that \(\lambda(\rmd s)\approx f(s)\mu(\rmd s)\). Restricting to rational \(r\) costs nothing, since the supremum of a set of rationals can be irrational, and it keeps the number of Hahn decompositions countable.

Step 2: Proving that \(f\) has the required property. We first prove \(\int_A f\ \rmd\mu\leq\lambda(A)\) for every \(A\in\Sigma\). For rational \(r\geq0\) and every measurable \(E \sube S_r^+\),

\[ \nu_r(E)=\lambda(E)- r\mu(E)\geq 0\quad \Rightarrow\quad \lambda(E)\geq r \mu(E) =\int_E r\ \rmd\mu \]

We cannot simply take the supremum over \(r\) under the integral sign, since \(\sup_r\int_A rI_{S_r^+}\ \rmd\mu\) may be strictly smaller than \(\int_A\sup_r rI_{S_r^+}\ \rmd\mu\). Instead, enumerate the non-negative rationals as \(r_1,r_2,\dots\), and for each \(m\) let

\[ h_m(s)=\max\lr({\{0\}\cup\{ r_j\ |\ j\leq m,\ s\in S_{r_j}^+ \}}) \]

Sorting \(r_1,\dots,r_m\) increasingly as \(t_1<\cdots<t_m\) and putting \(D_j=\lr({A\cap S_{t_j}^+})\setminus\bigcup_{i>j}S_{t_i}^+\), the sets \(D_1,\dots,D_m\) are disjoint subsets of \(A\), \(D_j\sube S_{t_j}^+\), and \(h_mI_A=\sum_{j=1}^m t_jI_{D_j}\). Hence

\[ \lambda(A)\geq\sum_{j=1}^m\lambda(D_j)\geq\sum_{j=1}^m t_j\mu(D_j)=\int_A h_m\ \rmd\mu \]

As \(m\to\infty\), \(h_m\uparrow f\) pointwise, so the Monotone-Convergence Theorem gives \(\int_A f\ \rmd\mu\leq\lambda(A)\) for all \(A\in\Sigma\); in particular \(\int_S f\ \rmd\mu\leq\lambda(S)<\infty\). Now consider the set function

\[ \tilde{\nu}(A)=\lambda(A)-\int_A f\ \rmd\mu,\ \forall A \in \Sigma \]

We have just proven that \(\tilde\nu\geq0\). It remains to show that \(\tilde\nu(G)=0\) for every \(G\in\Sigma\). The key observation is that, by the definition of \(f\), \(s\in S_r^+\) implies \(f(s)\geq r\); hence \(\{f<r\}\sube S_r^-\), and so

\[ \lambda(B)\leq r\mu(B)\quad \text{for every measurable}\ B\sube\{f<r\},\ r\in\Q \]

Fix \(G\in\Sigma\) and a rational \(\ve>0\). First, \(\mu(G\cap\{f=\infty\})=0\), since otherwise \(\int_G f\ \rmd\mu=\infty>\lambda(G)\), contradicting what we just proved; then \(\lambda(G\cap\{f=\infty\})=0\) by (i). Next, split the rest of \(G\) into the disjoint measurable sets \(G_k=G\cap\{k\ve\leq f<(k+1)\ve\}\), \(k=0,1,2,\dots\). Since \(G_k\sube\{f<(k+1)\ve\}\) and \((k+1)\ve\in\Q\),

\[ \lambda(G_k)\leq(k+1)\ve\,\mu(G_k)\leq\int_{G_k}(f+\ve)\ \rmd\mu \]

Summing over \(k\) by countable additivity,

\[ \lambda(G)\leq\int_G f\ \rmd\mu+\ve\mu(G)\quad\Rightarrow\quad 0\leq\tilde\nu(G)\leq\ve\mu(G) \]

Since \(\mu(G)<\infty\) and \(\ve\) can be arbitrarily small, \(\tilde\nu(G)=0\). Therefore, we have the result that

\[ \tilde{\nu}(A)=\lambda(A)-\int_A f\ \rmd\mu=0,\quad \forall A \in \Sigma\\ \Leftrightarrow\quad \lambda(A)=\int_A f\ \rmd \mu,\quad \forall A\in\Sigma \]
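The construction of \(f\) in Steps 1 and 2 can be imitated on a finite space, where \(S_r^+=\{s\ |\ \lambda(\{s\})-r\mu(\{s\})\geq0\}\) is a Hahn positive set for \(\nu_r\). This Python sketch, with hypothetical masses of our own, replaces \(\Q\) by a finite grid of rationals \(0,\delta,2\delta,\dots,r_{\max}\); the result approximates \(\lambda(\{s\})/\mu(\{s\})\) from below within \(\delta\). On the \(\mu\)-null point `d`, every \(r\) qualifies, so \(f\) runs up to the top of the grid (in the proof, \(f=\infty\) there), which is harmless because that point is \(\mu\)-null.

```python
from fractions import Fraction
from typing import Dict, Set

# Hypothetical point masses on a finite set (every subset measurable); λ << μ.
MU: Dict[str, Fraction] = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
LAM: Dict[str, Fraction] = {"a": Fraction(1, 6), "b": Fraction(1, 2), "c": Fraction(1, 3), "d": Fraction(0)}


def positive_set(r: Fraction) -> Set[str]:
    """S_r^+ for ν_r = λ - rμ on a finite power-set space: the points with ν_r({s}) >= 0."""
    return {s for s in MU if LAM[s] - r * MU[s] >= 0}


def f_on_grid(step: Fraction, r_max: Fraction) -> Dict[str, Fraction]:
    """sup{r : s ∈ S_r^+}, with r restricted to the rational grid 0, step, ..., r_max."""
    grid = [k * step for k in range(int(r_max / step) + 1)]
    positive = {r: positive_set(r) for r in grid}  # r = 0 gives all of S, since λ >= 0
    return {s: max(r for r in grid if s in positive[r]) for s in MU}
```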

Step 3: Extension to \(\sigma\)-finite measures. If \(\lambda\) and \(\mu\) are \(\sigma\)-finite, we can partition \(S\) into countably many disjoint measurable sets \(\{ E_k \}_{k=1}^\infty\) such that \(\mu(E_k)<\infty\) and \(\lambda(E_k)<\infty\) for every \(k\) (intersect the two exhausting sequences and disjointify). On each measurable subspace \((E_k, \Sigma_k)\), where \(\Sigma_k=\{A\cap E_k\ |\ A\in\Sigma\}\), the restrictions of \(\mu\) and \(\lambda\) are finite and still satisfy (i). Applying Steps 1 and 2 on each \(E_k\), we obtain functions \(f_k\in(\mathrm{m}\Sigma)^+\) such that

\[ \lambda(A\cap E_k)=\int_{A\cap E_k} f_k\ \rmd\mu,\quad \forall A\in\Sigma,\ \forall k \]

Let \(f\) be obtained by patching the \(f_k\) together:

\[ f(s)=\sum_{k=1}^\infty f_k(s)\cdot I_{E_k}(s) \]

Then, using the disjointness of \(\{ E_k\}_{k=1}^\infty\), the \(\sigma\)-additivity of \(\lambda\), and the Monotone-Convergence Theorem to exchange the sum and the integral,

\[ \begin{aligned} \lambda(A)&=\sum_{k=1}^\infty\lambda(A\cap E_k)=\sum_{k=1}^\infty\int_{A\cap E_k} f_k\ \rmd\mu\\ &=\sum_{k=1}^\infty\int_{A} f_k\cdot I_{E_k}\ \rmd\mu =\int_{A}\sum_{k=1}^\infty f_k\cdot I_{E_k}\ \rmd\mu=\int_A f\ \rmd \mu \end{aligned} \]

which is what we want.

Lemma:

Suppose that \(\lambda\) and \(\mu\) are finite measures on \((S,\Sigma)\). Then, \(\lambda \ll \mu\) if and only if for any \(\ve>0\) we can find a \(\delta>0\) such that

\[ F\in\Sigma\ \ \text{and}\ \ \mu(F)<\delta\ \ \text{imply that}\ \ \lambda(F)<\ve \]

Proof: For the "if" part, suppose the \(\ve\)-\(\delta\) condition holds and let \(A\in\Sigma\) with \(\mu(A)=0\). Then \(\mu(A)<\delta\) for every \(\delta>0\), so \(\lambda(A)<\ve\) for every \(\ve>0\); hence \(\lambda(A)=0\), i.e. \(\lambda\ll\mu\).

For the "only if" part, we argue by contradiction. Suppose \(\lambda\ll\mu\) but the condition fails: there is an \(\ve_0>0\) such that for every \(\delta>0\) some \(A_\delta\in\Sigma\) has \(\mu(A_\delta)<\delta\) and \(\lambda(A_{\delta})\geq \ve_0\). Taking \(\delta_n=1/2^n\), we obtain sets \(A_n\in\Sigma\) with \(\mu(A_n)<1/2^n\) and \(\lambda(A_n)\geq\ve_0\). Let \(A\) be the limit superior of \(\{A_n\}\); since \(\mu\) is finite, continuity from above applies:

\[ A=\underset{n\to\infty}{\lim\sup}\ A_n=\bigcap_{k=1}^\infty \bigcup_{n=k}^\infty A_n\\ \Rightarrow\quad \mu(A)=\lim_{k\to\infty}\mu\lr({\bigcup_{n=k}^\infty A_n}) \leq\lim_{k\to\infty}\sum_{n=k}^\infty\mu(A_n)\leq\lim_{k\to\infty}\frac{1}{2^{k-1}}=0 \]

Thus \(\mu(A)=0\), so \(\lambda(A)=0\) since \(\lambda\ll\mu\). However, since \(\lambda\) is also finite,

\[ \lambda(A)=\lim_{k\to\infty}\lambda\lr({\bigcup_{n=k}^\infty A_n})\geq\ve_0 \]

because \(\lambda\lr({\bigcup_{n=k}^\infty A_n})\geq\lambda(A_k)\geq\ve_0\) for every \(k\). This contradicts \(\lambda(A)=0\), so the original assumption is false.

Definition:

Suppose that \(\lambda\) and \(\mu\) are finite measures on \((S,\Sigma)\), and suppose further that \(\lambda\ll\mu\) and \(\mu\ll\lambda\). We then say that \(\lambda\) and \(\mu\) are equivalent. In that case a.e.\((\mu)\) and a.e.\((\lambda)\) mean the same thing, since the two measures have the same null sets, and we simply write a.e.

Lemma:

Let \(\lambda\) and \(\mu\) be a pair of equivalent measures on \((S,\Sigma)\). Then, if \(f\) is a version of \(\rmd\lambda/\rmd\mu\) and \(g\) is a version of \(\rmd\mu/\rmd\lambda\), we have \(0<f<\infty\) a.e. and \(g=1/f\) a.e.

Proof:

For the \(f>0\) part, suppose that \(F=\{f=0\}\in\Sigma\) has \(\mu(F)>0\). Then \(\lambda(F)=\int_F f\ \rmd\mu=0\), and since \(\mu\ll\lambda\) this implies \(\mu(F)=0\), a contradiction. For the \(f<\infty\) part, suppose that \(G=\{f=\infty\}\in\Sigma\) has \(\mu(G)>0\). Then \(\lambda(G)=\int_G f\ \rmd\mu=\infty\cdot\mu(G)=\infty\), which contradicts the finiteness of \(\lambda\).

The same conclusion holds when \(\lambda\) and \(\mu\) are only \(\sigma\)-finite, but then \(\lambda(G)=\infty\) is not immediately a contradiction. In that case, take a sequence \(\{S_n\}\) with \(\bigcup S_n=S\) and \(\mu(S_n)<\infty\), and another sequence \(\{T_n\}\) with \(\bigcup T_n=S\) and \(\lambda(T_n)<\infty\). Let \(G_{ij}=S_i\cap T_j\), so that \(S=\bigcup_{i,j} G_{ij}\). Since \(\mu(G)>0\) and \(G=\bigcup_{i,j}(G\cap G_{ij})\), countable subadditivity gives some \(G_{i_0,j_0}\) with \(\mu(G\cap G_{i_0,j_0})>0\). Let \(H=G\cap G_{i_0,j_0}\). Then \(\lambda(H)=\int_{H}f\ \rmd\mu=\infty\cdot\mu(H)=\infty\), contradicting the \(\sigma\)-finiteness of \(\lambda\):

\[ \lambda(T_{j_0})\geq\lambda(S_{i_0}\cap T_{j_0})=\lambda(G_{i_0,j_0})\geq\lambda(G\cap G_{i_0,j_0})=\lambda(H)=\infty \]

Now consider the claim \(g=1/f\). For every \(A\in\Sigma\), \(\lambda(A)=\int_A f\ \rmd\mu\), i.e. \(\rmd\lambda=f\ \rmd\mu\); passing from indicators to simple functions and then using the Monotone-Convergence Theorem, \(\int h\ \rmd\lambda=\int hf\ \rmd\mu\) for every \(h\in(\mathrm{m}\Sigma)^+\). Substituting this into \(\mu(A)=\int_A g\ \rmd\lambda\), for \(A\in\Sigma\) with \(\mu(A)<\infty\):

\[ \mu(A)=\int_A g\ \rmd\lambda=\int_A gf\ \rmd \mu\quad \Rightarrow \quad \int_A(1-gf)\ \rmd\mu=0\ \Leftrightarrow \mu[(1-gf)\cdot I_A]=0 \]

That is, \(\int_A gf\ \rmd\mu=\mu(A)\) for every \(A\in\Sigma\) of finite \(\mu\)-measure. Applying this to \(A=\{gf>1\}\cap E\) and \(A=\{gf<1\}\cap E\), where \(E=S\) if \(\mu\) is finite (or \(E\) runs through a sequence of finite-measure sets exhausting \(S\) in the \(\sigma\)-finite case), shows that \(\{gf>1\}\) and \(\{gf<1\}\) are \(\mu\)-null, so \(gf=1\) a.e. Since \(0<f<\infty\) a.e., we conclude \(g=1/f\) a.e.

10. \(\msc{L}^p\) and \(L^p\) spaces

Definition:

Let \((S,\Sigma,\mu)\) be a measure space, and let \(p\in[1,\infty)\). For \(f\in\text{m}\Sigma\), write \(f\in \msc{L}^p:= \msc{L}^p(S,\Sigma,\mu)\) if

\[ ||f||_p\edef (\ \mu(|\ f\ |^p)\ )^{\frac{1}{p}}<\infty \]

Lemma (Minkowski's Inequality):

For \(f,g\in\msc{L}^p\),

\[ ||\ f + g\ ||_p \leq ||f||_p + ||g||_p \]

Lemma (Hölder's Inequality):

If \(p > 1\) and \(q > 1\) satisfy \(p^{-1} + q^{-1} = 1\), then for \(f,g\in\text{m}\Sigma\),

\[ |\mu(fg)| \leq \mu(|fg|) \leq ||f||_p \cdot ||g||_q \]

The Schwarz inequality is the case when \(p=q=2\). These inequalities can be proved by Jensen's Inequality. And further, also by applying Jensen's Inequality, we have:

Proposition:

If \(\mu\) is a finite measure and \(1 \leq p \leq r<\infty\), then for \(f\in\text{m}\Sigma\),

\[ ||f||_p \leq \mu(S)^{\frac{1}{p}-\frac{1}{r}}\cdot ||f||_r \]

In particular, if \(\mu(S)=1\), then \(||f||_p\leq||f||_r\).
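The three inequalities are easy to test numerically on a finite measure space with point masses. A Python sketch under our own assumptions (random seeded data, \(p=3\), \(q=3/2\), \(r=4\), and a small absolute tolerance for floating-point rounding); it is a sanity check, not a proof. The helper names are hypothetical.

```python
import random
from typing import Sequence


def lp_norm(f: Sequence[float], w: Sequence[float], p: float) -> float:
    """||f||_p = (μ(|f|^p))^(1/p), where μ has point masses w."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


def integral(h: Sequence[float], w: Sequence[float]) -> float:
    """μ(h) = Σ h(s) μ({s})."""
    return sum(wi * hi for hi, wi in zip(h, w))


def check_inequalities(trials: int = 200, seed: int = 0, tol: float = 1e-9) -> bool:
    """Test Minkowski, Hölder and the L^p comparison on random finite measure spaces."""
    rng = random.Random(seed)
    p, q, r = 3.0, 1.5, 4.0  # 1/p + 1/q = 1, and p <= r
    for _ in range(trials):
        n = rng.randint(1, 8)
        w = [rng.uniform(0.1, 2.0) for _ in range(n)]  # μ({s}) > 0, μ(S) = sum(w)
        f = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        g = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        fg = [a * b for a, b in zip(f, g)]
        abs_fg = [abs(x) for x in fg]
        minkowski = (lp_norm([a + b for a, b in zip(f, g)], w, p)
                     <= lp_norm(f, w, p) + lp_norm(g, w, p) + tol)
        holder = (abs(integral(fg, w)) <= integral(abs_fg, w) + tol
                  and integral(abs_fg, w) <= lp_norm(f, w, p) * lp_norm(g, w, q) + tol)
        comparison = (lp_norm(f, w, p)
                      <= sum(w) ** (1 / p - 1 / r) * lp_norm(f, w, r) + tol)
        if not (minkowski and holder and comparison):
            return False
    return True
```

The factor \(\mu(S)^{1/p-1/r}\) is sharp: for a constant \(f\) both sides of the comparison coincide.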

Definition:

We define an equivalence relation on \(\msc{L}^p\) as follows:

\[ \begin{aligned} &f \equiv g\quad \text{if and only if}\quad ||\ f-g\ ||_p=0\\ \Leftrightarrow\quad &f \equiv g\quad \text{if and only if}\quad f=g \quad \text{a.e.}(\mu) \end{aligned} \]

Let \([f]\) be the equivalence class in \(\msc{L}^p\) of \(f\). Since \(\mu(|f|=\infty)=0\), we can pick a representative \(f^*\) in \([f]\) that is finite everywhere. Thus we define

\[ \alpha[f]+\beta[g]=[\alpha f^*+\beta g^*],\quad ||\ [f]\ ||_p := ||f||_p \]

If we further let \(L^p\) be the set of equivalence classes in \(\msc{L}^p\), it is easy to check that \(L^p\) is a normed vector space. Moreover, it is a Banach space, i.e. a complete metric space under the distance

\[ d([f],[g])=||\ [f-g]\ ||_p \]

Definition and Lemma (Riesz Representation Theorem for \(L^{p}\) spaces):

Let \(1<p<\infty\) and \(p^{-1}+q^{-1}=1\). The dual space \((L^p)^{*}\) can be identified with \(L^q\): if \(\Lambda\) is a bounded linear functional on \(L^p\), then there exists \(g\in L^q\) such that

\[ \Lambda(f)=\mu(fg),\quad \forall f\in L^p \]

Moreover, \(||g||_q = || \Lambda ||\), so the map sending \(g\) to the functional \(f\mapsto\mu(fg)\) is an isometric isomorphism from \(L^q\) onto \((L^p)^*\).

Proof: We sketch the argument assuming \(\mu\) is finite (the \(\sigma\)-finite case follows by the usual decomposition), starting from indicator functions \(f=I_A\).

Intuitively, if we can find a function \(g\) such that \(\Lambda(I_A)=\mu(I_A\cdot g)=\int_A g\ \rmd\mu\), then \(g\) should take the form \(g=\rmd\Lambda/\rmd\mu\). Since \(\Lambda\) is not a measure, we need a set function \(\lambda\) to serve as a bridge between \(\Lambda\) and \(g\). Let

\[ \lambda(A)=\Lambda(I_A),\quad \forall A\in\Sigma \]

If \(\lambda\) is a signed measure on \((S,\Sigma)\) and \(\lambda\ll\mu\), then the function \(g\) exists by the Radon-Nikodým Theorem (applied to the positive and negative parts of \(\lambda\)). Linearity of \(\Lambda\) makes \(\lambda\) finitely additive; countable additivity then follows from the boundedness of \(\Lambda\), because for disjoint \(A_i\) we have \(||\ I_{\bigcup_{i\leq n}A_i}-I_{\bigcup_{i}A_i}\ ||_p=\mu\lr({\bigcup_{i>n}A_i})^{1/p}\to0\). Recall that \(\Lambda\) is bounded if

\[ \exist\ C \geq 0\quad \text{s.t.}\quad |\Lambda(f)|\leq C\cdot ||f||_p\ ,\quad \forall f\in L^p \]

The smallest such constant is the operator norm \(||\Lambda||\), that is,

\[ ||\Lambda|| = \sup_{f\in L^p,f\neq 0}\frac{|\Lambda(f)|}{||f||_p}\\ \Rightarrow \quad |\Lambda(f)|\leq ||\Lambda||\cdot ||f||_p\ ,\quad \forall f\in L^p \]

Therefore \(\lambda\) is finite:

\[ |\lambda(A)| = |\Lambda(I_A)| \leq || \Lambda ||\cdot || I_A ||_p = ||\Lambda||\cdot(\mu(A))^{1/p}<\infty,\quad \forall A\in\Sigma \]

If \(\mu(F)=0\), the indicator function \(I_F\) is the zero vector in \(L^p\), since \(||I_F||_p = 0\). By linearity of \(\Lambda\), \(\lambda(F)=\Lambda(I_F)=0\). Therefore \(\lambda\ll\mu\), and there exists a real-valued function \(g\in\text{m}\Sigma\) (not necessarily non-negative, since \(\lambda\) is signed) such that \(\lambda(A)=\int_A g\ \rmd\mu\) for all \(A\in\Sigma\).

For a simple function \(f=\sum_{i=1}^{n} c_i I_{A_i}\), linearity gives

\[ \Lambda(f) = \Lambda\lr({ \sum_{i=1}^n c_i I_{A_i} }) =\sum_{i=1}^n c_i\cdot\Lambda(I_{A_i}) =\sum_{i=1}^n c_i \int_{A_i} g\ \rmd\mu =\int_{S}\lr({ \sum_{i=1}^n c_i\cdot I_{A_i}})g\ \rmd\mu=\mu(fg) \]

For general \(f\in L^p\), we can find simple functions \(f_n\) with \(||\ f-f_n\ ||_p\to 0\), since simple functions are dense in \(L^p\). Therefore

\[ \Lambda(f)=\lim_{n\to\infty}\Lambda(f_n) \]

since \(\Lambda\) is continuous, being bounded and linear. On the other side, \(f\mapsto\mu(fg)\) is continuous by Hölder's Inequality (this uses \(||g||_q<\infty\), which is addressed next):

\[ |\ \mu(fg) - \mu(f_ng) \ | = |\ \mu((f-f_n)g)\ | \leq ||\ f-f_n\ ||_p \cdot ||g||_q \overset{n\to\infty}\longrightarrow 0 \]

Thus \(\Lambda(f)=\mu(fg)\). It remains to show that \(g\in L^q\); we prove the stronger statement \(||g||_q=||\Lambda||\). One inequality follows from Hölder's Inequality:

\[ ||\Lambda|| = \sup_{f\in L^p,f\neq 0}\frac{|\Lambda(f)|}{||f||_p} = \sup_{f\in L^p,f\neq 0}\frac{|\mu(fg)|}{||f||_p}\leq \sup_{f\in L^p,f\neq 0}\frac{||f||_p \cdot ||g||_q}{||f||_p}= || g ||_q \]

The reverse inequality \(||\Lambda||\geq ||g||_q\) requires a construction: one tests \(\Lambda\) on \(f=|g|^{q-1}\,\mathrm{sgn}(g)\), suitably truncated so that \(f\in L^p\), and uses \((q-1)p=q\); without truncation this gives \(\Lambda(f)=\mu(|g|^q)\) and \(||f||_p=\mu(|g|^q)^{1/p}\), hence \(\Lambda(f)/||f||_p=||g||_q\). We omit the details here and take \(||\Lambda||\geq ||g||_q\) for granted. Thus \(||\Lambda||=|| g ||_q\).
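In finite dimensions the proof collapses to a few lines, which makes the roles of \(\lambda\) and \(g\) visible. The Python sketch below assumes a finite space with point masses `W` and a functional \(\Lambda(f)=\sum_s C_s f(s)\), both our own choices: \(g(s)=\lambda(\{s\})/\mu(\{s\})\) represents \(\Lambda\), and the test function \(f=|g|^{q-1}\mathrm{sgn}(g)\) attains \(|\Lambda(f)|/||f||_p=||g||_q\), the inequality taken for granted above.

```python
import math
from typing import List, Sequence

# Hypothetical finite measure space and functional: μ({s}) = W[s] > 0, Λ(f) = Σ_s C[s] f(s).
W: List[float] = [0.5, 1.0, 0.25, 2.0]
C: List[float] = [1.0, -2.0, 0.5, 3.0]


def Lam(f: Sequence[float]) -> float:
    return sum(c * x for c, x in zip(C, f))


def integral(h: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(W, h))


def lp_norm(f: Sequence[float], p: float) -> float:
    return integral([abs(x) ** p for x in f]) ** (1.0 / p)


def representing_g() -> List[float]:
    """g(s) = λ({s}) / μ({s}) with λ(A) = Λ(I_A): the Radon-Nikodým density of λ."""
    indicators = [[1.0 if t == s else 0.0 for t in range(len(W))] for s in range(len(W))]
    return [Lam(ind) / W[s] for s, ind in enumerate(indicators)]


def extremal_f(g: Sequence[float], q: float) -> List[float]:
    """f = |g|^(q-1) sgn(g), the test function behind ||Λ|| >= ||g||_q."""
    return [math.copysign(abs(x) ** (q - 1), x) for x in g]
```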

Definition:

Additionally, for the endpoint case \(p=1\), \(q=\infty\): we say that \(f\in\msc{L}^\infty\) if the \(\mu\)-essential supremum of \(|f|\) is finite:

\[ \begin{aligned} ||f||_\infty \edef \mu\text{-ess sup}(|f|) & \edef \sup\{ x \geq 0\ |\ \mu(|f| \geq x)>0 \}\\ & = \inf \{ x \geq 0\ |\ \mu(|f| \geq x)=0 \}<\infty \end{aligned} \]

Note that \((L^1)^*=L^\infty\) when \(\mu\) is \(\sigma\)-finite, but, except in trivial cases, \((L^\infty)^*\) is much bigger than \(L^1\).
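On a finite space the essential supremum ignores \(\mu\)-null points. A Python sketch with hypothetical values of our own: \(f\) is huge at a point of mass \(0\), yet \(||f||_\infty\) is the largest \(|f(s)|\) over points of positive mass, in agreement with the definition \(\sup\{x\geq0\ |\ \mu(|f|\geq x)>0\}\).

```python
from typing import Dict

# Hypothetical point masses: "c" is a μ-null point carrying a huge value of f.
W: Dict[str, float] = {"a": 0.5, "b": 0.5, "c": 0.0}
F: Dict[str, float] = {"a": 1.0, "b": -3.0, "c": 100.0}


def measure_at_least(x: float) -> float:
    """μ(|f| >= x)."""
    return sum(W[s] for s in W if abs(F[s]) >= x)


def ess_sup() -> float:
    """On a finite space: the largest |f(s)| over points of positive mass."""
    return max((abs(F[s]) for s in W if W[s] > 0), default=0.0)
```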

Theorem (Hahn-Banach Theorem):

Let \((S,\Sigma,\mu)\) be a measure space, let \(p\in[1,\infty)\), and let \(V\) be a vector subspace of \(L^p\). Define \(q\in(1,\infty]\) by \(p^{-1}+q^{-1}=1\). Suppose that, for \(g\in L^q\),

\[ \mu(fg)=0,\ \forall f\in V \quad \text{implies that}\quad g=0 \]

Then \(V\) is dense in \(L^p\).

Corollary:

Let \((S,\Sigma,\mu)\) be a finite measure space, and let \(\msc{I}\) be a \(\pi\)-system on \(S\) such that \(\sigma(\msc{I})=\Sigma\). Let \(p\in[1,\infty)\), and \(V\) be the vector subspace of \(L^p\) spanned by the indicator functions of elements of \(\msc{I}\). Then \(V\) is dense in \(L^p\).

11. Product \(\sigma\)-algebras

Definition:

Let \((S_1,\Sigma_1)\) and \((S_2,\Sigma_2)\) be measurable spaces. Let \(S\) denote the Cartesian product \(S=S_1 \times S_2\). For \(i=1,2\), let \(\rho_i\) denote the \(i\)-th coordinate map, so that

\[ \rho_1(s_1,s_2)=s_1,\quad \rho_2(s_1,s_2)=s_2 \]

For the \(\sigma\)-algebra \(\Sigma_1\) and \(\Sigma_2\), simple Cartesian product is not enough. We define \(\Sigma=\Sigma_1\times\Sigma_2\) as the form

\[ \Sigma=\sigma(\rho_1,\rho_2) \]

Thus \(\Sigma\) is generated by sets of the form

\[ \begin{aligned} {\rho_1}^{-1}(B_1)&=B_1\times S_2,\quad B_1\in\Sigma_1\\ {\rho_2}^{-1}(B_2)&=S_1\times B_2,\quad B_2\in\Sigma_2 \end{aligned} \]

Corollary:

By the definitions above, we have

\[ (B_1 \times S_2) \cap (S_1 \times B_2) = B_1 \times B_2 \]

and

\[ \msc{I} \edef \{ B_1 \times B_2 \ |\ B_i\in\Sigma_i \} \]

is a \(\pi\)-system generating \(\Sigma = \Sigma_1 \times \Sigma_2\).
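On finite sets every \(\sigma\)-algebra can be enumerated, so the statements above can be checked by brute force. The Python sketch below uses hypothetical spaces \(S_1=\{0,1,2\}\) with \(\Sigma_1=\{\emptyset,\{0\},\{1,2\},S_1\}\) and \(S_2=\{a,b\}\) with its power set. It generates \(\sigma(\rho_1,\rho_2)\) from the sets \(\rho_i^{-1}(B_i)\), generates \(\sigma(\msc{I})\) from the rectangles, and checks that the two coincide, that \(\msc{I}\) is a \(\pi\)-system, and that \(\msc{I}\) itself is not closed under unions.

```python
from itertools import product

def generate_sigma_algebra(universe, generators):
    """Smallest family containing `generators`, closed under complement and union.

    On a finite universe, finite unions suffice, so this is sigma(generators).
    """
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    while True:
        new = {universe - a for a in family} | {a | b for a in family for b in family}
        if new <= family:
            return family
        family |= new

S1 = {0, 1, 2}
Sigma1 = [set(), {0}, {1, 2}, S1]          # hypothetical sigma-algebra on S1
S2 = {"a", "b"}
Sigma2 = [set(), {"a"}, {"b"}, S2]         # power set of S2
S = set(product(S1, S2))

# rho_1^{-1}(B1) = B1 x S2 and rho_2^{-1}(B2) = S1 x B2
cylinders = ([{(x, y) for (x, y) in S if x in B1} for B1 in Sigma1]
             + [{(x, y) for (x, y) in S if y in B2} for B2 in Sigma2])
rectangles = [{(x, y) for x in B1 for y in B2} for B1 in Sigma1 for B2 in Sigma2]

sigma_from_coords = generate_sigma_algebra(S, cylinders)
sigma_from_rects = generate_sigma_algebra(S, rectangles)
rect_family = {frozenset(r) for r in rectangles}

is_pi_system = all((a & b) in rect_family for a in rect_family for b in rect_family)
union_of_two = frozenset({(0, "a"), (1, "b"), (2, "b")})   # ({0} x {a}) u ({1,2} x {b})

print(sigma_from_coords == sigma_from_rects, len(sigma_from_rects))  # True 16
print(is_pi_system, union_of_two in rect_family)                     # True False
```

The 16 sets are all unions of the four "atoms" \(\{0\}\times\{a\}\), \(\{0\}\times\{b\}\), \(\{1,2\}\times\{a\}\), \(\{1,2\}\times\{b\}\).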

Lemma:

Let \(\msc{H}\) denote the class of functions \(f : S \to \R\) that are in \(\text{b}\Sigma\) and that are such that:

  1. for each \(s_1\in S_1\), the map \(s_2\mapsto f(s_1,s_2)\) is \(\Sigma_2\)-measurable on \(S_2\)
  2. for each \(s_2\in S_2\), the map \(s_1\mapsto f(s_1,s_2)\) is \(\Sigma_1\)-measurable on \(S_1\)

Then \(\msc{H}=\text{b}\Sigma\).
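In the finite case the Lemma can be seen directly. If \(\Sigma_i\) is generated by a partition \(P_i\) of \(S_i\), a real function is \(\Sigma_i\)-measurable exactly when it is constant on each block, and the atoms of \(\Sigma_1\times\Sigma_2\) are the products \(A\times B\) with \(A\in P_1\), \(B\in P_2\). The Python sketch below (hypothetical partitions and randomly chosen values) builds an \(f\in\text{b}\Sigma\) and checks that every section is measurable.

```python
from itertools import product
import random

# Hypothetical finite example: Sigma_i is generated by the partition P_i of S_i.
P1 = [frozenset({0}), frozenset({1, 2})]
P2 = [frozenset({"a", "c"}), frozenset({"b"})]
S1 = set().union(*P1)
S2 = set().union(*P2)

def is_measurable(h, partition):
    """h (dict point -> value) is sigma(partition)-measurable iff constant on every block."""
    return all(len({h[x] for x in block}) == 1 for block in partition)

random.seed(1)
# f is Sigma1 x Sigma2-measurable: constant on each atom A x B.
value = {(A, B): random.randint(-5, 5) for A, B in product(P1, P2)}
f = {(s1, s2): value[(A, B)] for A, B in product(P1, P2) for s1 in A for s2 in B}

sections_ok = (
    all(is_measurable({s2: f[(s1, s2)] for s2 in S2}, P2) for s1 in S1)   # property 1
    and all(is_measurable({s1: f[(s1, s2)] for s1 in S1}, P1) for s2 in S2)  # property 2
)
print(sections_ok)  # True
```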

12. Product Measure, Fubini's Theorem

Definition:

Suppose that for \(i=1,2\), \(\mu_i\) is a finite measure on \((S_i,\Sigma_i)\). For \(f\in\text{b}\Sigma\), we define the integrals:

\[ I_1^f(s_1)\edef\int_{S_2}f(s_1,s_2)\mu_2(\rmd s_2), \quad I_2^f(s_2)\edef\int_{S_1}f(s_1,s_2)\mu_1(\rmd s_1) \]

Lemma:

Let \(\msc{H}\) be the class of elements in \(\text{b}\Sigma\) such that the following property holds:

\[ I_1^f(\cdot)\in \text{b}\Sigma_1\quad \text{and}\quad I_2^f(\cdot)\in \text{b}\Sigma_2\quad \text{and}\quad \int_{S_1}I_1^f(s_1)\mu_1(\rmd s_1)=\int_{S_2}I_2^f(s_2)\mu_2(\rmd s_2) \]

Then \(\msc{H}=\text{b}\Sigma\).

Thus, for \(F\in\Sigma\) with indicator function \(f = I_F\), we define

\[ \mu(F) = \int_{S_1}I_1^f(s_1)\mu_1(\rmd s_1)=\int_{S_2}I_2^f(s_2)\mu_2(\rmd s_2) \]
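For finite spaces with point masses the integrals \(I_1^f\) and \(I_2^f\) are finite sums, so the construction of \(\mu\) can be carried out exactly. The Python sketch below uses hypothetical masses and exact `Fraction` arithmetic: it computes \(\mu(F)\) by both iterated integrals, checks that they agree, and checks \(\mu(A_1\times A_2)=\mu_1(A_1)\mu_2(A_2)\) on a rectangle.

```python
from fractions import Fraction as Fr

# Hypothetical finite measure spaces given by point masses.
mu1 = {0: Fr(1, 2), 1: Fr(1, 3), 2: Fr(2)}
mu2 = {"a": Fr(3, 4), "b": Fr(5)}

def I1(f, s1):
    """I_1^f(s1) = integral of f(s1, .) against mu2."""
    return sum(f(s1, s2) * m for s2, m in mu2.items())

def I2(f, s2):
    """I_2^f(s2) = integral of f(., s2) against mu1."""
    return sum(f(s1, s2) * m for s1, m in mu1.items())

def product_measure(F):
    """mu(F) from both iterated integrals of f = I_F (they must agree)."""
    f = lambda s1, s2: 1 if (s1, s2) in F else 0
    first = sum(I1(f, s1) * m for s1, m in mu1.items())
    second = sum(I2(f, s2) * m for s2, m in mu2.items())
    assert first == second
    return first

A1, A2 = {0, 2}, {"b"}
rect = {(x, y) for x in A1 for y in A2}
print(product_measure(rect) == sum(mu1[x] for x in A1) * sum(mu2[y] for y in A2))  # True

not_a_rectangle = {(0, "a"), (1, "b"), (2, "b")}
print(product_measure(not_a_rectangle))  # 289/24
```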

Theorem (Fubini's Theorem; Product Measure):

The set function \(\mu\) is a measure on \((S,\Sigma)\) called the product measure of \(\mu_1\) and \(\mu_2\), and we write \(\mu=\mu_1 \times \mu_2\) and

\[ (S,\Sigma,\mu)=(S_1,\Sigma_1,\mu_1) \times (S_2,\Sigma_2,\mu_2). \]

Moreover, \(\mu\) is the unique measure on \((S,\Sigma)\) for which

\[ \mu(A_1 \times A_2) = \mu_1(A_1)\mu_2(A_2),\quad A_i\in\Sigma_i \]

If \(f\in \text{m}\Sigma\) and \(\mu(|f|)<\infty\), then we have

\[ \mu(f)=\int_S f\ \rmd\mu = \int_{S_1}I_1^f(s_1)\mu_1(\rmd s_1)=\int_{S_2}I_2^f(s_2)\mu_2(\rmd s_2) \]
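The integrability hypothesis \(\mu(|f|)<\infty\) cannot simply be dropped. A classical counterexample uses counting measure on \(\N\) in each coordinate; this measure is \(\sigma\)-finite rather than finite, so it lies outside the setting of this section, but it shows what goes wrong. The Python sketch below takes \(f(m,n)=1\) if \(m=n\), \(-1\) if \(n=m+1\), and \(0\) otherwise. The two iterated sums (the analogues of \(\int I_1^f\,\rmd\mu_1\) and \(\int I_2^f\,\rmd\mu_2\)) come out as 0 and 1, while \(\sum|f|\) grows without bound.

```python
def f(m, n):
    """f(m, n) = 1 on the diagonal, -1 just above it, 0 elsewhere (m, n >= 0)."""
    if m == n:
        return 1
    if n == m + 1:
        return -1
    return 0

N = 1000  # truncation of the outer sums; every omitted outer term is exactly 0

# Integrate over n first: for fixed m, f(m, .) is nonzero only at n = m, m + 1.
inner_n_outer_m = sum(sum(f(m, n) for n in (m, m + 1)) for m in range(N))
# Integrate over m first: for fixed n, f(., n) is nonzero only at m = n and m = n - 1.
inner_m_outer_n = sum(sum(f(m, n) for m in {n, max(n - 1, 0)}) for n in range(N))
# Partial sums of |f| grow like 2N, so mu(|f|) = infinity.
abs_mass = sum(abs(f(m, n)) for m in range(N) for n in (m, m + 1))

print(inner_n_outer_m, inner_m_outer_n, abs_mass)  # 0 1 2000
```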