\[
\newcommand{\bs}{\boldsymbol}
\newcommand{\bsX}{\boldsymbol{X}}
\newcommand{\bf}{\mathbf}
\newcommand{\msc}{\mathscr}
\newcommand{\mca}{\mathcal}
\newcommand{\T}{\text{T}}
\newcommand{\rme}{\mathrm{e}}
\newcommand{\rmi}{\mathrm{i}}
\newcommand{\rmj}{\mathrm{j}}
\newcommand{\rmd}{\mathrm{d}}
\newcommand{\rmm}{\mathrm{m}}
\newcommand{\rmb}{\mathrm{b}}
\newcommand{\and}{\land}
\newcommand{\or}{\lor}
\newcommand{\exist}{\exists}
\newcommand{\sube}{\subseteq}
\newcommand{\lr}[3]{\left#1 #2 \right#3}
\newcommand{\intfy}{\int_{-\infty}^{+\infty}}
\newcommand{\sumfy}[1]{\sum_{#1=-\infty}^{+\infty}}
\newcommand{\vt}{\vartheta}
\newcommand{\ve}{\varepsilon}
\newcommand{\vp}{\varphi}
\newcommand{\Var}{\text{Var}}
\newcommand{\Cov}{\text{Cov}}
\newcommand{\edef}{\xlongequal{def}}
\newcommand{\prob}{\text{P}}
\newcommand{\Exp}{\text{E}}
\newcommand{\t}[1]{\text#1}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\versionofnewcommand}{\text{260125}}
\]
Gaussian Processes
Definition of Gaussian Processes
We first define Gaussian processes. The definition here is taken from Diffusions, Markov Processes, and Martingales (Rogers & Williams).
In complete generality, a (real-valued) process \((\bsX_t)_{t\in T}\) indexed by some set \(T\) is said to be a Gaussian process if, for any \(t_1,\cdots,t_n\in T\), the law of \((\bsX(t_1),\cdots,\bsX(t_n))\) is multivariate Gaussian. Thus the law of the process \(\bsX\) is specified by the functions:
\[
\mu(t)\overset{def}=E(\bsX_t),\quad \rho(s,t)\overset{def}=\Cov (\bsX_s,\bsX_t)
\]
In the finite-dimensional case, the mean and covariance can be written as a vector and a matrix. Let \(\bsX=(\bsX(t_1),\cdots,\bsX(t_n))^\T\) be an \(n\)-dimensional random vector; then:
\[
\begin{aligned}
&\bsX\sim N(\bs{\mu},\bs{\Sigma}),\ \text{where}\\
&\bs{\mu}=(\mu_1,\cdots,\mu_n)^\T,
\quad \bs\Sigma=E(\bsX-\bs\mu)(\bsX-\bs\mu)^\T
\end{aligned}
\]
We first state its probability density \(f_\bsX(\bs{x})\) directly:
\[
\boxed{f_\bsX(\bs{x})=\frac{1}{(2\pi)^{n/2}\cdot(\det\bs\Sigma)^{1/2}}\cdot
\exp\lr({ -\frac{1}{2}(\bs{x}-\bs\mu)^\T\bs\Sigma^{-1}(\bs{x}-\bs\mu)
})}
\]
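The form is not that complicated. The hardest and most important part is the quadratic form in the exponent. First we must verify that \(f_\bsX(\bs{x})\) is a probability density. Since \(\bs\Sigma\) is positive definite, \(\det\bs\Sigma\) is positive, and the exponential is always positive, so \(f_\bsX(\bs{x})\geq0\) holds. Next we verify normalization: we would like the matrix in the middle of the quadratic form to be diagonal, which would greatly simplify the computation. This is indeed possible: since \(\bs\Sigma\) is a real symmetric positive definite matrix, it admits an eigendecomposition with an orthogonal matrix \(\bf{U}\):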
\[
\begin{aligned}
\bs\Sigma&=\bf{U}^\T\bs\Lambda\bf{U},
\quad &\bs\Lambda&=\text{diag}({\sigma_1}^2,\cdots,{\sigma_n}^2)\\
\Leftrightarrow \bs\Sigma^{-1}&= \bf{U}^\T\bs\Lambda^{-1}\bf{U},
\quad &\bs\Lambda^{-1}&=\text{diag}
(\frac{1}{{\sigma_1}^2},\cdots,\frac{1}{{\sigma_n}^2})
\end{aligned}
\]
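Whatever \(\bf{U}\) is, we absorb it into a change of variables: \(\bs{y}=\bf{U}(\bs{x}-\bs\mu)\Leftrightarrow \bs{x}=\bf{U}^{\T}\bs{y}+\bs\mu\). Since the determinant of an orthogonal matrix has absolute value \(1\), the Jacobian determinant of this substitution also has absolute value \(1\). Hence: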
\[
\begin{aligned}
\int_{\R^n}f_\bsX(\bs{x})\ \rmd\bs{x}
=&\ \frac{1}{(2\pi)^{n/2}\cdot(\det\bs\Sigma)^{1/2}}\cdot \int_{\R^n}
\exp\lr({ -\frac{1}{2}(\bs{x}-\bs\mu)^\T\bs\Sigma^{-1}(\bs{x}-\bs\mu)})\rmd\bs{x}\\
=&\ \frac{1}{(2\pi)^{n/2}\cdot\prod_{k=1}^n{\sigma_k}}\cdot \int_{\R^n}
\exp\lr({-\frac{1}{2}\bs{y}^\T\bs\Lambda^{-1}\bs{y}})\ \rmd\bs{y}\\
=&\ \frac{1}{(2\pi)^{n/2}\cdot\prod_{k=1}^n{\sigma_k}}\cdot \int_{\R^n}
\exp\lr({\sum_{k=1}^n \lr({-\frac{{y_k}^2}{2{\sigma_k}^2}})})\rmd\bs{y}\\
=&\ \frac{1}{(2\pi)^{n/2}\cdot\prod_{k=1}^n{\sigma_k}}\cdot\prod_{k=1}^n
\lr({\intfy\exp\lr({-\frac{{y_k}^2}{2{\sigma_k}^2}})\rmd y_k})\\
=&\ \prod_{k=1}^n\lr({ \intfy\frac{1}{\sqrt{2\pi}\cdot{\sigma_k}}
\exp\lr({-\frac{{y_k}^2}{2{\sigma_k}^2}})\rmd y_k})\\
=&\ 1^n=1
\end{aligned}
\]
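As a sanity check on the diagonalization step, here is a small worked instance in two dimensions. The particular \(\bs\Sigma\) below is a hypothetical choice made only for illustration; it does not come from the source text.
\[
\begin{aligned}
&\bs\Sigma=\begin{pmatrix}2&1\\1&2\end{pmatrix},\quad
\bf{U}=\frac{1}{\sqrt2}\begin{pmatrix}1&1\\1&-1\end{pmatrix},\quad
\bs\Lambda=\text{diag}(3,1),\quad
\bf{U}^\T\bs\Lambda\bf{U}=\bs\Sigma,\\
&\det\bs\Sigma=3={\sigma_1}^2{\sigma_2}^2,\qquad
\bs\Sigma^{-1}=\frac13\begin{pmatrix}2&-1\\-1&2\end{pmatrix},\\
&(\bs{x}-\bs\mu)^\T\bs\Sigma^{-1}(\bs{x}-\bs\mu)
=\frac{{y_1}^2}{3}+{y_2}^2,\qquad \bs{y}=\bf{U}(\bs{x}-\bs\mu).
\end{aligned}
\]
With this choice the prefactor is \(1/(2\pi\sqrt3)\), and the density splits into an \(N(0,3)\) density in \(y_1\) times an \(N(0,1)\) density in \(y_2\), each of which integrates to \(1\).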
Characteristic Functions of Gaussian Processes
Define the characteristic function of an \(n\)-dimensional random vector:
\[
\phi_\bsX(\bs{\omega})=E(\exp(\rmj\bs\omega^\T\bsX))
\]
When \(\bsX\) is Gaussian,
\[
\begin{aligned}
\phi_\bsX(\bs\omega)=&\
\frac{1}{(2\pi)^{n/2}\cdot(\det\bs\Sigma)^{1/2}}\cdot \int_{\R^n}
\exp(\rmj\bs\omega^\T\bs{x})
\exp\lr({ -\frac{1}{2}(\bs{x}-\bs\mu)^\T\bs\Sigma^{-1}(\bs{x}-\bs\mu)
})\rmd\bs{x}\\
=&\ \frac{1}{(2\pi)^{n/2}\cdot(\det\bs\Sigma)^{1/2}}\cdot \int_{\R^n}
\exp\lr({\rmj\bs\omega^\T\bs{x} -\frac{1}{2}(\bs{x}-\bs\mu)^\T\bs\Sigma^{-1}(\bs{x}-\bs\mu)
})\rmd\bs{x}
\end{aligned}
\]
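Looking at this form, it would be nice if the integrand could be written as a Gaussian density. There is a trick for that: complete the square! The constant left over after completing the square can be pulled outside the integral, leaving a constant times a Gaussian density, which is much simpler. To make completing the square easier, we start from the one-dimensional analogue: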
\[
\rmj\omega x-\frac{1}{2\sigma^2}(x-\mu)^2=-\frac{1}{2\sigma^2}(x-(\mu+\rmj\sigma^2\omega))^2+\rmj\omega\mu-\frac{1}{2}\sigma^2\omega^2
\]
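Translating this familiar identity into linear-algebra form (the shift \(\rmj\sigma^2\omega\) becomes \(\rmj\bs\Sigma\bs\omega\)):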
\[
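\begin{aligned}
&\rmj\bs\omega^\T\bs{x}
-\frac{1}{2}(\bs{x}-\bs\mu)^\T\bs\Sigma^{-1}(\bs{x}-\bs\mu)\\
=&\ -\frac{1}{2}
(\bs{x}-(\bs\mu+\rmj\bs\Sigma\bs\omega))^\T
\bs\Sigma^{-1}
(\bs{x}-(\bs\mu+\rmj\bs\Sigma\bs\omega))
+\rmj\bs\omega^\T\bs{\mu}
-\frac{1}{2}\bs\omega^\T\bs\Sigma\bs\omega
\end{aligned}
\]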
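The vector form of the completed square can be checked by direct expansion. The shorthand \(\bs{u}=\bs{x}-\bs\mu\) is introduced here only for brevity, and the complex shift is taken to be \(\rmj\bs\Sigma\bs\omega\), the natural analogue of \(\rmj\sigma^2\omega\) in the one-dimensional identity. Using the symmetry of \(\bs\Sigma\):
\[
\begin{aligned}
&-\frac12(\bs{u}-\rmj\bs\Sigma\bs\omega)^\T\bs\Sigma^{-1}(\bs{u}-\rmj\bs\Sigma\bs\omega)\\
=&\ -\frac12\bs{u}^\T\bs\Sigma^{-1}\bs{u}
+\rmj\bs\omega^\T\bs{u}
+\frac12\bs\omega^\T\bs\Sigma\bs\omega
\end{aligned}
\]
Adding \(\rmj\bs\omega^\T\bs\mu-\frac12\bs\omega^\T\bs\Sigma\bs\omega\) to the right-hand side cancels the last term and turns \(\rmj\bs\omega^\T\bs{u}\) into \(\rmj\bs\omega^\T\bs{x}\), which recovers \(\rmj\bs\omega^\T\bs{x}-\frac12(\bs{x}-\bs\mu)^\T\bs\Sigma^{-1}(\bs{x}-\bs\mu)\). For \(n=1\) and \(\bs\Sigma=\sigma^2\) this reduces to the scalar identity above.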
Substituting the completed square into the expression for \(\phi_\bsX(\bs\omega)\):
\[
\begin{aligned}
\phi_\bsX(\bs\omega)
=&\ \frac{1}{(2\pi)^{n/2}\cdot(\det\bs\Sigma)^{1/2}}\cdot \int_{\R^n}
\exp\lr({\rmj\bs\omega^\T\bs{x}
-\frac{1}{2}(\bs{x}-\bs\mu)^\T\bs\Sigma^{-1}(\bs{x}-\bs\mu)})
\rmd\bs{x}\\
=&\ \lr({\exp\lr({\rmj\bs{\omega}^{\T}\bs\mu
-\frac{1}{2}\bs{\omega}^\T\bs\Sigma\bs\omega})})
\int_{\R^n}\frac{\exp(-\frac{1}{2}
(\bs{x}-(\bs\mu+\rmj\bs\Sigma\bs\omega))^\T
\bs\Sigma^{-1}
(\bs{x}-(\bs\mu+\rmj\bs\Sigma\bs\omega)))}{(2\pi)^{n/2}\cdot(\det\bs\Sigma)^{1/2}}\ \rmd\bs{x}
\end{aligned}
\]
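The remaining integral is formally still the integral of a Gaussian density, now with a complex-shifted mean; that it still equals \(1\) can be justified by analytic continuation in \(\bs\omega\). This gives the characteristic function of a Gaussian vector: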
\[
\boxed{\phi_\bsX(\bs\omega)=\exp\lr({\rmj\bs{\omega}^{\T}\bs\mu
-\frac{1}{2}\bs{\omega}^\T\bs\Sigma\bs\omega})}
\]
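To see the boxed formula in action, here is a two-dimensional instance. The values of \(\bs\mu\) and \(\bs\Sigma\) are hypothetical and chosen only for illustration:
\[
\begin{aligned}
&\bs\mu=\begin{pmatrix}1\\-1\end{pmatrix},\quad
\bs\Sigma=\begin{pmatrix}2&1\\1&2\end{pmatrix},\quad
\bs\omega=\begin{pmatrix}\omega_1\\\omega_2\end{pmatrix},\\
&\bs\omega^\T\bs\mu=\omega_1-\omega_2,\qquad
\bs\omega^\T\bs\Sigma\bs\omega=2{\omega_1}^2+2\omega_1\omega_2+2{\omega_2}^2,\\
&\phi_\bsX(\bs\omega)=\exp\lr({\rmj(\omega_1-\omega_2)
-({\omega_1}^2+\omega_1\omega_2+{\omega_2}^2)}).
\end{aligned}
\]
Two quick checks: \(\phi_\bsX(\bs{0})=1\), as any characteristic function must satisfy, and setting \(\omega_2=0\) leaves \(\exp(\rmj\omega_1-{\omega_1}^2)\), the characteristic function of \(N(1,2)\), which matches the first component.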
Linearity of Gaussian Processes
Theorem (Linearity of Gaussian Processes):
\[
\begin{aligned}
\text{Let }& \bsX\in\R^n,\ \bsX\sim N(\bs\mu,\bs\Sigma).\\
\text{For any }& \bf{A}\in\R^{m\times n},\ \bs{Y}= \bf{A}\bsX\in\R^m\\
\Rightarrow \quad &\bs{Y}\sim N(\bf{A}\bs\mu,\bf{A}\bs\Sigma\bf{A}^\T).
\end{aligned}
\]
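The core idea of the proof below is to "judge by appearance", that is, to recognize the form of the characteristic function. It may not be fully rigorous, but it works.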
\[
\begin{aligned}
\phi_{\bs{Y}}(\bs\omega)=&\ E(\exp(\rmj\bs\omega^\T\bs{Y}))
=E(\exp(\rmj\bs\omega^\T(\bf{A}\bs{X})))\\
=&\ E(\exp(\rmj(\bs\omega^\T\bf{A})\bs{X}))
=E(\exp(\rmj(\bf{A}^\T\bs\omega)^\T\bs{X}))\\
=&\ \phi_\bsX(\bf{A}^\T\bs\omega)\\
=&\ \exp\lr({\rmj(\bf{A}^\T\bs\omega)^\T\bs\mu
-\frac{1}{2}(\bf{A}^\T\bs\omega)^\T\bs\Sigma(\bf{A}^\T\bs\omega)})\\
=&\ \exp\lr({\rmj\bs\omega^\T(\bf{A}\bs\mu)
-\frac{1}{2}\bs\omega^\T(\bf{A}\bs\Sigma\bf{A}^\T)\bs\omega})
\end{aligned}
\]
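Since the characteristic function is the Fourier transform of the density, it determines the distribution uniquely, so we can conclude that \(\bs{Y}\sim N(\bf{A}\bs\mu,\bf{A}\bs\Sigma\bf{A}^\T)\).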
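A short worked instance of the theorem, with hypothetical values of \(\bs\mu\), \(\bs\Sigma\) and \(\bf{A}\) chosen only for illustration. Taking \(\bf{A}\) to be a single row of ones makes \(\bs{Y}\) the sum of the two components:
\[
\begin{aligned}
&\bs\mu=\begin{pmatrix}1\\-1\end{pmatrix},\quad
\bs\Sigma=\begin{pmatrix}2&1\\1&2\end{pmatrix},\quad
\bf{A}=\begin{pmatrix}1&1\end{pmatrix},\\
&\bf{A}\bs\mu=1+(-1)=0,\qquad
\bf{A}\bs\Sigma\bf{A}^\T=2+1+1+2=6,\\
&\bs{Y}=\bsX_1+\bsX_2\sim N(0,6).
\end{aligned}
\]
The variance \(6\) agrees with the familiar scalar rule \(\Var(\bsX_1+\bsX_2)=\Var(\bsX_1)+\Var(\bsX_2)+2\Cov(\bsX_1,\bsX_2)\).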
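A corollary follows: the marginals of a Gaussian vector are always Gaussian. It suffices to take \(\bf{A}\) to be a diagonal matrix whose entries are \(0\) for the components being marginalized out and \(1\) elsewhere. The converse, however, does not hold in general. Here are two conditions under which it does: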
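- If \(\bsX_1,\cdots,\bsX_n\) are mutually independent with \(\bsX_k\sim N(\mu_k,{\sigma_k}^2)\),
then \(\bsX\sim N(\bs\mu,\bs\Sigma)\), where \(\bs\mu=(\mu_1,\cdots,\mu_n)^\T\) and \(\bs\Sigma=\text{diag}({\sigma_1}^2,\cdots,{\sigma_n}^2)\). (A sufficient condition.)
- \(\bsX=(\bsX_1,\cdots,\bsX_n)^\T\sim N(\bs\mu,\bs\Sigma)\ \Leftrightarrow\ \forall\ \bs\alpha\in\R^n,\ \bs\alpha^\T\bsX\sim N(\mu,\sigma^2)\) for some \(\mu\) and \(\sigma^2\) depending on \(\bs\alpha\).
That is, the joint distribution is Gaussian if and only if every linear combination of the components is a one-dimensional Gaussian. Note that this is an equivalence.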
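To make the marginal statement and the failure of the converse concrete, here is a brief illustration. The selection matrix and the sign variable \(S\) are standard textbook constructions introduced here as assumptions; they are not taken from the source.
\[
\begin{aligned}
&\bf{A}=\begin{pmatrix}1&0\end{pmatrix}
\ \Rightarrow\ \bf{A}\bsX=\bsX_1\sim N(\mu_1,\Sigma_{11}),\\
&\bsX_1\sim N(0,1),\quad S=\pm1\ \text{with probability}\ \tfrac12\ \text{each, independent of}\ \bsX_1,\quad \bsX_2=S\bsX_1,\\
&\bsX_2\sim N(0,1),\qquad
\bsX_1+\bsX_2=(1+S)\bsX_1=
\begin{cases}0,& S=-1\\ 2\bsX_1,& S=+1\end{cases}
\end{aligned}
\]
Both marginals are standard Gaussian, yet \(\bsX_1+\bsX_2\) equals \(0\) with probability \(\tfrac12\) and is therefore not Gaussian. By the second condition with \(\bs\alpha=(1,1)^\T\), the pair \((\bsX_1,\bsX_2)\) is not jointly Gaussian.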
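We only need to prove the sufficiency direction of the second condition (every linear combination being Gaussian implies joint Gaussianity); the rest is immediate. First, write down the characteristic function of the scalar \(\bs{y}=\bs\alpha^\T\bsX\):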
\[
\begin{aligned}
\phi_{\bs{y}}(\omega)=&\ E(\exp(\rmj\omega\,\bs\alpha^\T\bsX))\\
\end{aligned}
\]
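We want to use it to show that the characteristic function of \(\bsX\) has the Gaussian form. There is an elegant argument: since \(\bs\alpha\) and \(\omega\) are both arbitrary, we can swap their roles notationally, fixing \(\omega=1\) and treating \(\bs\alpha\) as the frequency variable: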
\[
\begin{aligned}
\phi_\bsX(\bs\alpha)=&\ E(\exp(\rmj\bs\alpha^\T\bsX))
=\phi_{\bs{y}}(\omega)|_{\omega=1}=\exp\lr({\rmj\mu_{\bs{y}}-\frac{1}{2}{\sigma_{\bs{y}}}^2})\\
\end{aligned}
\]
where
\[
\begin{aligned}
\mu_{\bs{y}}=&\ \bs\alpha^\T\bs\mu\ ,\\
{\sigma_{\bs{y}}}^2
=&\ E(\bs\alpha^\T\bsX-\bs\alpha^\T\bs\mu)(\bs\alpha^\T\bsX-\bs\alpha^\T\bs\mu)^\T\\
=&\ \bs\alpha^\T E(\bsX-\bs\mu)(\bsX-\bs\mu)^{\T}\bs\alpha\\
=&\ \bs\alpha^\T\bs\Sigma\bs\alpha
\end{aligned}
\]
Substituting back into the expression for \(\phi_\bsX(\bs\alpha)\) gives:
\[
\phi_\bsX(\bs\alpha)=\exp\lr({\rmj\bs\alpha^\T\bs\mu-\frac{1}{2}\bs\alpha^\T\bs\Sigma\bs\alpha})
\]
Therefore, \(\bsX\sim N(\bs\mu,\bs\Sigma)\).