# Kolmogorov-Smirnov test

## KS

Quantify distance between expected and observed CDF

. . .

:::: {.columns}
::: {.column width=50% .c}
KS statistic:

$$
  D_N = \sup_x |F_N(x) - F(x)|
$$

\vspace{20pt}

- $F(x)$ is the expected CDF
- $F_N(x)$ is the empirical CDF
    - sort points in ascending order
    - count the points up to $x$, then divide by $N$

:::

::: {.column width=50%}
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}
  % axes
  \draw [thick, ->] (-2.5,0) -- (0,0) -- (0,4.5);
  \draw [thick, ->] (0,0) -- (2.5,0);
  % empiric
  \draw [cyclamen, fill=cyclamen!20!white] (-2.5,0) rectangle (-1.5,0.5);
  \draw [cyclamen, fill=cyclamen!20!white] (-1.5,0) rectangle (-0.9,1);
  \draw [cyclamen, fill=cyclamen!20!white] (-0.9,0) rectangle (-0.6,1.5);
  \draw [cyclamen, fill=cyclamen!20!white] (-0.6,0) rectangle ( 0.2,2);
  \draw [cyclamen, fill=cyclamen!20!white] ( 0.2,0) rectangle ( 0.5,2.5);
  \draw [cyclamen, fill=cyclamen!20!white] ( 0.5,0) rectangle ( 0.8,3);
  \draw [cyclamen, fill=cyclamen!20!white] ( 0.8,0) rectangle ( 1.6,3.5);
  \draw [cyclamen, fill=cyclamen!20!white] ( 1.6,0) rectangle ( 2.3,4);
  \draw [cyclamen, fill=cyclamen!20!white] ( 2.3,0) rectangle ( 2.5,4.5);
  % points
  \draw [blue!50!black, fill=blue] (-2.6,-0.1) rectangle (-2.4,0.1); %-2.5
  \draw [blue!50!black, fill=blue] (-1.6,-0.1) rectangle (-1.4,0.1); %-1.5
  \draw [blue!50!black, fill=blue] (-1,-0.1) rectangle (-0.8,0.1); %-0.9
  \draw [blue!50!black, fill=blue] (-0.7,-0.1) rectangle (-0.5,0.1); %-0.6
  \draw [blue!50!black, fill=blue] (0.1,-0.1) rectangle (0.3,0.1); % 0.2
  \draw [blue!50!black, fill=blue] (0.4,-0.1) rectangle (0.6,0.1); % 0.5
  \draw [blue!50!black, fill=blue] (0.7,-0.1) rectangle (0.9,0.1); % 0.8
  \draw [blue!50!black, fill=blue] (1.5,-0.1) rectangle (1.7,0.1); % 1.6
  \draw [blue!50!black, fill=blue] (2.2,-0.1) rectangle (2.4,0.1); % 2.3
  % expected
  \pause
  \draw[domain=-2.5:2.5, yscale=5, smooth, variable=\x, blue, very thick]
    plot ({\x}, {((atan(\x)*pi/180) + pi/2)/pi});
  \pause
  \draw [very thick, cyclamen] (0.8,3.6) -- (0.8,4.05);
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
:::
::::
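
## KS

The definition of $D_N$ can be sketched in a few lines of Python. This is only an illustration: the sample values, the helper names `cauchy_cdf` and `ks_statistic`, and the choice of a standard Cauchy distribution as expected CDF are assumptions made for the example. Since $F_N$ is a step function, the supremum is reached at a data point, just before or just after the step.

```python
import math

def cauchy_cdf(x):
    """Standard Cauchy CDF, F(x) = 1/2 + atan(x)/pi (assumed expected CDF)."""
    return 0.5 + math.atan(x) / math.pi

def ks_statistic(sample, cdf):
    """Return D_N = sup_x |F_N(x) - F(x)| for a continuous expected CDF."""
    xs = sorted(sample)          # sort points in ascending order
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_N jumps from (i-1)/N to i/N at x: check both sides of the step
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Illustrative sample (hypothetical values, not real data)
sample = [-2.5, -1.5, -0.9, -0.6, 0.2, 0.5, 0.8, 1.6, 2.3]
d_n = ks_statistic(sample, cauchy_cdf)
print(f"D_N = {d_n:.4f}")
```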

## KS

$H_0$: points are sampled according to $F(x)$

. . .

If $H_0$ is true:

- $\sqrt{N}D_N \xrightarrow{N \rightarrow + \infty} K$

$K$ follows the Kolmogorov distribution, with CDF:

$$
  P(K \leqslant K_0) = 1 - p = \frac{\sqrt{2 \pi}}{K_0}
  \sum_{j = 1}^{+ \infty} e^{-(2j - 1)^2 \pi^2 / (8 K_0^2)}
$$

. . .

A $p$-value can be computed

- At the 95% confidence level, $H_0$ cannot be rejected if $p > 0.05$
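
## KS

As a rough sketch, the $p$-value can be obtained by truncating the series of the Kolmogorov CDF and evaluating it at $K_0 = \sqrt{N} D_N$. The function names, the truncation at 100 terms and the input values below are assumptions chosen for illustration, and the result is only valid asymptotically, for large $N$.

```python
import math

def kolmogorov_cdf(k0, terms=100):
    """Approximate P(K <= k0) by truncating the series to `terms` terms."""
    if k0 <= 0:
        return 0.0
    s = sum(
        math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8 * k0 ** 2))
        for j in range(1, terms + 1)
    )
    # Clamp to [0, 1] to absorb truncation and rounding errors
    return min(1.0, max(0.0, math.sqrt(2 * math.pi) / k0 * s))

def ks_p_value(d_n, n):
    """Asymptotic p-value: p = 1 - P(K <= sqrt(N) * D_N)."""
    return 1.0 - kolmogorov_cdf(math.sqrt(n) * d_n)

# Hypothetical inputs: D_N = 0.1 measured on N = 100 points
p = ks_p_value(d_n=0.1, n=100)
print(f"p = {p:.3f}")
print("H0 cannot be rejected" if p > 0.05 else "H0 rejected")
```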