# Kolmogorov-Smirnov test

## KS

Quantify the distance between the expected and the observed CDF

. . .

KS statistic:
$$
  D_N = \sup_x |F_N(x) - F(x)|
$$

- $F(x)$ is the expected CDF
- $F_N(x)$ is the empirical CDF of the $N$ sampled points:
    - sort the points in ascending order
    - $F_N(x)$ is the number of points less than or equal to $x$, divided by $N$
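
As a minimal sketch of the recipe above, $D_N$ can be computed by sorting the sample and comparing the expected CDF with the empirical one on both sides of each jump. The names `ks_statistic` and `uniform_cdf`, and the three-point sample, are illustrative assumptions and not part of the original slides; $F$ is assumed to be continuous.

```python
def ks_statistic(sample, cdf):
    """Return D_N = sup_x |F_N(x) - F(x)| for a sample and an expected CDF."""
    xs = sorted(sample)  # sort the points in ascending order
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_N jumps from i/n to (i + 1)/n at x, so the supremum
        # can be reached just before or right at the jump.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


def uniform_cdf(x):
    """Expected CDF of the uniform distribution on [0, 1] (illustrative choice)."""
    return min(1.0, max(0.0, x))


sample = [0.1, 0.4, 0.7]  # hypothetical data
d = ks_statistic(sample, uniform_cdf)
```

Checking both sides of each jump matters because $F_N$ is a step function: the largest gap is always reached at a sampled point, either just below it or at it.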

## KS

$H_0$: the points are sampled according to $F(x)$

. . .

If $H_0$ is true:

- $\sqrt{N} D_N \xrightarrow{N \rightarrow + \infty} K$ in distribution

where $K$ follows the Kolmogorov distribution, with CDF:
$$
  P(K \leqslant K_0) = 1 - p = \frac{\sqrt{2 \pi}}{K_0}
  \sum_{j = 1}^{+ \infty} e^{-(2j - 1)^2 \pi^2 / (8 K_0^2)}
$$

. . .

A $p$-value can be computed from the observed $K_0 = \sqrt{N} D_N$:

- At a 95% confidence level, $H_0$ cannot be rejected if $p > 0.05$
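
A rough sketch of how the $p$-value could be evaluated from the series above, truncating the sum after a fixed number of terms. The function names and the default `terms=100` are assumptions made for illustration; the truncation is adequate for moderate $K_0$ (up to roughly 10), and the result relies on the large-$N$ limit.

```python
import math


def kolmogorov_cdf(k0, terms=100):
    """Approximate P(K <= k0) with the first `terms` terms of the series."""
    if k0 <= 0:
        return 0.0
    s = sum(
        math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8 * k0 ** 2))
        for j in range(1, terms + 1)
    )
    return math.sqrt(2 * math.pi) / k0 * s


def ks_p_value(d_n, n):
    """Asymptotic p-value for an observed D_N with N = n points."""
    k0 = math.sqrt(n) * d_n
    return 1.0 - kolmogorov_cdf(k0)


# Hypothetical example: D_N = 0.3 observed with N = 3 points.
p = ks_p_value(0.3, 3)
reject_h0 = p <= 0.05  # 95% confidence level
```

With so few points the asymptotic distribution is only a crude approximation; in practice a library routine such as `scipy.stats.kstest` would be the more reliable choice.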