# Kolmogorov-Smirnov test

## KS

Quantify distance between expected and observed CDF

. . .

KS statistic:
$$
D_N = \sup_x |F_N(x) - F(x)|
$$

- $F(x)$ is the expected CDF
- $F_N(x)$ is the empirical CDF of the $N$ sampled points
    - sort the points in ascending order
    - $F_N(x)$ is the number of points $\leqslant x$, normalized by $N$
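
The construction above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the names `ecdf`, `ks_statistic` and `uniform_cdf` are hypothetical, and the expected CDF is assumed to be continuous, so that the supremum is reached on one side of a jump of $F_N$.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF F_N of `sample` as a callable."""
    points = sorted(sample)  # sort the points in ascending order
    n = len(points)

    def f_n(x):
        # number of points <= x, normalized by N
        return bisect_right(points, x) / n

    return f_n


def ks_statistic(sample, cdf):
    """D_N = sup_x |F_N(x) - F(x)|, assuming a continuous expected CDF."""
    points = sorted(sample)
    n = len(points)
    d = 0.0
    for i, x in enumerate(points):
        f = cdf(x)
        # F_N jumps from i/N to (i + 1)/N at x: check both sides of the jump
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """Expected CDF of the uniform distribution on [0, 1] (example only)."""
    return min(max(x, 0.0), 1.0)


# Hypothetical sample of four points tested against the uniform CDF
sample = [0.1, 0.4, 0.7, 0.9]
print("D_N =", ks_statistic(sample, uniform_cdf))
```

Only the sampled points need to be visited, because between two consecutive points $F_N$ is constant while $F$ is monotonic.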

## KS

$H_0$: points sampled according to $F(x)$

. . .

If $H_0$ is true:

- $\sqrt{N}D_N \xrightarrow{N \rightarrow + \infty} K$

where $K$ follows the Kolmogorov distribution, with CDF:
$$
P(K \leqslant K_0) = 1 - p = \frac{\sqrt{2 \pi}}{K_0}
\sum_{j = 1}^{+ \infty} e^{-(2j - 1)^2 \pi^2 / (8 K_0^2)}
$$

. . .

A $p$-value can be computed from the observed $K_0 = \sqrt{N} D_N$

- At 95% confidence level, $H_0$ cannot be rejected if $p > 0.05$
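
As a rough sketch of the last step, the series for the Kolmogorov CDF can be summed numerically and turned into a $p$-value. The function names and the truncation to 100 terms are assumptions made for this example, and the result is the asymptotic $p$-value, so it is only reliable for large $N$.

```python
import math


def kolmogorov_cdf(k0, terms=100):
    """P(K <= k0) from the series above, truncated to `terms` terms."""
    if k0 <= 0:
        return 0.0
    total = sum(
        math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8 * k0 ** 2))
        for j in range(1, terms + 1)
    )
    return math.sqrt(2 * math.pi) / k0 * total


def ks_p_value(d_n, n):
    """Asymptotic p-value: p = 1 - P(K <= sqrt(N) * D_N)."""
    return 1.0 - kolmogorov_cdf(math.sqrt(n) * d_n)


# Hypothetical numbers: D_N = 0.2 measured on N = 100 points
p = ks_p_value(d_n=0.2, n=100)
verdict = "H0 cannot be rejected" if p > 0.05 else "H0 is rejected"
print(f"p = {p:.2e}: {verdict} at 95% confidence level")
```

The terms of the series decay very quickly for the values of $\sqrt{N} D_N$ met in practice, so a fixed truncation is usually enough; a library routine is preferable when small $N$ corrections matter.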