# Kolmogorov-Smirnov test

## KS

Quantify the distance between the expected and the observed (empirical) CDF

. . .

:::: {.columns}
::: {.column width=50% .c}
KS statistic:

$$
  D_N = \sup_x |F_N(x) - F(x)|
$$

\vspace{20pt}

- $F(x)$ is the expected CDF
- $F_N(x)$ is the empirical CDF:
    - sort the $N$ points in ascending order
    - $F_N(x)$ = (number of points $\leqslant x$) $/ N$

:::

::: {.column width=50%}
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}
  % axes
  \draw [thick, ->] (-2.5,0) -- (0,0) -- (0,4.5);
  \draw [thick, ->] (0,0) -- (2.5,0);
  % empiric
  \draw [cyclamen, fill=cyclamen!20!white] (-2.5,0) rectangle (-1.5,0.5);
  \draw [cyclamen, fill=cyclamen!20!white] (-1.5,0) rectangle (-0.9,1);
  \draw [cyclamen, fill=cyclamen!20!white] (-0.9,0) rectangle (-0.6,1.5);
  \draw [cyclamen, fill=cyclamen!20!white] (-0.6,0) rectangle ( 0.2,2);
  \draw [cyclamen, fill=cyclamen!20!white] ( 0.2,0) rectangle ( 0.5,2.5);
  \draw [cyclamen, fill=cyclamen!20!white] ( 0.5,0) rectangle ( 0.8,3);
  \draw [cyclamen, fill=cyclamen!20!white] ( 0.8,0) rectangle ( 1.6,3.5);
  \draw [cyclamen, fill=cyclamen!20!white] ( 1.6,0) rectangle ( 2.3,4);
  \draw [cyclamen, fill=cyclamen!20!white] ( 2.3,0) rectangle ( 2.5,4.5);
  % points
  \draw [blue!50!black, fill=blue] (-2.6,-0.1) rectangle (-2.4,0.1); %-2.5
  \draw [blue!50!black, fill=blue] (-1.6,-0.1) rectangle (-1.4,0.1); %-1.5
  \draw [blue!50!black, fill=blue] (-1,-0.1) rectangle (-0.8,0.1);   %-0.9
  \draw [blue!50!black, fill=blue] (-0.7,-0.1) rectangle (-0.5,0.1); %-0.6
  \draw [blue!50!black, fill=blue] (0.1,-0.1) rectangle (0.3,0.1);   % 0.2
  \draw [blue!50!black, fill=blue] (0.4,-0.1) rectangle (0.6,0.1);   % 0.5
  \draw [blue!50!black, fill=blue] (0.7,-0.1) rectangle (0.9,0.1);   % 0.8
  \draw [blue!50!black, fill=blue] (1.5,-0.1) rectangle (1.7,0.1);   % 1.6
  \draw [blue!50!black, fill=blue] (2.2,-0.1) rectangle (2.4,0.1);   % 2.3
  % expected
  \pause
  \draw[domain=-2.5:2.5, yscale=5, smooth, variable=\x, blue, very thick]
    plot ({\x}, {((atan(\x)*pi/180) + pi/2)/pi});
  \pause
  \draw [very thick, cyclamen] (0.8,3.6) -- (0.8,4.05);
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
:::
::::
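A minimal Python sketch of the statistic, assuming a continuous expected CDF; the helper names `ks_statistic` and `cauchy_cdf` are illustrative, not taken from the slides. The expected curve in the figure is $\arctan(x)/\pi + 1/2$ (the standard Cauchy CDF), and the nine points are read off the figure. Since $F_N$ is a step function, the supremum is reached at a data point, just before or just after a jump, so only $2N$ differences need checking.

```python
import math

def cauchy_cdf(x: float) -> float:
    """Standard Cauchy CDF arctan(x)/pi + 1/2 (the expected curve in the figure)."""
    return math.atan(x) / math.pi + 0.5

def ks_statistic(sample, cdf) -> float:
    """D_N = sup_x |F_N(x) - F(x)| for a continuous expected CDF `cdf`."""
    xs = sorted(sample)          # sort points in ascending order
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_N jumps from (i-1)/N to i/N at x: check both sides of the step
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# sample points read off the figure (hypothetical data set)
points = [-2.5, -1.5, -0.9, -0.6, 0.2, 0.5, 0.8, 1.6, 2.3]
print(f"D_N = {ks_statistic(points, cauchy_cdf):.4f}")
```

With these points $D_N$ comes out at about $0.13$, reached at $x = 2.3$, where $F_N$ jumps to $1$ while $F(2.3) \approx 0.87$.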

## KS

$H_0$: the points are sampled according to $F(x)$

. . .

If $H_0$ is true:

- $\sqrt{N} D_N \xrightarrow{N \rightarrow + \infty} K$

$K$ follows the Kolmogorov distribution, whose CDF is:


$$
  P(K \leqslant K_0) = 1 - p = \frac{\sqrt{2 \pi}}{K_0}
  \sum_{j = 1}^{+ \infty} e^{-(2j - 1)^2 \pi^2 / (8 K_0^2)}
$$
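The series can be evaluated numerically. The sketch below truncates it after 100 terms (an arbitrary cutoff, ample for moderate $K_0$ because the terms decay like a Gaussian in $j$) and compares it with a Monte Carlo estimate of $P(\sqrt{N} D_N \leqslant K_0)$ for uniform samples under $H_0$. The names `kolmogorov_cdf` and `simulated_fraction` are hypothetical. At $K_0 = 1.358$ the series gives about $0.95$; the simulated fraction for $N = 100$ should land close to it, though not exactly, since the limit only holds as $N \to +\infty$.

```python
import math
import random

def kolmogorov_cdf(k0: float, terms: int = 100) -> float:
    """P(K <= k0) from the series on the slide, truncated after `terms` terms."""
    if k0 <= 0:
        return 0.0
    # j-th term: exp(-(2j - 1)^2 pi^2 / (8 k0^2))
    s = sum(math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8 * k0 ** 2))
            for j in range(1, terms + 1))
    return math.sqrt(2 * math.pi) / k0 * s

def simulated_fraction(k0: float, n: int = 100, trials: int = 1000,
                       seed: int = 0) -> float:
    """Monte Carlo estimate of P(sqrt(N) D_N <= k0) for uniform samples (H_0 true)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(n))
        # expected CDF of the uniform distribution on [0, 1] is F(x) = x
        d = max(max(i / n - x, x - (i - 1) / n)
                for i, x in enumerate(xs, start=1))
        if math.sqrt(n) * d <= k0:
            hits += 1
    return hits / trials

k0 = 1.358
print(f"series:      P(K <= {k0}) = {kolmogorov_cdf(k0):.4f}")
frac = simulated_fraction(k0)
print(f"Monte Carlo: P(sqrt(N) D_N <= {k0}) ~ {frac:.3f}")
```

For large $K_0$ the equivalent alternating form $1 - 2\sum_{k \geq 1} (-1)^{k-1} e^{-2k^2 K_0^2}$ converges faster; the version above simply mirrors the slide.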

. . .

A $p$-value can then be computed by setting $K_0 = \sqrt{N} D_N$, the observed value of the statistic

- At the 95% confidence level, $H_0$ cannot be rejected if $p > 0.05$
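Putting the pieces together, a minimal sketch of the asymptotic one-sample test; the function names are hypothetical. It relies on the limiting Kolmogorov distribution, so the $p$-value is only approximate for small $N$. Following the slide, $H_0$ is rejected when $p \leqslant 0.05$.

```python
import math
import random

def ks_statistic(sample, cdf):
    """D_N = sup_x |F_N(x) - F(x)|, evaluated on both sides of each step of F_N."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

def kolmogorov_cdf(k0, terms=100):
    """P(K <= k0) from the series on the slide, truncated after `terms` terms."""
    if k0 <= 0:
        return 0.0
    s = sum(math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8 * k0 ** 2))
            for j in range(1, terms + 1))
    return math.sqrt(2 * math.pi) / k0 * s

def ks_test(sample, cdf, alpha=0.05):
    """Asymptotic KS test: returns (D_N, p, True if H_0 is rejected at level alpha)."""
    d = ks_statistic(sample, cdf)
    k0 = math.sqrt(len(sample)) * d          # observed K_0 = sqrt(N) D_N
    # clamp: rounding in the truncated series can push the CDF slightly above 1
    p = min(1.0, max(0.0, 1.0 - kolmogorov_cdf(k0)))
    return d, p, p <= alpha

def std_normal_cdf(x):
    """CDF of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(1)
shifted = [rng.gauss(0.5, 1.0) for _ in range(500)]   # drawn from N(0.5, 1)
print(ks_test(shifted, std_normal_cdf))               # H_0: sample follows N(0, 1)
```

Here the shifted sample should give a tiny $p$ and a rejection. For real analyses, SciPy's `scipy.stats.kstest` provides a library implementation of this test.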