
# Kolmogorov-Smirnov test


## KS

The KS statistic quantifies the distance between the expected and the observed (empirical) CDF:

:::: {.columns}
:::  {.column width=50% .c}
  $$
    D_N = \sup_x |F_N(x) - F(x)|
  $$

  \vspace{20pt}

  - $F(x)$ is the expected CDF
  - $F_N(x)$ is the empirical CDF
    - sort the points in ascending order
    - $F_N(x)$ = number of points $\leqslant x$, divided by $N$

  . . .

:::

:::  {.column width=50%}
  \setbeamercovered{}
  \begin{center}
  \begin{tikzpicture}[>=Stealth]
    % empiric
    \draw [cyclamen, thick, fill=cyclamen!20!white]
          (-2.5,0)   -- (-2.5,0.5) -- (-1.5,0.5) -- (-1.5,1)  -- (-0.9,1)  --
          (-0.9,1.5) -- (-0.1,1.5) -- (-0.1,2)   -- (1,2)     -- (1,2.5)   --
          (1.2,2.5)  -- (1.2,3)    -- (1.3,3)    -- (1.3,3.5) -- (1.6,3.5) --
          (1.6,4)    -- (2.3,4)    -- (2.3,4.5)  -- (2.5,4.5) -- (2.5,0)   --
          cycle;
    % points
    \draw [yellow!50!black, fill=yellow] (-2.6,-0.1) rectangle (-2.4,0.1); %-2.5
    \draw [yellow!50!black, fill=yellow] (-1.6,-0.1) rectangle (-1.4,0.1); %-1.5
    \draw [yellow!50!black, fill=yellow] (-1,-0.1)   rectangle (-0.8,0.1); %-0.9
    \draw [yellow!50!black, fill=yellow] (-0.2,-0.1) rectangle (0,0.1);    %-0.1
    \draw [yellow!50!black, fill=yellow] (0.9,-0.1)  rectangle (1.1,0.1);  % 1
    \draw [yellow!50!black, fill=yellow] (1.1,-0.1)  rectangle (1.3,0.1);  % 1.2
    \draw [yellow!50!black, fill=yellow] (1.2,-0.1)  rectangle (1.4,0.1);  % 1.3
    \draw [yellow!50!black, fill=yellow] (1.5,-0.1)  rectangle (1.7,0.1);  % 1.6
    \draw [yellow!50!black, fill=yellow] (2.2,-0.1)  rectangle (2.4,0.1);  % 2.3
    % expected
    \pause
    \draw[domain=-2.5:2.5, yscale=5, smooth, variable=\x, yellow, very thick]
          plot ({\x}, {((atan(\x)*pi/180) + pi/2)/pi});
    \pause
    \draw [very thick, cyclamen, <->] (1,2.5) -- (1,3.75);
  \end{tikzpicture}
  \end{center}
  \setbeamercovered{transparent}
:::
::::
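
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample is the nine points drawn in the figure, the expected CDF is the arctangent curve plotted there (a standard Cauchy CDF), and the names `cauchy_cdf` and `ks_statistic` are made up for this example. Since $F_N$ is a step function, the supremum is reached at one of the sample points, on either side of a step.

```python
import math

def cauchy_cdf(x):
    """Expected CDF F(x): the arctangent curve of the figure (assumption)."""
    return math.atan(x) / math.pi + 0.5

def ks_statistic(sample, cdf):
    """Return D_N = sup_x |F_N(x) - F(x)| for a continuous expected CDF."""
    xs = sorted(sample)          # sort points in ascending order
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_N jumps from i/n to (i + 1)/n at x: check both sides of the step
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

# The nine points marked on the x axis of the figure
points = [-2.5, -1.5, -0.9, -0.1, 1.0, 1.2, 1.3, 1.6, 2.3]
d_n = ks_statistic(points, cauchy_cdf)
```

Checking both sides of each step matters: looking only at the value of $F_N$ after the jump can underestimate the supremum.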


## KS

$H_0$: points sampled according to $F(x)$

. . .

If $H_0$ is true: $\sqrt{N}D_N \xrightarrow{N \rightarrow + \infty} K$

$K$ is the Kolmogorov variable, with CDF:

$$
  P(K \leqslant K_0) = \frac{\sqrt{2 \pi}}{K_0}
  \sum_{j = 1}^{+ \infty} e^{-(2j - 1)^2 \pi^2 / (8 K_0^2)}
$$

. . .

A $p$-value can be computed: $p = P(K > \sqrt{N} D_N)$

- At the 95% confidence level, $H_0$ cannot be rejected if $p > 0.05$
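
As a rough sketch of how the $p$-value could be obtained, the series for the Kolmogorov CDF can be truncated and evaluated directly. The function names, the truncation at 100 terms and the input values ($D_N = 0.2$, $N = 50$) are assumptions chosen for illustration, not results from any dataset. The result relies on the asymptotic distribution, so it is only approximate for small $N$.

```python
import math

def kolmogorov_cdf(k0, terms=100):
    """P(K <= k0) from the series above, truncated after `terms` terms."""
    if k0 <= 0:
        return 0.0
    s = sum(
        math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8 * k0 ** 2))
        for j in range(1, terms + 1)
    )
    return math.sqrt(2 * math.pi) / k0 * s

def ks_p_value(d_n, n):
    """Asymptotic p-value P(K > sqrt(N) * D_N), valid if H0 is true."""
    p = 1.0 - kolmogorov_cdf(math.sqrt(n) * d_n)
    return min(1.0, max(0.0, p))   # guard against rounding outside [0, 1]

# Hypothetical inputs, for illustration only
p = ks_p_value(d_n=0.2, n=50)
reject_h0 = p <= 0.05              # decision at the 95% confidence level
```

The series converges quickly for small and moderate $K_0$, where only the first few terms contribute; the fixed truncation is a simplification.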