Sample statistics

Sample statistics

How to estimate the sample median, mode and FWHM?

. . .

  • \only<3>\strike{Binning data \hence depends wildly on bin-width}

. . .

  • Alternative solutions
    • Robust estimators
    • Kernel density estimation

Sample median

:::: {.columns align=bottom}
::: {.column width=50%}

$$
  F(m) = \frac{1}{2}
$$

\vspace{20pt}

. . .

  • Sort points in ascending order

. . .

  • Middle element if the number of points is odd

    Average of the two central elements if it is even

:::

::: {.column width=50%}
:::
::::

\setbeamercovered{}
\begin{center}
\begin{tikzpicture}[remember picture, >=Stealth]
  % line
  \draw [line width=3, ->, cyclamen] (-5,0) -- (5,0);
  \node [right] at (5,0) {$x$};
  % points
  \draw [yellow!50!black, fill=yellow] (-4.6,-0.1) rectangle (-4.8,0.1);
  \draw [yellow!50!black, fill=yellow] (-4,-0.1) rectangle (-4.2,0.1);
  \draw [yellow!50!black, fill=yellow] (-3.3,-0.1) rectangle (-3.5,0.1);
  \draw [yellow!50!black, fill=yellow] (-2.3,-0.1) rectangle (-2.5,0.1);
  \draw [yellow!50!black, fill=yellow] (-0.6,-0.1) rectangle (-0.8,0.1);
  \draw [yellow!50!black, fill=yellow] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw [yellow!50!black, fill=yellow] (1.1,-0.1) rectangle (1.3,0.1);
  \draw [yellow!50!black, fill=yellow] (2,-0.1) rectangle (2.2,0.1);
  \draw [yellow!50!black, fill=yellow] (2.7,-0.1) rectangle (2.9,0.1);
  \draw [yellow!50!black, fill=yellow] (4,-0.1) rectangle (4.2,0.1);
  \pause
  % nodes
  \node [below] at (-4.7,-0.1) {1};
  \node [below] at (-4.1,-0.1) {2};
  \node [below] at (-3.4,-0.1) {3};
  \node [below] at (-2.4,-0.1) {4};
  \node [below] at (-0.7,-0.1) {5};
  \node [below] at ( 0 ,-0.1) {6};
  \node [below] at ( 1.2,-0.1) {7};
  \node [below] at ( 2.1,-0.1) {8};
  \node [below] at ( 2.8,-0.1) {9};
  \node [below] at ( 4.1,-0.1) {10};
  \pause
  \draw [ultra thick] (-0.35,0.7) -- (-0.35,-0.7);
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
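The sort-and-pick recipe above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_median` is hypothetical, and the test values are the ten point positions read off the figure.

```python
def sample_median(points):
    """Median of a non-empty sequence of numbers."""
    xs = sorted(points)               # sort points in ascending order
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    mid = n // 2
    if n % 2 == 1:                    # odd: the middle element
        return xs[mid]
    return 0.5 * (xs[mid - 1] + xs[mid])  # even: average of the two central ones


# Point positions read off the figure (an assumption of this sketch)
figure_points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
median = sample_median(figure_points)
```

With ten points the two central elements are the 5th and 6th ($-0.7$ and $0$), so the result sits at $x = -0.35$, where the figure draws the vertical bar.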

Sample mode

Most probable value

. . .

Half Sample Mode

  • Iteratively identify the smallest interval containing half of the points
  • Once the sample is reduced to fewer than three points, take their average

. . .

\setbeamercovered{}

\begin{center}
\begin{tikzpicture}[remember picture]
  % line
  \draw [line width=3, ->, cyclamen] (-5,0) -- (5,0);
  \node [right] at (5,0) {$x$};
  % points
  \draw [blue!50!black, fill=blue] (-4.6,-0.1) rectangle (-4.8,0.1);
  \draw [blue!50!black, fill=blue] (-4,-0.1) rectangle (-4.2,0.1);
  \draw [blue!50!black, fill=blue] (-3.3,-0.1) rectangle (-3.5,0.1);
  \draw [blue!50!black, fill=blue] (-2.3,-0.1) rectangle (-2.5,0.1);
  \draw [blue!50!black, fill=blue] (-0.6,-0.1) rectangle (-0.8,0.1);
  \draw [blue!50!black, fill=blue] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw [blue!50!black, fill=blue] (1.1,-0.1) rectangle (1.3,0.1);
  \draw [blue!50!black, fill=blue] (2,-0.1) rectangle (2.2,0.1);
  \draw [blue!50!black, fill=blue] (2.7,-0.1) rectangle (2.9,0.1);
  \draw [blue!50!black, fill=blue] (4,-0.1) rectangle (4.2,0.1);
  % future nodes
  \node at (-1,-0.3) (1a) {};
  \node at (3.1,0.3) (1b) {};
  \node at (0.9,-0.3) (2a) {};
  \node at (1.8,-0.3) (3a) {};
  % result nodes
  \node at (2.45,-0.7) (f1) {};
  \node at (2.45,0.7) (f2) {};
\end{tikzpicture}
\end{center}

. . .

\begin{center}
\begin{tikzpicture}[remember picture, overlay]
  % region
  \draw [orange, fill=orange, opacity=0.5] (1a) rectangle (1b);
\end{tikzpicture}
\end{center}

. . .

\begin{center}
\begin{tikzpicture}[remember picture, overlay]
  % region
  \draw [orange, fill=orange, opacity=0.5] (2a) rectangle (1b);
\end{tikzpicture}
\end{center}

. . .

\begin{center}
\begin{tikzpicture}[remember picture, overlay]
  % region
  \draw [orange, fill=orange, opacity=0.5] (3a) rectangle (1b);
\end{tikzpicture}
\end{center}

. . .

\begin{center}
\begin{tikzpicture}[remember picture, overlay]
  % region
  \draw [cyclamen, ultra thick] (f1) -- (f2);
\end{tikzpicture}
\end{center}
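A minimal Python sketch of the half sample mode iteration shown above. Two details are assumptions of this sketch and not stated on the slide: "half the points" is taken as $\lceil n/2 \rceil$, and ties between equally narrow intervals go to the leftmost one. The function name `half_sample_mode` is hypothetical.

```python
import math


def half_sample_mode(points):
    """Half sample mode of a non-empty sequence of numbers."""
    xs = sorted(points)
    if not xs:
        raise ValueError("empty sample")
    while len(xs) >= 3:
        h = math.ceil(len(xs) / 2)    # size of a half sample (assumed: ceil(n/2))
        # start index of the narrowest window of h consecutive sorted points
        start = min(range(len(xs) - h + 1),
                    key=lambda i: xs[i + h - 1] - xs[i])
        xs = xs[start:start + h]      # keep only that half and iterate
    return sum(xs) / len(xs)          # fewer than three points left: average


# Point positions read off the figure (an assumption of this sketch)
figure_points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
mode = half_sample_mode(figure_points)
```

On the figure's points the successive half samples are $\{-0.7, 0, 1.2, 2.1, 2.8\}$, $\{1.2, 2.1, 2.8\}$ and $\{2.1, 2.8\}$, matching the three shaded regions, and the final average is about $2.45$.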

Sample FWHM


$$
  \text{FWHM} = x_+ - x_- \with L(x_{\pm}) = \frac{L_{\text{max}}}{2}
$$

\setbeamercovered{transparent}

. . .

Kernel Density Estimation

:::: {.columns}
::: {.column width=50% .c}

  • Empirical PDF construction:

$$
  f_\varepsilon(x) = \frac{1}{N\varepsilon} \sum_{i = 1}^N
  G \left( \frac{x-x_i}{\varepsilon} \right)
$$

The parameter $\varepsilon$ controls the strength of the smoothing

:::

::: {.column width=50%}
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}
  % points
  \draw [blue!50!black, fill=blue] (-2,-0.1) rectangle (-1.8,0.1);
  \draw [blue!50!black, fill=blue] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw [blue!50!black, fill=blue] (1.3,-0.1) rectangle (1.5,0.1);
  \draw [blue!50!black, fill=blue] (0.7,-0.1) rectangle (0.9,0.1);
  \pause
  % lines
  \draw [cyclamen, dashed] (-1.9,0.1) -- (-1.9,1);
  \draw [cyclamen, dashed] (0,0.1) -- (0,1);
  \draw [cyclamen, dashed] (1.4,0.1) -- (1.4,1);
  \draw [cyclamen, dashed] (0.8,0.1) -- (0.8,1);
  % Gaussians
  \draw[domain=-3.4:-0.4, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x + 1.9)*(\x + 1.9)) + 0.1});
  \draw[domain=-1.5:1.5, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-\x*\x) + 0.1});
  \draw[domain=-0.1:2.9, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 1.4)*(\x - 1.4)) + 0.1});
  \draw[domain=-0.7:2.3, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
  \pause
  % sum
  \draw [fill=white, white, opacity=0.5] (-3.5,0.1) rectangle (3,1.3);
  \draw[domain=-3.4:3.4, smooth, variable=\x, blue, very thick]
    plot ({\x}, {exp(-(\x + 1.9)*(\x + 1.9)) + exp(-\x*\x)
                 + exp(-(\x - 1.4)*(\x - 1.4))
                 + exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
:::
::::
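The estimator $f_\varepsilon$ can be evaluated directly from its definition. The sketch below assumes that $G$ is the standard normal density, as the Gaussians in the figure suggest; the names `gaussian_kernel` and `kde` are hypothetical, and the sample reuses the four point positions of the figure with an arbitrary $\varepsilon = 0.5$.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel G (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, eps):
    """f_eps(x) = 1/(N eps) * sum_i G((x - x_i) / eps)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / eps) for xi in sample) / (n * eps)


# Point positions read off the figure; eps = 0.5 is an arbitrary choice
sample = [-1.9, 0.0, 0.8, 1.4]
density_at_zero = kde(0.0, sample, eps=0.5)

# Riemann sum over [-10, 10]: a properly normalised estimate integrates to ~1
step = 0.01
total = sum(kde(-10.0 + k * step, sample, 0.5) for k in range(2001)) * step
```

The factor $1/(N\varepsilon)$ is what keeps the estimate normalised: each kernel contributes area $1/N$ whatever the value of $\varepsilon$.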

Sample FWHM

Silverman's rule of thumb [@silver86]:


$$
  \varepsilon = 0.88 \, S_N
  \left( \frac{d + 2}{4}N \right)^{-1/(d + 4)}
$$

where:

  • $S_N$ is the sample standard deviation
  • $d$ is the number of dimensions ($d = 1$)
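As a rough illustration, the rule of thumb translates into a one-line function. The name `silverman_bandwidth` is hypothetical, and the sketch assumes that $S_N$ is computed with the $N - 1$ denominator, which is what Python's `statistics.stdev` does.

```python
import math
import statistics


def silverman_bandwidth(sample, d=1):
    """eps = 0.88 * S_N * ((d + 2) / 4 * N) ** (-1 / (d + 4))."""
    n = len(sample)
    s_n = statistics.stdev(sample)    # sample standard deviation (N - 1 denominator)
    return 0.88 * s_n * ((d + 2) / 4 * n) ** (-1 / (d + 4))


eps = silverman_bandwidth([0.0, 1.0, 2.0, 3.0, 4.0])
```

For $d = 1$ the exponent is $-1/5$, so the bandwidth shrinks only slowly as the sample grows.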

. . .

Numerical minimization (Brent) for $\quad f_{\varepsilon_{\text{max}}}$

Numerical root finding (Brent) for $\quad f_{\varepsilon}(x_{\pm}) = \frac{f_{\varepsilon_{\text{max}}}}{2}$
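The two numerical steps can be sketched as follows. To stay dependency-free, this sketch swaps Brent's methods for simpler stand-ins: golden-section search for the maximum and bisection for the two half-maximum crossings. It also assumes a Gaussian kernel, a density that is unimodal between the smallest and largest sample points, and a single crossing on each side of the peak. All function names are hypothetical.

```python
import math


def kde(x, sample, eps):
    """Gaussian-kernel density estimate f_eps(x) (Gaussian G is an assumption)."""
    norm = len(sample) * eps * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / eps) ** 2) for xi in sample) / norm


def golden_max(f, a, b, tol=1e-10):
    """Maximise a unimodal f on [a, b] (stand-in for Brent minimisation of -f)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d                     # maximum lies in [a, d]
        else:
            a = c                     # maximum lies in [c, b]
    return 0.5 * (a + b)


def bisect_root(g, a, b, tol=1e-12):
    """Root of g on [a, b] by bisection (stand-in for Brent root finding)."""
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("root not bracketed")
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if ga * gm <= 0:
            b = m
        else:
            a, ga = m, gm
    return 0.5 * (a + b)


def kde_fwhm(sample, eps):
    """FWHM = x_plus - x_minus, with f_eps(x_pm) = f_max / 2."""
    f = lambda x: kde(x, sample, eps)
    x_max = golden_max(f, min(sample), max(sample))
    half = f(x_max) / 2.0
    g = lambda x: f(x) - half
    # ten bandwidths beyond the extreme points the density is far below half-max
    x_minus = bisect_root(g, min(sample) - 10.0 * eps, x_max)
    x_plus = bisect_root(g, x_max, max(sample) + 10.0 * eps)
    return x_plus - x_minus


single_point_fwhm = kde_fwhm([0.0], eps=1.0)
```

A single point with $\varepsilon = 1$ gives a unit Gaussian, whose FWHM is $2\sqrt{2 \ln 2}$, which makes a convenient check. For a multimodal estimate the bracketing above is not sufficient and would need to be refined.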

Sample FWHM