
Sample statistics

Sample statistics

How to estimate sample median, mode and FWHM?

. . .

  • \only<3>\strike{Binning the data \hence result depends strongly on the bin width}

. . .

  • Alternative solutions
    • Robust estimators
    • Kernel density estimation

Sample median

\Begin{block}{Algorithm}

::: incremental

  1. Sample points
  2. Sort the sample in ascending order
  3. Take the middle element if $N$ is odd, or the average of the two middle elements if $N$ is even

:::

\End{block}

\setbeamercovered{}
\begin{center}
\begin{tikzpicture}[remember picture, >=Stealth]
  % place holder
  \draw [ultra thick, white] (-0.35,0.7) -- (-0.35,-0.7);
  % line
  \draw <1-> [line width=3, ->, cyclamen] (-5,0) -- (5,0);
  \node <1-> [right] at (5,0) {$x$};
  % points
  \draw <1-> [yellow!50!black, fill=yellow] (-4.6,-0.1) rectangle (-4.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-4,-0.1) rectangle (-4.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-3.3,-0.1) rectangle (-3.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-2.3,-0.1) rectangle (-2.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.6,-0.1) rectangle (-0.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (1.1,-0.1) rectangle (1.3,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2,-0.1) rectangle (2.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2.7,-0.1) rectangle (2.9,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (4,-0.1) rectangle (4.2,0.1);
  % nodes
  \node <2-> [below] at (-4.7,-0.1) {1};
  \node <2-> [below] at (-4.1,-0.1) {2};
  \node <2-> [below] at (-3.4,-0.1) {3};
  \node <2-> [below] at (-2.4,-0.1) {4};
  \node <2-> [below] at (-0.7,-0.1) {5};
  \node <2-> [below] at ( 0 ,-0.1) {6};
  \node <2-> [below] at ( 1.2,-0.1) {7};
  \node <2-> [below] at ( 2.1,-0.1) {8};
  \node <2-> [below] at ( 2.8,-0.1) {9};
  \node <2-> [below] at ( 4.1,-0.1) {10};
  \draw <3-> [ultra thick] (-0.35,0.7) -- (-0.35,-0.7);
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
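
The median steps above can be sketched in Python as follows. This is a minimal sketch assuming the sample is a list of floats; `sample_median` is a hypothetical helper name, and the points are the positions drawn in the figure, listed unsorted on purpose.

```python
def sample_median(sample):
    """Median of a 1-D sample: sort, then take the middle element(s)."""
    x = sorted(sample)  # step 2: ascending order
    n = len(x)
    if n == 0:
        raise ValueError("median of an empty sample")
    mid = n // 2
    if n % 2 == 1:  # odd N: the single middle element
        return x[mid]
    return (x[mid - 1] + x[mid]) / 2  # even N: average of the two middle elements


# the ten point positions drawn in the figure, in arbitrary order
points = [2.1, -4.7, 0.0, 4.1, -3.4, 1.2, -0.7, 2.8, -4.1, -2.4]
print(sample_median(points))  # -0.35
```

With ten points the result is the midpoint of the 5th and 6th sorted values, which is where the figure draws the final line.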

Sample mode

Half Sample Mode [@robertson74]

\Begin{block}{Algorithm}

::: incremental

  1. Sample points
  2. Find the smallest interval containing half of the points
  3. Repeat on the new interval (iterative)
  4. If fewer than four points remain, take the average of the two closest

:::

\End{block}

\centering
\setbeamercovered{}
\begin{tikzpicture}[remember picture, >=Stealth]
  % line
  \draw <1-> [line width=3, ->, cyclamen] (-5,0) -- (5,0);
  \node <1-> [right] at (5,0) {$x$};
  % points
  \draw <1-> [yellow!50!black, fill=yellow] (-4.6,-0.1) rectangle (-4.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-4,-0.1) rectangle (-4.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-3.3,-0.1) rectangle (-3.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-2.3,-0.1) rectangle (-2.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.6,-0.1) rectangle (-0.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (1.1,-0.1) rectangle (1.3,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2,-0.1) rectangle (2.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2.7,-0.1) rectangle (2.9,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (4,-0.1) rectangle (4.2,0.1);
  % nodes
  \node <1-> at (-1,-0.3) (1a) {};
  \node <1-> at (3.1,0.3) (1b) {};
  \node <1-> at (0.9,-0.3) (2a) {};
  \node <1-> at (1.8,-0.3) (3a) {};
  \node <1-> at (2.45,-0.7) (f1) {};
  \node <1-> at (2.45,0.7) (f2) {};
  % algorithm
  \draw <2-> [gray, fill=gray, opacity=0.5] (1a) rectangle (1b);
  \draw <3-> [gray, fill=gray, opacity=0.6] (2a) rectangle (1b);
  \draw <4-> [cyclamen, thick] (3a) rectangle (1b);
  \draw <5-> [ultra thick] (f1) -- (f2);
\end{tikzpicture}
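
A minimal Python sketch of the half-sample mode steps above. Some details are assumptions not fixed by the slide: each window holds $\lceil n/2 \rceil$ points, ties between equally narrow windows keep the first one, and with three points left and equal gaps the middle point is returned. `half_sample_mode` is a hypothetical name.

```python
def half_sample_mode(sample):
    """Half-sample mode of a 1-D sample (iterative sketch)."""
    x = sorted(sample)
    if not x:
        raise ValueError("mode of an empty sample")
    # steps 2-3: keep shrinking to the narrowest window holding half the points
    while len(x) >= 4:
        h = (len(x) + 1) // 2  # ceil(n / 2) points per window (assumption)
        widths = [x[i + h - 1] - x[i] for i in range(len(x) - h + 1)]
        start = widths.index(min(widths))  # first narrowest window on ties
        x = x[start:start + h]
    # step 4: fewer than four points left
    if len(x) == 1:
        return x[0]
    if len(x) == 2:
        return (x[0] + x[1]) / 2
    left, right = x[1] - x[0], x[2] - x[1]
    if left < right:
        return (x[0] + x[1]) / 2
    if right < left:
        return (x[1] + x[2]) / 2
    return x[1]  # equal gaps: middle point (assumption)


# the ten point positions drawn in the figure
points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
print(half_sample_mode(points))  # ≈ 2.45, where the figure draws the final line
```

On these points the windows shrink from ten to five to three points, matching the shaded boxes, and the two closest survivors (2.1 and 2.8) are averaged.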

Sample FWHM


$$
  \text{FWHM} = x_+ - x_- \with L(x_{\pm}) = \frac{L_{\text{max}}}{2}
$$
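
As a worked check of the definition, assume a Gaussian likelihood centred at $\mu$ with width $\sigma$ (an illustrative choice, not part of the method):

$$
  L(x) = L_{\text{max}} \, e^{-(x - \mu)^2 / 2\sigma^2}
  \quad \Rightarrow \quad
  \frac{(x_\pm - \mu)^2}{2\sigma^2} = \ln 2
  \quad \Rightarrow \quad
  \text{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma
$$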

\setbeamercovered{transparent}

. . .

Kernel Density Estimation

:::: {.columns}
::: {.column width=50% .c}

  • Empirical PDF construction:
    $$
      f_\varepsilon(x) = \frac{1}{N\varepsilon} \sum_{i = 1}^N
      G \left( \frac{x-x_i}{\varepsilon} \right)
    $$
    with $G$ a kernel function (Gaussian in the figure)
  • The bandwidth $\varepsilon$ controls the
    sharpness of the empirical PDF

:::

::: {.column width=50%}
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}
  % placeholder
  \draw [white] (-2.7,-0.2) rectangle (3,3.3);
  % bandwidth 1
  \node <4,5> [left] at (2.9,3) {$\varepsilon = 1$};
  % points
  \draw <3-> [yellow!50!black, fill=yellow] (-1.2,-0.2) rectangle (-1,0);
  \draw <3-> [yellow!50!black, fill=yellow] (-0.1,-0.2) rectangle (0.1,0);
  \draw <3-> [yellow!50!black, fill=yellow] (0.7,-0.2) rectangle (0.9,0);
  \draw <3-> [yellow!50!black, fill=yellow] (1.3,-0.2) rectangle (1.5,0);
  % lines 1
  \draw <4,5> [cyclamen, dashed] (-1.1,0.1) -- (-1.1,1);
  \draw <4,5> [cyclamen, dashed] (0,0.1) -- (0,1);
  \draw <4,5> [cyclamen, dashed] (1.4,0.1) -- (1.4,1);
  \draw <4,5> [cyclamen, dashed] (0.8,0.1) -- (0.8,1);
  % Gaussians 1
  \draw <4,5> [domain=-2.6:0.4, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)) + 0.1});
  \draw <4,5> [domain=-1.5:1.5, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-\x*\x) + 0.1});
  \draw <4,5> [domain=-0.7:2.3, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
  \draw <4,5> [domain=-0.1:2.9, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 1.4)*(\x - 1.4)) + 0.1});
  % sum 1
  \draw <5> [fill=white, white, opacity=0.5] (-2.7,0.1) rectangle (3,2.7);
  \draw <5> [domain=-2.6:2.9, smooth, variable=\x, yellow, very thick]
    plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)) + exp(-\x*\x)
                 + exp(-(\x - 1.4)*(\x - 1.4)) + exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
  % bandwidth 2
  \node <6> [left] at (2.9,3) {$\varepsilon = 0.5$};
  % lines 2
  \draw <6> [cyclamen, dashed] (-1.1,0.1) -- (-1.1,2);
  \draw <6> [cyclamen, dashed] (0,0.1) -- (0,2);
  \draw <6> [cyclamen, dashed] (1.4,0.1) -- (1.4,2);
  \draw <6> [cyclamen, dashed] (0.8,0.1) -- (0.8,2);
  % Gaussians 2
  \draw <6> [domain=-2.6:0.4, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)/0.25)/0.5 + 0.1});
  \draw <6> [domain=-1.5:1.5, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-\x*\x/0.25)/0.5 + 0.1});
  \draw <6> [domain=-0.7:2.3, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 0.8)*(\x - 0.8)/0.25)/0.5 + 0.1});
  \draw <6> [domain=-0.1:2.9, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 1.4)*(\x - 1.4)/0.25)/0.5 + 0.1});
  % sum 2
  \draw <6> [fill=white, white, opacity=0.5] (-2.7,0.05) rectangle (3,2.7);
  \draw <6> [domain=-2.6:2.9, smooth, variable=\x, yellow, very thick]
    plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)/0.25)/0.5 + exp(-\x*\x/0.25)/0.5
                 + exp(-(\x - 1.4)*(\x - 1.4)/0.25)/0.5 + exp(-(\x - 0.8)*(\x - 0.8)/0.25)/0.5 + 0.1});
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
:::
::::
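
A minimal Python sketch of the estimator $f_\varepsilon$ above. The slide leaves the kernel $G$ open; a standard normal density is assumed here, matching the Gaussians in the figure. The function names are illustrative, not taken from the original code.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel G (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(sample, eps, kernel=gaussian_kernel):
    """Return f_eps(x) = 1/(N eps) * sum_i G((x - x_i) / eps) as a function."""
    n = len(sample)
    if n == 0 or eps <= 0:
        raise ValueError("need a non-empty sample and a positive bandwidth")

    def f(x):
        return sum(kernel((x - xi) / eps) for xi in sample) / (n * eps)

    return f


# the four points of the figure, with the two bandwidths shown there
points = [-1.1, 0.0, 0.8, 1.4]
wide, narrow = kde(points, 1.0), kde(points, 0.5)
print(wide(0.0), narrow(0.0))  # density at x = 0 for each bandwidth
```

The $1/(N\varepsilon)$ prefactor keeps $f_\varepsilon$ normalised to one for any bandwidth, which is why the narrower Gaussians in the figure are also taller.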

Sample FWHM

Silverman's rule of thumb [@silver86]:


$$
  \varepsilon = 0.88 \, S_N
  \left( \frac{d + 2}{4}N \right)^{-1/(d + 4)}
$$

where:

  • $S_N$ is the sample standard deviation
  • $d$ is the number of dimensions ($d = 1$)
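
The rule of thumb transcribes directly to Python. A sketch, assuming $S_N$ is the $(N-1)$-normalised standard deviation returned by `statistics.stdev` (which needs at least two points); `silverman_bandwidth` is a hypothetical name.

```python
import statistics


def silverman_bandwidth(sample, d=1):
    """Bandwidth eps = 0.88 * S_N * ((d + 2) / 4 * N) ** (-1 / (d + 4))."""
    n = len(sample)
    s_n = statistics.stdev(sample)  # assumed: (N - 1)-normalised sample std
    return 0.88 * s_n * ((d + 2) / 4 * n) ** (-1 / (d + 4))


eps = silverman_bandwidth([-1.1, 0.0, 0.8, 1.4])
print(eps)
```

For $d = 1$ the bandwidth shrinks as $N^{-1/5}$, so larger samples get sharper estimates.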

. . .

  • Minimization (Brent) of $-f_\varepsilon$ to find $f_{\varepsilon_{\text{max}}}$
  • Root finding (Brent-Dekker) for $f_{\varepsilon}(x_{\pm}) = \frac{f_{\varepsilon_{\text{max}}}}{2}$
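
A stdlib-only Python sketch of this procedure. It deliberately swaps in simpler methods: golden-section search instead of Brent minimization, and bisection instead of Brent-Dekker root finding. The Gaussian kernel, the coarse grid used to bracket the peak and all function names are assumptions for illustration.

```python
import math


def kde(sample, eps):
    """Gaussian-kernel density estimate f_eps (Gaussian kernel assumed)."""
    norm = 1.0 / (len(sample) * eps * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - xi) / eps) ** 2) for xi in sample)


def golden_max(f, a, b, tol=1e-10, max_iter=200):
    """Golden-section search for the maximum of f on [a, b] (f unimodal there)."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    for _ in range(max_iter):
        if b - a <= tol:
            break
        if f(c) > f(d):  # maximum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:  # maximum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return 0.5 * (a + b)


def bisect_root(g, a, b, max_iter=200):
    """Bisection for g(x) = 0 on [a, b]; g(a) and g(b) must differ in sign."""
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("root not bracketed")
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        gm = g(m)
        if ga * gm <= 0:  # sign change in [a, m]
            b = m
        else:  # sign change in [m, b]
            a, ga = m, gm
    return 0.5 * (a + b)


def kde_fwhm(sample, eps, grid=1000):
    """FWHM of the KDE: locate its maximum, then the two half-maximum points."""
    f = kde(sample, eps)
    lo, hi = min(sample) - 5.0 * eps, max(sample) + 5.0 * eps
    step = (hi - lo) / grid
    # coarse grid scan brackets the global maximum, golden section refines it
    k = max(range(grid + 1), key=lambda i: f(lo + i * step))
    x_max = golden_max(f, lo + max(k - 1, 0) * step, lo + min(k + 1, grid) * step)
    half = 0.5 * f(x_max)
    # one half-maximum crossing on each side of the peak
    x_minus = bisect_root(lambda x: f(x) - half, lo, x_max)
    x_plus = bisect_root(lambda x: f(x) - half, x_max, hi)
    return x_plus - x_minus


print(kde_fwhm([0.0], 1.0))  # ≈ 2.3548: one point gives a unit Gaussian
```

For a multimodal estimate each bisection returns one of possibly several half-maximum crossings on its side of the peak; Brent's methods would converge faster but bracket roots in the same way.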

Sample FWHM

Bootstrap

Estimating confidence intervals:

. . .

\Begin{block}{Algorithm}

::: incremental

  1. Sample $N$ points from the PDF
  2. Draw $M$ samples of size $N$ from them, with replacement
  3. Apply the estimator to each new sample
  4. Compute the mean and standard deviation of the $M$ estimates

:::

\End{block}
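
A minimal Python sketch of the bootstrap loop. The estimator is passed in as a function (here `statistics.median`), the standard normal used in step 1 is only an assumed example PDF, and `bootstrap` is a hypothetical name.

```python
import random
import statistics


def bootstrap(sample, estimator, m=1000, seed=None):
    """Mean and standard deviation of `estimator` over m resamples of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # steps 2-3: draw m samples of size n with replacement, apply the estimator
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(m)]
    # step 4: summarise the distribution of the estimates
    return statistics.mean(estimates), statistics.stdev(estimates)


# step 1: N = 500 points from a standard normal (an assumed example PDF)
gen = random.Random(42)
data = [gen.gauss(0.0, 1.0) for _ in range(500)]
centre, spread = bootstrap(data, statistics.median, m=200, seed=1)
print(f"median = {centre:.3f} +/- {spread:.3f}")
```

Any of the estimators above (median, half-sample mode, KDE-based FWHM) can be plugged in as `estimator`; the standard deviation of the estimates is the uncertainty attached to it.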