Sample statistics
How can we estimate the sample median, mode and FWHM?
. . .
- \only<3>\strike{Binning the data \hence results depend strongly on the bin width}
. . .
- Alternative solutions
- Robust estimators
- Kernel density estimation
Sample median
\Begin{block}{Algorithm}

::: incremental
- Draw sample points
- Sort the sample in ascending order
- Take the middle element if the sample size is odd
- Take the average of the two middle elements if it is even
:::

\End{block}
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}[remember picture, >=Stealth]
  % place holder
  \draw [ultra thick, white] (-0.35,0.7) -- (-0.35,-0.7);
  % line
  \draw <1-> [line width=3, ->, cyclamen] (-5,0) -- (5,0);
  \node <1-> [right] at (5,0) {$x$};
  % points
  \draw <1-> [yellow!50!black, fill=yellow] (-4.6,-0.1) rectangle (-4.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-4,-0.1) rectangle (-4.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-3.3,-0.1) rectangle (-3.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-2.3,-0.1) rectangle (-2.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.6,-0.1) rectangle (-0.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (1.1,-0.1) rectangle (1.3,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2,-0.1) rectangle (2.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2.7,-0.1) rectangle (2.9,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (4,-0.1) rectangle (4.2,0.1);
  % nodes
  \node <2-> [below] at (-4.7,-0.1) {1};
  \node <2-> [below] at (-4.1,-0.1) {2};
  \node <2-> [below] at (-3.4,-0.1) {3};
  \node <2-> [below] at (-2.4,-0.1) {4};
  \node <2-> [below] at (-0.7,-0.1) {5};
  \node <2-> [below] at (0,-0.1) {6};
  \node <2-> [below] at (1.2,-0.1) {7};
  \node <2-> [below] at (2.1,-0.1) {8};
  \node <2-> [below] at (2.8,-0.1) {9};
  \node <2-> [below] at (4.1,-0.1) {10};
  % median: average of points 5 and 6
  \draw <3-> [ultra thick] (-0.35,0.7) -- (-0.35,-0.7);
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
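A minimal Python sketch of the sorting algorithm above; `sample_median` is a hypothetical helper name, and the standard library's `statistics.median` does the same job. The ten positions are read off the figure.

```python
def sample_median(sample):
    """Median of a non-empty sequence of numbers, by sorting."""
    if len(sample) == 0:
        raise ValueError("the median of an empty sample is undefined")
    xs = sorted(sample)  # sort in ascending order
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]  # odd size: the middle element
    return 0.5 * (xs[mid - 1] + xs[mid])  # even size: mean of the two middle elements


# Point positions read off the figure (ten points, so the even rule applies)
points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
print(sample_median(points))  # -0.35, the vertical line in the figure
```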
Sample mode
Half Sample Mode [@robertson74]
\Begin{block}{Algorithm}

::: incremental
- Draw sample points
- Find the shortest interval containing half of the points
- Repeat on the new interval (iteratively)
- When fewer than four points remain, take the average of the two closest ones
:::

\End{block}
\centering
\setbeamercovered{}
\begin{tikzpicture}[remember picture, >=Stealth]
  % line
  \draw <1-> [line width=3, ->, cyclamen] (-5,0) -- (5,0);
  \node <1-> [right] at (5,0) {$x$};
  % points
  \draw <1-> [yellow!50!black, fill=yellow] (-4.6,-0.1) rectangle (-4.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-4,-0.1) rectangle (-4.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-3.3,-0.1) rectangle (-3.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-2.3,-0.1) rectangle (-2.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.6,-0.1) rectangle (-0.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (1.1,-0.1) rectangle (1.3,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2,-0.1) rectangle (2.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2.7,-0.1) rectangle (2.9,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (4,-0.1) rectangle (4.2,0.1);
  % nodes
  \node <1-> at (-1,-0.3) (1a) {};
  \node <1-> at (3.1,0.3) (1b) {};
  \node <1-> at (0.9,-0.3) (2a) {};
  \node <1-> at (1.8,-0.3) (3a) {};
  \node <1-> at (2.45,-0.7) (f1) {};
  \node <1-> at (2.45,0.7) (f2) {};
  % algorithm
  \draw <2-> [gray, fill=gray, opacity=0.5] (1a) rectangle (1b);
  \draw <3-> [gray, fill=gray, opacity=0.6] (2a) rectangle (1b);
  \draw <4-> [cyclamen, thick] (3a) rectangle (1b);
  \draw <5-> [ultra thick] (f1) -- (f2);
\end{tikzpicture}
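A short Python sketch of the half-sample-mode iteration as described on this slide. Rounding "half" up to $\lceil n/2 \rceil$ points and breaking ties towards the leftmost pair are assumptions the slide does not fix, and `half_sample_mode` is a hypothetical name. On the ten points of the figure it passes through the same three shrinking intervals.

```python
def half_sample_mode(sample):
    """Half sample mode: repeatedly keep the shortest interval holding half the points."""
    xs = sorted(sample)
    if not xs:
        raise ValueError("the mode of an empty sample is undefined")
    while len(xs) >= 4:
        h = (len(xs) + 1) // 2  # "half" the points, rounded up (assumption)
        # start index of the narrowest window of h consecutive sorted points
        start = min(range(len(xs) - h + 1), key=lambda i: xs[i + h - 1] - xs[i])
        xs = xs[start:start + h]
    if len(xs) == 1:
        return xs[0]
    # fewer than four points left: average the two closest neighbours
    # (ties go to the leftmost pair, another assumption)
    j = min(range(len(xs) - 1), key=lambda i: xs[i + 1] - xs[i])
    return 0.5 * (xs[j] + xs[j + 1])


# Point positions read off the figure
points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
print(half_sample_mode(points))  # 2.45, the vertical line in the figure
```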
Sample FWHM
$$
\text{FWHM} = x_+ - x_- \with L(x_{\pm}) = \frac{L_{\text{max}}}{2}
$$
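As a quick sanity check of the definition, assume a Gaussian peak of width $\sigma$ (an illustrative choice, not part of the method):

$$
L(x) \propto e^{-(x-\mu)^2/2\sigma^2}
\;\Rightarrow\;
x_\pm = \mu \pm \sigma\sqrt{2\ln 2}
\;\Rightarrow\;
\text{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma
$$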
\setbeamercovered{transparent}

. . .
Kernel Density Estimation
:::: {.columns}
::: {.column width=50% .c}
- Empirical PDF construction with a kernel $G$:
  $$
  f_\varepsilon(x) = \frac{1}{N\varepsilon} \sum_{i = 1}^N
  G \left( \frac{x-x_i}{\varepsilon} \right)
  $$
- The bandwidth $\varepsilon$ controls the sharpness of the empirical PDF: a small $\varepsilon$ gives a spiky estimate, a large one a smooth estimate
:::
::: {.column width=50%}
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}
  % placeholder
  \draw [white] (-2.7,-0.2) rectangle (3,3.3);
  % bandwidth 1
  \node <4,5> [left] at (2.9,3) {$\varepsilon = 1$};
  % points
  \draw <3-> [yellow!50!black, fill=yellow] (-1.2,-0.2) rectangle (-1,0);
  \draw <3-> [yellow!50!black, fill=yellow] (-0.1,-0.2) rectangle (0.1,0);
  \draw <3-> [yellow!50!black, fill=yellow] (0.7,-0.2) rectangle (0.9,0);
  \draw <3-> [yellow!50!black, fill=yellow] (1.3,-0.2) rectangle (1.5,0);
  % lines 1
  \draw <4,5> [cyclamen, dashed] (-1.1,0.1) -- (-1.1,1);
  \draw <4,5> [cyclamen, dashed] (0,0.1) -- (0,1);
  \draw <4,5> [cyclamen, dashed] (1.4,0.1) -- (1.4,1);
  \draw <4,5> [cyclamen, dashed] (0.8,0.1) -- (0.8,1);
  % Gaussians 1
  \draw <4,5> [domain=-2.6:0.4, smooth, variable=\x, cyclamen, very thick] plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)) + 0.1});
  \draw <4,5> [domain=-1.5:1.5, smooth, variable=\x, cyclamen, very thick] plot ({\x}, {exp(-\x*\x) + 0.1});
  \draw <4,5> [domain=-0.7:2.3, smooth, variable=\x, cyclamen, very thick] plot ({\x}, {exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
  \draw <4,5> [domain=-0.1:2.9, smooth, variable=\x, cyclamen, very thick] plot ({\x}, {exp(-(\x - 1.4)*(\x - 1.4)) + 0.1});
  % sum 1
  \draw <5> [fill=white, white, opacity=0.5] (-2.7,0.1) rectangle (3,2.7);
  \draw <5> [domain=-2.6:2.9, smooth, variable=\x, yellow, very thick] plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)) + exp(-\x*\x) + exp(-(\x - 1.4)*(\x - 1.4)) + exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
  % bandwidth 2
  \node <6> [left] at (2.9,3) {$\varepsilon = 0.5$};
  % lines 2
  \draw <6> [cyclamen, dashed] (-1.1,0.1) -- (-1.1,2);
  \draw <6> [cyclamen, dashed] (0,0.1) -- (0,2);
  \draw <6> [cyclamen, dashed] (1.4,0.1) -- (1.4,2);
  \draw <6> [cyclamen, dashed] (0.8,0.1) -- (0.8,2);
  % Gaussians 2
  \draw <6> [domain=-2.6:0.4, smooth, variable=\x, cyclamen, very thick] plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)/0.25)/0.5 + 0.1});
  \draw <6> [domain=-1.5:1.5, smooth, variable=\x, cyclamen, very thick] plot ({\x}, {exp(-\x*\x/0.25)/0.5 + 0.1});
  \draw <6> [domain=-0.7:2.3, smooth, variable=\x, cyclamen, very thick] plot ({\x}, {exp(-(\x - 0.8)*(\x - 0.8)/0.25)/0.5 + 0.1});
  \draw <6> [domain=-0.1:2.9, smooth, variable=\x, cyclamen, very thick] plot ({\x}, {exp(-(\x - 1.4)*(\x - 1.4)/0.25)/0.5 + 0.1});
  % sum 2
  \draw <6> [fill=white, white, opacity=0.5] (-2.7,0.05) rectangle (3,2.7);
  \draw <6> [domain=-2.6:2.9, smooth, variable=\x, yellow, very thick] plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)/0.25)/0.5 + exp(-\x*\x/0.25)/0.5 + exp(-(\x - 1.4)*(\x - 1.4)/0.25)/0.5 + exp(-(\x - 0.8)*(\x - 0.8)/0.25)/0.5 + 0.1});
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
:::
::::
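The estimator above in Python, as a minimal sketch. The slide leaves $G$ open, so a normalised standard Gaussian kernel is assumed here (the figure draws unnormalised Gaussians), and `kde` and `gaussian_kernel` are hypothetical names. The four points and the two bandwidths are the ones shown in the figure.

```python
import math


def gaussian_kernel(u):
    """Standard normal kernel G(u); it integrates to one, so f_eps is a PDF."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(sample, eps):
    """Return f_eps(x) = 1/(N eps) * sum_i G((x - x_i) / eps) as a function of x."""
    if len(sample) == 0 or eps <= 0:
        raise ValueError("need a non-empty sample and a positive bandwidth")
    n = len(sample)

    def f(x):
        return sum(gaussian_kernel((x - xi) / eps) for xi in sample) / (n * eps)

    return f


# Point positions read off the figure, with the two bandwidths it shows
points = [-1.1, 0.0, 0.8, 1.4]
smooth, sharp = kde(points, 1.0), kde(points, 0.5)
for x in points:
    print(f"x = {x:+.1f}: f_1 = {smooth(x):.3f}, f_0.5 = {sharp(x):.3f}")
```

Evaluating both estimates at the data points shows the smaller bandwidth producing taller, narrower bumps.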
Sample FWHM
Silverman's rule of thumb [@silver86]:
$$
\varepsilon = 0.88 \, S_N \left( \frac{d + 2}{4} N \right)^{-1/(d + 4)}
$$
where $S_N$ is the sample standard deviation and $d$ is the number of dimensions ($d = 1$ here).
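The rule of thumb transcribed into Python as a sketch; `silverman_bandwidth` is a hypothetical name, and using the $N-1$ normalisation for $S_N$ is an assumption the slide does not fix.

```python
import statistics


def silverman_bandwidth(sample, d=1):
    """Bandwidth eps = 0.88 * S_N * ((d + 2) / 4 * N) ** (-1 / (d + 4)).

    S_N is taken as the N - 1 sample standard deviation (an assumption); for a
    list of scalars only the one-dimensional case d = 1 is meaningful.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two points to estimate S_N")
    s_n = statistics.stdev(sample)
    return 0.88 * s_n * ((d + 2) / 4 * n) ** (-1.0 / (d + 4))


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(silverman_bandwidth(sample))
```

The bandwidth scales linearly with the spread of the data and shrinks slowly, as $N^{-1/5}$ for $d = 1$, as the sample grows.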
. . .
- Minimization (Brent) of $-f_\varepsilon$ to find $f_{\varepsilon,\text{max}}$
- Root finding (Brent-Dekker) to find $x_\pm$ with $f_{\varepsilon}(x_{\pm}) = \frac{f_{\varepsilon,\text{max}}}{2}$
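A dependency-free sketch of the whole FWHM pipeline: Gaussian-kernel KDE, bandwidth from the rule of thumb above, peak search, then the two half-maximum crossings. To stay within the standard library it swaps Brent's methods for simpler relatives (golden-section search for the maximum, bisection for the roots); with SciPy one would instead apply `scipy.optimize.minimize_scalar` (method `"brent"` or `"bounded"`) to $-f_\varepsilon$ and `scipy.optimize.brentq` for $x_\pm$. It assumes a single dominant peak, the Gaussian kernel is an assumption, and all function names are hypothetical.

```python
import math
import statistics


def kde(sample, eps):
    """Gaussian-kernel density estimate f_eps(x) (kernel choice is an assumption)."""
    norm = len(sample) * eps * math.sqrt(2.0 * math.pi)

    def f(x):
        return sum(math.exp(-0.5 * ((x - xi) / eps) ** 2) for xi in sample) / norm

    return f


def golden_max(f, a, b, tol=1e-9):
    """Maximise f on [a, b] by golden-section search; assumes f is unimodal there."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) > f(d):  # maximum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:  # maximum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return 0.5 * (a + b)


def bisect_root(g, a, b, tol=1e-10):
    """Root of g on [a, b] by bisection; g(a) and g(b) must differ in sign."""
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("root is not bracketed")
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if ga * gm <= 0:  # sign change in [a, m]
            b = m
        else:  # sign change in [m, b]
            a, ga = m, gm
    return 0.5 * (a + b)


def sample_fwhm(sample):
    """FWHM of the KDE of a 1-D sample: x_+ - x_- with f(x_pm) = f_max / 2."""
    n = len(sample)
    eps = 0.88 * statistics.stdev(sample) * (0.75 * n) ** -0.2  # rule of thumb, d = 1
    f = kde(sample, eps)
    x_max = golden_max(f, min(sample), max(sample))
    half = 0.5 * f(x_max)

    def g(x):
        return f(x) - half

    # ten bandwidths beyond the outermost points the KDE is negligible,
    # so these end points bracket x_- and x_+
    x_minus = bisect_root(g, min(sample) - 10 * eps, x_max)
    x_plus = bisect_root(g, x_max, max(sample) + 10 * eps)
    return x_plus - x_minus
```

Fed 1000 evenly spaced quantiles of a standard normal, this returns a width somewhat above the Gaussian value $2\sqrt{2\ln 2} \approx 2.355$, since the kernel adds its own width $\varepsilon$ roughly in quadrature.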
Sample FWHM
Bootstrap
Estimating a confidence interval:
. . .
\Begin{block}{Algorithm}

::: incremental
- Sample $N$ points from the PDF
- Resample them with replacement $M$ times
- Apply the estimator (median, mode or FWHM) to each new sample
- Compute the mean and standard deviation of the $M$ results
:::

\End{block}
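A minimal Python sketch of the resampling loop, assuming the estimator is any function of a sample (e.g. `statistics.median`); `bootstrap` is a hypothetical name and the seeded `random.Random` is only there for reproducibility.

```python
import random
import statistics


def bootstrap(sample, estimator, m=1000, seed=None):
    """Mean and standard deviation of `estimator` over m resamples with replacement."""
    if len(sample) == 0 or m < 2:
        raise ValueError("need a non-empty sample and at least two resamples")
    rng = random.Random(seed)  # seeded generator for reproducible resampling
    n = len(sample)
    # each resample has the original size N and is drawn with replacement
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(m)]
    return statistics.mean(estimates), statistics.stdev(estimates)


# Example: uncertainty on the median of 200 points drawn from a standard normal
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
centre, spread = bootstrap(data, statistics.median, m=500, seed=2)
print(f"median = {centre:.3f} +/- {spread:.3f}")
```

The returned standard deviation is the bootstrap estimate of the estimator's standard error; a band of one standard deviation around the mean is the simplest interval built from it.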