# Sample statistics


## Sample statistics

How to estimate sample median, mode and FWHM?

. . .

- \only<3>\strike{Binning data $\hence$ depends wildly on bin-width}

. . .

- Alternative solutions
  - Robust estimators
  - Kernel density estimation


## Sample median
\Begin{block}{Algorithm}
::: incremental
1. Sample points
2. Sort the sample in ascending order
3. Take the middle element if the sample size is odd

   Take the average of the two middle elements if it is even
:::
\End{block}
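
The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_median` is hypothetical, and the ten values are the approximate centres of the points drawn in the figure on this slide.

```python
def sample_median(sample):
    """Median of a non-empty sequence of numbers."""
    xs = sorted(sample)              # step 2: ascending order
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    mid = n // 2
    if n % 2 == 1:                   # odd size: take the middle element
        return xs[mid]
    # even size: average of the two middle elements
    return 0.5 * (xs[mid - 1] + xs[mid])


# Approximate centres of the ten points drawn on the axis (assumed values).
points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
median = sample_median(points)
```

With an even number of points the result falls halfway between the 5th and 6th sorted values, which is where the vertical bar is drawn in the figure.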

\setbeamercovered{}
\begin{center}
\begin{tikzpicture}[remember picture, >=Stealth]
  % placeholder
  \draw [ultra thick, transparent] (-0.35,0.7) -- (-0.35,-0.7);
  % line
  \draw <1-> [line width=3, ->, cyclamen] (-5,0) -- (5,0);
  \node <1-> [right] at (5,0) {$x$};
  % points
  \draw <1-> [yellow!50!black, fill=yellow] (-4.6,-0.1) rectangle (-4.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-4,-0.1) rectangle (-4.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-3.3,-0.1) rectangle (-3.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-2.3,-0.1) rectangle (-2.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.6,-0.1) rectangle (-0.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (1.1,-0.1) rectangle (1.3,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2,-0.1) rectangle (2.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2.7,-0.1) rectangle (2.9,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (4,-0.1) rectangle (4.2,0.1);
  % nodes
  \node <2-> [below] at (-4.7,-0.1) {1};
  \node <2-> [below] at (-4.1,-0.1) {2};
  \node <2-> [below] at (-3.4,-0.1) {3};
  \node <2-> [below] at (-2.4,-0.1) {4};
  \node <2-> [below] at (-0.7,-0.1) {5};
  \node <2-> [below] at ( 0 ,-0.1) {6};
  \node <2-> [below] at ( 1.2,-0.1) {7};
  \node <2-> [below] at ( 2.1,-0.1) {8};
  \node <2-> [below] at ( 2.8,-0.1) {9};
  \node <2-> [below] at ( 4.1,-0.1) {10};
  \draw <3-> [ultra thick] (-0.35,0.7) -- (-0.35,-0.7);
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}


## Sample mode
**Half Sample Mode** [@robertson74]

\Begin{block}{Algorithm}
::: incremental
1. Sample points
2. Find the smallest interval containing half of the points
3. Repeat on the new interval (iterative)
4. If fewer than four points remain, take the average of the closest two
:::
\End{block}
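
A minimal Python sketch of the iteration above, under a few assumptions that the slide does not state: the half sample holds `ceil(n / 2)` points, ties between equally narrow intervals go to the leftmost one, and a tie among three points returns the middle one. The name `half_sample_mode` is hypothetical, and the sample values are the approximate centres of the points in the figure.

```python
import math


def half_sample_mode(sample):
    """Half sample mode of a non-empty sequence of numbers."""
    xs = sorted(sample)
    if not xs:
        raise ValueError("empty sample")
    while len(xs) > 3:
        h = math.ceil(len(xs) / 2)           # points in a half sample
        # Width of every window of h consecutive sorted points.
        widths = [xs[i + h - 1] - xs[i] for i in range(len(xs) - h + 1)]
        start = widths.index(min(widths))    # leftmost narrowest window
        xs = xs[start:start + h]             # repeat on the new interval
    if len(xs) == 1:
        return xs[0]
    if len(xs) == 2:
        return 0.5 * (xs[0] + xs[1])
    # Three points left: average the closest pair.
    left, right = xs[1] - xs[0], xs[2] - xs[1]
    if left < right:
        return 0.5 * (xs[0] + xs[1])
    if right < left:
        return 0.5 * (xs[1] + xs[2])
    return xs[1]                             # assumed tie rule


# Approximate centres of the ten points drawn on the axis (assumed values).
points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
mode = half_sample_mode(points)
```

On these values the windows shrink from five points to three and then to the closest pair, following the sequence of shaded rectangles in the figure.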

\centering
\setbeamercovered{}
\begin{tikzpicture}[remember picture, >=Stealth]
  % line
  \draw <1-> [line width=3, ->, cyclamen] (-5,0) -- (5,0);
  \node <1-> [right] at (5,0) {$x$};
  % points
  \draw <1-> [yellow!50!black, fill=yellow] (-4.6,-0.1) rectangle (-4.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-4,-0.1) rectangle (-4.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-3.3,-0.1) rectangle (-3.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-2.3,-0.1) rectangle (-2.5,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.6,-0.1) rectangle (-0.8,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (-0.1,-0.1) rectangle (0.1,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (1.1,-0.1) rectangle (1.3,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2,-0.1) rectangle (2.2,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (2.7,-0.1) rectangle (2.9,0.1);
  \draw <1-> [yellow!50!black, fill=yellow] (4,-0.1) rectangle (4.2,0.1);
  % nodes
  \node <1-> at (-1,-0.3) (1a) {};
  \node <1-> at (3.1,0.3) (1b) {};
  \node <1-> at (0.9,-0.3) (2a) {};
  \node <1-> at (1.8,-0.3) (3a) {};
  \node <1-> at (2.45,-0.7) (f1) {};
  \node <1-> at (2.45,0.7) (f2) {};
  % algorithm
  \draw <2-> [gray, fill=gray, opacity=0.5] (1a) rectangle (1b);
  \draw <3-> [gray, fill=gray, opacity=0.6] (2a) rectangle (1b);
  \draw <4-> [cyclamen, thick] (3a) rectangle (1b);
  \draw <5-> [ultra thick] (f1) -- (f2);
\end{tikzpicture}


## Sample FWHM
$$
  \text{FWHM} = x_+ - x_- \with L(x_{\pm}) = \frac{L_{\text{max}}}{2}
$$

\setbeamercovered{transparent}
. . .

**Kernel Density Estimation**

:::: {.columns}
::: {.column width=50% .c}
- Empirical PDF construction:

$$
  f_\varepsilon(x) = \frac{1}{N\varepsilon} \sum_{i = 1}^N
  G \left( \frac{x-x_i}{\varepsilon} \right)
$$

- The parameter $\varepsilon$ controls the
  sharpness of the empirical PDF
:::

::: {.column width=50%}
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}
  % placeholder
  \draw [transparent] (-2.7,-0.2) rectangle (3,3.3);
  % bandwidth 1
  \node <4,5> [left] at (2.9,3) {$\varepsilon = 1$};
  % points
  \draw <3-> [yellow!50!black, fill=yellow] (-1.2,-0.2) rectangle (-1,0);
  \draw <3-> [yellow!50!black, fill=yellow] (-0.1,-0.2) rectangle (0.1,0);
  \draw <3-> [yellow!50!black, fill=yellow] (0.7,-0.2) rectangle (0.9,0);
  \draw <3-> [yellow!50!black, fill=yellow] (1.3,-0.2) rectangle (1.5,0);
  % lines 1
  \draw <4,5> [cyclamen, dashed] (-1.1,0.1) -- (-1.1,1);
  \draw <4,5> [cyclamen, dashed] (0,0.1) -- (0,1);
  \draw <4,5> [cyclamen, dashed] (1.4,0.1) -- (1.4,1);
  \draw <4,5> [cyclamen, dashed] (0.8,0.1) -- (0.8,1);
  % Gaussians 1
  \draw <4,5> [domain=-2.6:0.4, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)) + 0.1});
  \draw <4,5> [domain=-1.5:1.5, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-\x*\x) + 0.1});
  \draw <4,5> [domain=-0.7:2.3, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
  \draw <4,5> [domain=-0.1:2.9, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 1.4)*(\x - 1.4)) + 0.1});
  % sum 1
  \draw <5> [domain=-2.6:2.9, smooth, variable=\x, yellow, very thick]
    plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)) +
                 exp(-\x*\x) +
                 exp(-(\x - 1.4)*(\x - 1.4)) +
                 exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
  % bandwidth 2
  \node <6> [left] at (2.9,3) {$\varepsilon = 0.5$};
  % lines 2
  \draw <6> [cyclamen, dashed] (-1.1,0.1) -- (-1.1,2);
  \draw <6> [cyclamen, dashed] (0,0.1) -- (0,2);
  \draw <6> [cyclamen, dashed] (1.4,0.1) -- (1.4,2);
  \draw <6> [cyclamen, dashed] (0.8,0.1) -- (0.8,2);
  % Gaussians 2
  \draw <6> [domain=-2.6:0.4, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)/0.25)/0.5 + 0.1});
  \draw <6> [domain=-1.5:1.5, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-\x*\x/0.25)/0.5 + 0.1});
  \draw <6> [domain=-0.7:2.3, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 0.8)*(\x - 0.8)/0.25)/0.5 + 0.1});
  \draw <6> [domain=-0.1:2.9, smooth, variable=\x, cyclamen, very thick]
    plot ({\x}, {exp(-(\x - 1.4)*(\x - 1.4)/0.25)/0.5 + 0.1});
  % sum
  \draw <6> [domain=-2.6:2.9, smooth, variable=\x, yellow, very thick]
    plot ({\x}, {exp(-(\x + 1.1)*(\x + 1.1)/0.25)/0.5 +
                 exp(-\x*\x/0.25)/0.5 +
                 exp(-(\x - 1.4)*(\x - 1.4)/0.25)/0.5 +
                 exp(-(\x - 0.8)*(\x - 0.8)/0.25)/0.5 + 0.1});
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
:::
::::
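
The empirical PDF $f_\varepsilon$ can be evaluated directly from its definition. The sketch below assumes a standard normal kernel for $G$, since the slide does not fix its normalisation; the names `gaussian_kernel` and `kde` are hypothetical, and the four sample values are the approximate centres of the points in the figure.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel G (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, eps):
    """f_eps(x) = 1 / (N * eps) * sum_i G((x - x_i) / eps)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / eps) for xi in sample) / (n * eps)


# Approximate centres of the four points in the figure (assumed values).
points = [-1.1, 0.0, 0.8, 1.4]

# A smaller bandwidth gives a sharper, more peaked estimate.
for eps in (1.0, 0.5):
    print(f"eps = {eps}: f_eps(0.8) = {kde(0.8, points, eps):.4f}")
```

Because each kernel integrates to one and the sum is divided by $N\varepsilon$, the estimate integrates to one for any $\varepsilon > 0$.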


## Sample FWHM

**Silverman's rule of thumb** [@silver86]:

$$
  \varepsilon = 0.88 \, S_N
  \left( \frac{d + 2}{4}N \right)^{-1/(d + 4)}
$$
where:

- $S_N$ is the sample standard deviation
- $d$ is the number of dimensions ($d = 1$)

. . .

Minimization (Brent) for $\quad f_{\varepsilon_{\text{max}}}$
Root finding (Brent-Dekker) for $\quad f_{\varepsilon}(x_{\pm}) =
\frac{f_{\varepsilon_{\text{max}}}}{2}$
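
The whole chain, from bandwidth to FWHM, can be sketched with the Python standard library only. This is not the procedure named above: a grid search stands in for Brent minimization and plain bisection stands in for Brent-Dekker root finding, purely to keep the example free of dependencies. It also assumes a standard normal kernel and a roughly unimodal sample, so that the half-maximum level is crossed once on each side of the peak. All function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(sample, d=1):
    """eps = 0.88 * S_N * ((d + 2) / 4 * N) ** (-1 / (d + 4))."""
    n = len(sample)
    s_n = statistics.stdev(sample)           # sample standard deviation
    return 0.88 * s_n * ((d + 2) / 4 * n) ** (-1 / (d + 4))


def kde(x, sample, eps):
    """Empirical PDF with a standard normal kernel (an assumption)."""
    norm = len(sample) * eps * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / eps) ** 2) for xi in sample) / norm


def bisect(g, a, b, tol=1e-10):
    """Root of g in [a, b]; g(a) and g(b) must have opposite signs."""
    ga = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if (ga < 0) == (gm < 0):             # root lies in [m, b]
            a, ga = m, gm
        else:                                # root lies in [a, m]
            b = m
    return 0.5 * (a + b)


def sample_fwhm(sample, grid=2000):
    """FWHM of the KDE: x_plus - x_minus with f(x_pm) = f_max / 2."""
    eps = silverman_bandwidth(sample)

    def f(x):
        return kde(x, sample, eps)

    # Far enough outside the sample that the density is negligible.
    lo, hi = min(sample) - 10 * eps, max(sample) + 10 * eps
    xs = [lo + (hi - lo) * i / grid for i in range(grid + 1)]
    x_max = max(xs, key=f)                   # grid search for the maximum
    half = f(x_max) / 2

    def g(x):
        return f(x) - half

    x_minus = bisect(g, lo, x_max)           # crossing left of the peak
    x_plus = bisect(g, x_max, hi)            # crossing right of the peak
    return x_plus - x_minus
```

For a Gaussian-like sample the result should come out somewhat above $2\sqrt{2\ln 2}\,S_N \approx 2.355\,S_N$, because the kernel smoothing widens the estimated density.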


## Sample FWHM

![](images/kde.pdf)


## Bootstrap

Estimating a confidence interval:

. . .

\Begin{block}{Algorithm}
::: incremental
1. Sample $N$ points from the PDF
2. Resample with replacement $M$ times
3. Compute the estimator for each new sample
4. Compute the mean and standard deviation of the $M$ estimates
:::
\End{block}
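
A short Python sketch of the resampling loop, assuming each resample has the same size $N$ as the original sample. The function name `bootstrap`, the Gaussian test data and the seeds are illustrative choices, not taken from the slides.

```python
import random
import statistics


def bootstrap(sample, estimator, m=1000, seed=None):
    """Mean and standard deviation of an estimator over m resamples."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(m):
        # Step 2: draw n points from the sample, with replacement.
        resample = rng.choices(sample, k=n)
        # Step 3: evaluate the estimator on the resample.
        estimates.append(estimator(resample))
    # Step 4: summarise the m estimates.
    return statistics.mean(estimates), statistics.stdev(estimates)


# Step 1: N points from a PDF (a standard normal, chosen for illustration).
data_rng = random.Random(0)
data = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

centre, spread = bootstrap(data, statistics.median, m=500, seed=1)
```

Any estimator that maps a list of numbers to a single value can be passed in place of `statistics.median`; the returned standard deviation is one common way to quote the uncertainty.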