# Sample statistics
## Sample statistics
How to estimate sample median, mode and FWHM?
. . .
- \only<3>\strike{Binning data $\hence$ depends wildly on bin-width}
. . .
- Alternative solutions
  - Robust estimators
  - Kernel density estimation
## Sample median
:::: {.columns}
::: {.column width=50% .c}
$$
F(m) = \frac{1}{2}
$$
\vspace{20pt}
. . .
- Sort points in ascending order
. . .
- Middle element if odd
- Average of the two central elements if even
:::
::: {.column width=50%}
![](images/median.pdf)
:::
::::
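
The sorting recipe above can be sketched in a few lines of Python. This is only an illustration; the function name `sample_median` is an assumption and does not come from the slides.

```python
def sample_median(points):
    """Median of a non-empty sequence of numbers (illustrative sketch)."""
    ordered = sorted(points)  # sort points in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample is empty")
    mid = n // 2
    if n % 2 == 1:
        # odd number of points: take the middle element
        return ordered[mid]
    # even number of points: average the two central elements
    return (ordered[mid - 1] + ordered[mid]) / 2
```
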
## Sample mode
Most probable value
. . .
Half Sample Mode
- Iteratively identify the smallest interval containing half points
- Once the sample is reduced to fewer than three points, take their average
. . .
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}[remember picture]
% line
\draw [line width=3, ->, cyclamen] (-5,0) -- (5,0);
\node [right] at (5,0) {$x$};
% points
\draw [blue!50!black, fill=blue] (-4.6,-0.1) rectangle (-4.8,0.1);
\draw [blue!50!black, fill=blue] (-4,-0.1) rectangle (-4.2,0.1);
\draw [blue!50!black, fill=blue] (-3.3,-0.1) rectangle (-3.5,0.1);
\draw [blue!50!black, fill=blue] (-2.3,-0.1) rectangle (-2.5,0.1);
\draw [blue!50!black, fill=blue] (-0.6,-0.1) rectangle (-0.8,0.1);
\draw [blue!50!black, fill=blue] (-0.1,-0.1) rectangle (0.1,0.1);
\draw [blue!50!black, fill=blue] (1.1,-0.1) rectangle (1.3,0.1);
\draw [blue!50!black, fill=blue] (2 ,-0.1) rectangle (2.2,0.1);
\draw [blue!50!black, fill=blue] (2.7,-0.1) rectangle (2.9,0.1);
\draw [blue!50!black, fill=blue] (4,-0.1) rectangle (4.2,0.1);
% future nodes
\node at (-1,-0.3) (1a) {};
\node at (3.1,0.3) (1b) {};
\node at (0.9,-0.3) (2a) {};
\node at (1.8,-0.3) (3a) {};
% result nodes
\node at (2.45,-0.7) (f1) {};
\node at (2.45,0.7) (f2) {};
\end{tikzpicture}
\end{center}
. . .
\begin{center}
\begin{tikzpicture}[remember picture, overlay]
% region
\draw [orange, fill=orange, opacity=0.5] (1a) rectangle (1b);
\end{tikzpicture}
\end{center}
. . .
\begin{center}
\begin{tikzpicture}[remember picture, overlay]
% region
\draw [orange, fill=orange, opacity=0.5] (2a) rectangle (1b);
\end{tikzpicture}
\end{center}
. . .
\begin{center}
\begin{tikzpicture}[remember picture, overlay]
% region
\draw [orange, fill=orange, opacity=0.5] (3a) rectangle (1b);
\end{tikzpicture}
\end{center}
. . .
\begin{center}
\begin{tikzpicture}[remember picture, overlay]
% region
\draw [cyclamen, ultra thick] (f1) -- (f2);
\end{tikzpicture}
\end{center}
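
A minimal Python sketch of the Half Sample Mode iteration follows. It assumes that "half" means $\lceil n/2 \rceil$ points; the slides do not fix the rounding, and the function name is hypothetical.

```python
import math


def half_sample_mode(points):
    """Half Sample Mode of a non-empty sample (illustrative sketch)."""
    x = sorted(points)
    if not x:
        raise ValueError("the sample is empty")
    while len(x) >= 3:
        # assumption: "half" of the points means ceil(n / 2)
        h = math.ceil(len(x) / 2)
        # start of the narrowest window holding h consecutive points
        start = min(range(len(x) - h + 1),
                    key=lambda j: x[j + h - 1] - x[j])
        x = x[start:start + h]
    # fewer than three points left: take their average
    return sum(x) / len(x)


# points read off the centres of the squares in the figure above
figure_points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
mode = half_sample_mode(figure_points)
```

With these ten points the windows shrink to five, then three, then two points (2.1 and 2.8), so `mode` comes out at about 2.45, which is where the vertical line is drawn in the figure.
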
## Sample FWHM
$$
  \text{FWHM} = x_+ - x_- \with L(x_{\pm}) = \frac{L_{\text{max}}}{2}
$$
\setbeamercovered{transparent}
. . .
**Kernel Density Estimation**
:::: {.columns}
::: {.column width=50% .c}
- empirical PDF construction:
$$
f_\varepsilon(x) = \frac{1}{N\varepsilon} \sum_{i = 1}^N
G \left( \frac{x-x_i}{\varepsilon} \right)
$$
The parameter $\varepsilon$ controls the strength of the smoothing
:::
::: {.column width=50%}
\setbeamercovered{}
\begin{center}
\begin{tikzpicture}
% points
\draw [blue!50!black, fill=blue] (-2,-0.1) rectangle (-1.8,0.1);
\draw [blue!50!black, fill=blue] (-0.1,-0.1) rectangle (0.1,0.1);
\draw [blue!50!black, fill=blue] (1.3,-0.1) rectangle (1.5,0.1);
\draw [blue!50!black, fill=blue] (0.7,-0.1) rectangle (0.9,0.1);
\pause
% lines
\draw [cyclamen, dashed] (-1.9,0.1) -- (-1.9,1);
\draw [cyclamen, dashed] (0,0.1) -- (0,1);
\draw [cyclamen, dashed] (1.4,0.1) -- (1.4,1);
\draw [cyclamen, dashed] (0.8,0.1) -- (0.8,1);
% Gaussians
\draw[domain=-3.4:-0.4, smooth, variable=\x, cyclamen, very thick]
plot ({\x}, {exp(-(\x + 1.9)*(\x + 1.9)) + 0.1});
\draw[domain=-1.5:1.5, smooth, variable=\x, cyclamen, very thick]
plot ({\x}, {exp(-\x*\x) + 0.1});
\draw[domain=-0.1:2.9, smooth, variable=\x, cyclamen, very thick]
plot ({\x}, {exp(-(\x - 1.4)*(\x - 1.4)) + 0.1});
\draw[domain=-0.7:2.3, smooth, variable=\x, cyclamen, very thick]
plot ({\x}, {exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
\pause
% sum
\draw [fill=white, white, opacity=0.5] (-3.5,0.1) rectangle (3,1.3);
\draw[domain=-3.4:3.4, smooth, variable=\x, blue, very thick]
plot ({\x}, {exp(-(\x + 1.9)*(\x + 1.9)) +
exp(-\x*\x) +
exp(-(\x - 1.4)*(\x - 1.4)) +
exp(-(\x - 0.8)*(\x - 0.8)) + 0.1});
\end{tikzpicture}
\end{center}
\setbeamercovered{transparent}
:::
::::
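
The empirical PDF $f_\varepsilon$ can be sketched in Python as follows. The slides do not define $G$ explicitly; the sketch assumes it is the standard normal density, as the Gaussians in the figure suggest. The function name `gaussian_kde` is hypothetical.

```python
import math


def gaussian_kde(samples, eps):
    """Return f_eps(x) = 1/(N eps) * sum_i G((x - x_i) / eps).

    Assumption: G is the standard normal density.
    """
    if not samples or eps <= 0:
        raise ValueError("need a non-empty sample and eps > 0")
    norm = 1.0 / (len(samples) * eps * math.sqrt(2.0 * math.pi))

    def f(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / eps) ** 2)
                          for xi in samples)

    return f


# the four points of the figure, with an arbitrary smoothing eps = 0.5
f = gaussian_kde([-1.9, 0.0, 0.8, 1.4], 0.5)
```

A larger `eps` spreads each Gaussian over a wider range and gives a smoother estimate; a smaller one follows the individual points more closely.
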
## Sample FWHM
Silverman's rule of thumb [@silver86]:
$$
  \varepsilon = 0.88 \, S_N
  \left( \frac{d + 2}{4}N \right)^{-1/(d + 4)}
$$
where:
- $S_N$ is the sample standard deviation
- $d$ is the number of dimensions ($d = 1$)
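
As a quick illustration, the rule can be evaluated directly in Python. The sketch assumes that $S_N$ is the standard deviation with the $N - 1$ denominator, which is what `statistics.stdev` computes; the slides do not say which convention is used. The function name is hypothetical.

```python
import statistics


def silverman_bandwidth(samples, d=1):
    """Bandwidth eps from the rule of thumb quoted above (sketch)."""
    n = len(samples)
    # assumption: S_N uses the N - 1 denominator
    s_n = statistics.stdev(samples)
    return 0.88 * s_n * ((d + 2) / 4 * n) ** (-1 / (d + 4))


eps = silverman_bandwidth([1.0, 2.0, 3.0, 4.0, 5.0])
```
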
. . .
Numerical minimization (Brent) of $-f_\varepsilon$ to find $\quad f_{\varepsilon_{\text{max}}}$

Numerical root finding (Brent) for $\quad f_{\varepsilon}(x_{\pm}) =
\frac{f_{\varepsilon_{\text{max}}}}{2}$
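
The two numerical steps can be sketched with the Python standard library alone. To keep the example self-contained it uses golden-section search and bisection as simpler stand-ins for Brent's minimization and root-finding methods; with SciPy, `scipy.optimize.minimize_scalar` and `scipy.optimize.brentq` could take their place. The sketch assumes a Gaussian kernel and a single-peaked $f_\varepsilon$ on the search interval. All function names are hypothetical.

```python
import math


def kde(samples, eps):
    """Gaussian-kernel estimate f_eps (assumed kernel)."""
    norm = 1.0 / (len(samples) * eps * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(
        math.exp(-0.5 * ((x - s) / eps) ** 2) for s in samples)


def argmax_golden(f, a, b, tol=1e-9):
    """Golden-section search for the maximum of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


def root_bisect(g, a, b, tol=1e-9):
    """Bisection for a root of g on [a, b]; g must change sign."""
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("g does not change sign on [a, b]")
    while b - a > tol:
        m = (a + b) / 2.0
        gm = g(m)
        if ga * gm <= 0:
            b = m
        else:
            a, ga = m, gm
    return (a + b) / 2.0


def sample_fwhm(samples, eps):
    f = kde(samples, eps)
    # search interval padded by 10 eps on each side (arbitrary choice)
    lo, hi = min(samples) - 10 * eps, max(samples) + 10 * eps
    x_max = argmax_golden(f, lo, hi)
    half = f(x_max) / 2.0
    x_minus = root_bisect(lambda x: f(x) - half, lo, x_max)
    x_plus = root_bisect(lambda x: f(x) - half, x_max, hi)
    return x_plus - x_minus
```

For a single point and `eps = 1` the estimate is a standard Gaussian, so the result should agree with the known value $2\sqrt{2 \ln 2}$.
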
## Sample FWHM
![](images/kde.pdf)