# Sample statistics

## Sample statistics

How can we estimate the sample median, mode and FWHM?

. . .

- \only<3>\strike{Binning the data $\hence$ the result depends strongly on the bin width}

. . .

- Alternative solutions
    - Robust estimators
    - Kernel density estimation

## Sample median

:::: {.columns}
::: {.column width=50% .c}
$$
F(m) = \frac{1}{2}
$$

\vspace{20pt}

. . .

- Sort the $N$ points in ascending order

. . .

- Take the middle element if $N$ is odd, or the average of the two central elements if $N$ is even

:::
::: {.column width=50%}
![](images/median.pdf)
:::
::::

## Sample mode

Most probable value

. . .

Half Sample Mode

- Iteratively identify the smallest interval containing half of the points (rounded up)
- Once the sample is reduced to fewer than three points, take their average

. . .

\setbeamercovered{}

\begin{center}
\begin{tikzpicture}[remember picture]
% line
\draw [line width=3, ->, cyclamen] (-5,0) -- (5,0);
\node [right] at (5,0) {$x$};
% points
\draw [blue, fill=blue] (-4.6,-0.1) rectangle (-4.8,0.1);
\draw [blue, fill=blue] (-4,-0.1) rectangle (-4.2,0.1);
\draw [blue, fill=blue] (-3.3,-0.1) rectangle (-3.5,0.1);
\draw [blue, fill=blue] (-2.3,-0.1) rectangle (-2.5,0.1);
\draw [blue, fill=blue] (-0.6,-0.1) rectangle (-0.8,0.1);
\draw [blue, fill=blue] (-0.1,-0.1) rectangle (0.1,0.1);
\draw [blue, fill=blue] (1.1,-0.1) rectangle (1.3,0.1);
\draw [blue, fill=blue] (2,-0.1) rectangle (2.2,0.1);
\draw [blue, fill=blue] (2.7,-0.1) rectangle (2.9,0.1);
\draw [blue, fill=blue] (4,-0.1) rectangle (4.2,0.1);
% future nodes
\node at (-1,-0.3) (1a) {};
\node at (3.1,0.3) (1b) {};
\node at (0.9,-0.3) (2a) {};
\node at (1.8,-0.3) (3a) {};
% result nodes
\node at (2.45,-0.7) (f1) {};
\node at (2.45,0.7) (f2) {};
\end{tikzpicture}
\end{center}

. . .

\begin{center}
\begin{tikzpicture}[remember picture, overlay]
% region
\draw [orange, fill=orange, opacity=0.5] (1a) rectangle (1b);
\end{tikzpicture}
\end{center}

. . .


\begin{center}
\begin{tikzpicture}[remember picture, overlay]
% region
\draw [orange, fill=orange, opacity=0.5] (2a) rectangle (1b);
\end{tikzpicture}
\end{center}

. . .

\begin{center}
\begin{tikzpicture}[remember picture, overlay]
% region
\draw [orange, fill=orange, opacity=0.5] (3a) rectangle (1b);
\end{tikzpicture}
\end{center}

. . .

\begin{center}
\begin{tikzpicture}[remember picture, overlay]
% result
\draw [cyclamen, ultra thick] (f1) -- (f2);
\end{tikzpicture}
\end{center}

## Sample FWHM

$$
\text{FWHM} = x_+ - x_- \with L(x_{\pm}) = \frac{L_{\text{max}}}{2}
$$

\setbeamercovered{transparent}

. . .

Kernel density estimation: build an empirical PDF

$$
f_\varepsilon(x) = \frac{1}{N\varepsilon} \sum_{i = 1}^N G \left( \frac{x-x_i}{\varepsilon} \right)
$$

where $G$ is a normalized kernel (e.g. the standard Gaussian). The bandwidth $\varepsilon$ controls the amount of smoothing.

## Sample FWHM

Silverman's rule of thumb:

$$
f_\varepsilon(x) = \frac{1}{N\varepsilon} \sum_{i = 1}^N G \left( \frac{x-x_i}{\varepsilon} \right) \with \varepsilon = 0.88 \, S_N \left( \frac{d + 2}{4}N \right)^{-1/(d + 4)}
$$

where:

- $S_N$ is the sample standard deviation
- $d$ is the number of dimensions (here $d = 1$)

. . .

Numerical maximization (Brent, minimizing $-f_\varepsilon$) for $\quad f_{\varepsilon,\text{max}} = \max_x f_\varepsilon(x)$

Numerical root finding (Brent) for $\quad f_{\varepsilon}(x_{\pm}) = \frac{f_{\varepsilon,\text{max}}}{2}$

## Sample FWHM

![](images/kde.pdf)
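
## Sample median: code sketch

A minimal Python sketch of the sort-based recipe from the *Sample median* slide. The function name `sample_median` is our own. Python's standard `statistics.median` applies the same odd/even rule.

```python
def sample_median(points):
    """Median of a non-empty sample, following the sort-based recipe."""
    xs = sorted(points)                   # sort in ascending order
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    mid = n // 2
    if n % 2 == 1:                        # odd N: the middle element
        return xs[mid]
    return 0.5 * (xs[mid - 1] + xs[mid])  # even N: mean of the two central elements


print(sample_median([3.0, 1.0, 2.0]))       # 2.0
print(sample_median([4.0, 1.0, 3.0, 2.0]))  # 2.5
```

Sorting makes this $O(N \log N)$. A selection algorithm could find the central element(s) in linear time, at the cost of more code.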
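
## Sample mode: code sketch

A Python sketch of the Half Sample Mode as described on the *Sample mode* slide. Two details are our assumptions and are not stated on the slide. First, "half" is rounded up to $\lceil n/2 \rceil$, which reproduces the figure's steps of 10, 5, 3 and 2 points. Second, ties are broken by taking the leftmost window. The function name `half_sample_mode` is hypothetical. The input is the centres of the ten squares in the figure.

```python
import math


def half_sample_mode(points):
    """Half Sample Mode: repeatedly keep the narrowest window holding half of
    the points (rounded up, an assumption) until fewer than three remain."""
    xs = sorted(points)
    if not xs:
        raise ValueError("empty sample")
    while len(xs) >= 3:
        h = math.ceil(len(xs) / 2)  # number of points kept at this step
        # width of every window of h consecutive sorted points
        widths = [xs[i + h - 1] - xs[i] for i in range(len(xs) - h + 1)]
        start = widths.index(min(widths))  # leftmost narrowest window
        xs = xs[start:start + h]
    return sum(xs) / len(xs)  # average of the one or two remaining points


# Centres of the ten squares drawn in the figure
points = [-4.7, -4.1, -3.4, -2.4, -0.7, 0.0, 1.2, 2.1, 2.8, 4.1]
print(half_sample_mode(points))  # approx. 2.45, where the cyclamen line sits
```

Each step only compares widths of consecutive sorted points, so the estimate ignores the far tails. This makes it robust to outliers.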
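
## Sample FWHM: code sketch {.allowframebreaks}

A self-contained Python sketch of the KDE-based FWHM from the *Sample FWHM* slides, under stated assumptions. $G$ is taken to be the standard Gaussian, and the bandwidth uses the slide's formula with $d = 1$. In place of Brent's method, the sketch uses golden-section search for the maximum and bisection for the two roots. These are simpler but slower. SciPy's `scipy.optimize.minimize_scalar(..., method="brent")` (applied to $-f_\varepsilon$) and `scipy.optimize.brentq` provide Brent's method. All function names are hypothetical, and the search assumes a single dominant peak.

```python
import math
import statistics


def silverman_bandwidth(points, d=1):
    # Bandwidth as written on the slide (d = 1 for a 1D sample)
    s_n = statistics.stdev(points)  # sample standard deviation S_N
    return 0.88 * s_n * ((d + 2) / 4 * len(points)) ** (-1.0 / (d + 4))


def make_kde(points, eps):
    # f_eps(x) = 1/(N eps) * sum_i G((x - x_i)/eps), G = standard Gaussian (assumed)
    norm = 1.0 / (len(points) * eps * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - xi) / eps) ** 2) for xi in points)


def golden_max(f, a, b, iters=100):
    # Golden-section search for the maximum of f on [a, b]; assumes a single peak
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        x1, x2 = b - g * (b - a), a + g * (b - a)
        if f(x1) > f(x2):
            b = x2  # maximum lies in [a, x2]
        else:
            a = x1  # maximum lies in [x1, b]
    return 0.5 * (a + b)


def bisect_root(f, a, b, iters=100):
    # Bisection for f(x) = 0; requires f(a) and f(b) to have opposite signs
    fa = f(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        fm = f(m)
        if (fm > 0.0) == (fa > 0.0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)


def sample_fwhm(points):
    eps = silverman_bandwidth(points)
    f = make_kde(points, eps)
    lo, hi = min(points), max(points)
    x_max = golden_max(f, lo, hi)      # the peak of f_eps lies inside [lo, hi]
    half = 0.5 * f(x_max)
    excess = lambda x: f(x) - half     # zero at x_- and x_+
    # 10 bandwidths beyond the data f_eps is negligible, so both roots are bracketed
    x_minus = bisect_root(excess, lo - 10.0 * eps, x_max)
    x_plus = bisect_root(excess, x_max, hi + 10.0 * eps)
    return x_plus - x_minus


# Deterministic, nearly Gaussian sample (sigma = 1): equally spaced normal quantiles
sample = [statistics.NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]
print(sample_fwhm(sample))  # roughly 2.4
```

Left of the smallest point every Gaussian term is increasing, and right of the largest point every term is decreasing. The global maximum of $f_\varepsilon$ therefore lies in $[\min x_i, \max x_i]$, which is why that interval is the search range. The result is slightly above the exact Gaussian value $2\sqrt{2\ln 2}\,\sigma \simeq 2.355\,\sigma$, because the kernel smoothing broadens the peak.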