# Exercise 3 {#sec:3}

## Meson decay events generation

A number of $N = 50000$ points on the unit sphere, each representing a
particle detection event, is to be generated according to the angular
probability distribution function $F$:
\begin{align*}
F (\theta, \phi) = &\frac{3}{4 \pi} \Bigg[
\frac{1}{2} (1 - \alpha) + \frac{1}{2} (3\alpha - 1) \cos^2(\theta) \\
&- \beta \sin^2(\theta) \cos(2 \phi)
- \gamma \sqrt{2} \sin(2 \theta) \cos(\phi) \Bigg]
\end{align*}

where $\theta$ and $\phi$ are, respectively, the polar and azimuthal angles, and
$$
\alpha_0 = 0.65 \et \beta_0 = 0.06 \et \gamma_0 = -0.18
$$

To generate the points a *hit-or-miss* method was employed:

1. generate a point $(\theta , \phi)$ uniformly on the unit sphere,
2. generate a third value $Y$ uniformly in $[0, 1]$,
3. if @eq:requirement is satisfied, save the point,
4. repeat from 1.

$$
Y < F(\theta, \phi)
$$ {#eq:requirement}

To increase the efficiency of the procedure, the $Y$ values were actually
generated in $[0, 0.2]$, since $\max F = 0.179$ for the given parameters:
had the numbers been generated in the range $[0, 1]$, all the points with
$Y > 0.179$ would have been rejected with certainty.

While $\phi$ can simply be generated uniformly between 0 and $2 \pi$, for
$\theta$ one has to be more careful: by generating evenly distributed numbers
between 0 and $\pi$, the points turn out to cluster at the poles. Such a point
$(\theta,\phi)$ is *not* uniform on the sphere but on a cylindrical
projection of it (the difference can be appreciated in @fig:compare).

The proper distribution of $\theta$ is obtained by applying a transformation
to a variable $x$ uniformly distributed in $[0, 1)$,
easily generated by the GSL function `gsl_rng_uniform()`.

<div id="fig:compare">
![Uniformly distributed points with $\theta$ evenly distributed between
0 and $\pi$.](images/3-histo-i-u.pdf){width=50%}
![Points uniformly distributed on a spherical
surface.](images/3-histo-p-u.pdf){width=50%}

![Sample generated according to $F$ with $\theta$ evenly distributed between
0 and $\pi$.](images/3-histo-i-F.pdf){width=50%}
![Sample generated according to $F$ with $\theta$ properly
distributed.](images/3-histo-p-F.pdf){width=50%}

Examples of samples. On the left, points with $\theta$ evenly distributed
between 0 and $\pi$; on the right, points with $\theta$ properly distributed.
</div>

The transformation can be found by imposing the angular PDF to be a constant:
\begin{align*}
\frac{d^2 P}{d \Omega^2} = \text{const} = \frac{1}{4 \pi}
\hspace{20pt} &\Longrightarrow \hspace{20pt}
d^2P = \frac{1}{4 \pi} \, d \Omega^2 = \frac{1}{4 \pi} \,
d \phi \sin(\theta) d \theta
\\
&\Longrightarrow \hspace{20pt}
\frac{dP}{d \theta} = \int_0^{2 \pi} d \phi \, \frac{1}{4 \pi} \sin(\theta)=
\frac{1}{4 \pi} \sin(\theta) 2 \pi = \frac{1}{2} \sin(\theta)
\end{align*}
\begin{align*}
\theta = \theta (x)
\hspace{20pt} &\Longrightarrow \hspace{20pt}
\frac{dP}{d \theta} = \frac{dP}{dx} \cdot \left| \frac{dx}{d \theta} \right|
= \left. \frac{dP}{dx} \middle/ \, \left| \frac{d \theta}{dx} \right| \right.
\\
&\Longrightarrow \hspace{20pt}
\frac{1}{2} \sin(\theta) = \left. 1 \middle/ \, \left|
\frac{d \theta}{dx} \right| \right.
\end{align*}

If $\theta$ is chosen to grow together with $x$, then the absolute value can be
omitted:
\begin{align*}
\frac{d \theta}{dx} = \frac{2}{\sin(\theta)}
\hspace{20pt} &\Longrightarrow \hspace{20pt}
d \theta \sin(\theta) = 2 dx
\\
&\Longrightarrow \hspace{20pt}
- \cos (\theta') |_{0}^{\theta} = 2[x(\theta) - x(0)] = 2(x - 0) = 2x
\\
&\Longrightarrow \hspace{20pt}
- \cos(\theta) + 1 = 2x
\\
&\Longrightarrow \hspace{20pt}
\theta = \text{acos} (1 -2x)
\end{align*}

Since 1 is not in $[0 , 1)$, $\theta = \pi$ can't be generated, but as this is
a single point, the effect of the omission is negligible.

\vspace{30pt}


## Parameter estimation

The sample is now used to estimate the parameters $\alpha$, $\beta$ and
$\gamma$ of the angular distribution $F$. The correct set will be referred to
as {$\alpha_0$, $\beta_0$, $\gamma_0$}.

### Maximum Likelihood method {#sec:MLM}

The Maximum Likelihood method (MLM) is based on the observation that the best
estimate {$\bar{\alpha}$, $\bar{\beta}$, $\bar{\gamma}$} of the parameters
{$\alpha$, $\beta$, $\gamma$} should maximise the Likelihood function $L$
defined in @eq:Like0, where the index $i$ runs over the sample:
$$
L = \prod_i F(\theta_i, \phi_i) = \prod_i F_i
$$ {#eq:Like0}

The reason is that the points were generated according to the probability
distribution function $F$ with parameters $\alpha_0$, $\beta_0$ and $\gamma_0$,
thus the probability $P$ of sampling this particular sample is, by definition,
the product of their probabilities $F_i$, and it must be maximum for
{$\alpha$, $\beta$, $\gamma$} = {$\alpha_0$, $\beta_0$, $\gamma_0$}. Thus, by
viewing $F_i$ as a function of the parameters and maximising $P = L$ with
respect to $\alpha$, $\beta$ and $\gamma$, a good estimate should be found.
Instead of actually maximising $L$, the function $-\ln(L)$
was minimised, as minimisation methods are usually described in the literature
and the logarithm simplifies the math:
$$
L = \prod_i F_i \thus - \ln(L) = - \sum_{i} \ln(F_i)
$$ {#eq:Like}

The problem of multidimensional minimisation was tackled using the GSL library
`gsl_multimin.h`, which implements a variety of methods, including the
Fletcher-Reeves conjugate gradient algorithm used in the solution, which
requires the implementation of $-\ln(L)$ and its gradient.

To minimise a function $f$, given an initial guess point $x_0$, the gradient
$\nabla f$ is used to find the initial direction $v_0$ (the steepest descent),
along which a line minimisation is performed: the minimum $x_1$ occurs where
the directional derivative along $v_0$ is zero,
$$
\frac{\partial f}{\partial v_0} = \nabla f \cdot v_0 = 0
$$
or, equivalently, where the gradient is orthogonal to $v_0$.
After this first step, the following procedure is iterated:

1. find the steepest descent $v_n$ at $x_n$,
2. compute the *conjugate* direction: $w_n = v_n + \beta_n w_{n-1}$,
3. perform a line minimisation along $w_n$ and update the minimum $x_{n+1}$.

Different formulas for $\beta_n$ have been developed, all equivalent for a
quadratic function but not for nonlinear optimisation: in such instances, the
best formula is a matter of heuristics [@painless94]. In this case, $\beta_n$ is
given by the Fletcher-Reeves formula:
$$
\beta_n = \frac{\|\nabla f(x_n)\|^2}{\|\nabla f(x_{n-1})\|^2}
$$
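As a concrete illustration of the iteration above (a hand-rolled sketch, not the GSL implementation actually used), the following minimises a quadratic form $f(x) = \frac{1}{2} x^T A x - b^T x$, for which the line minimisation can be done exactly and the Fletcher-Reeves $\beta_n$ appears explicitly:

```c
#include <math.h>

#define N 2

/* Minimise f(x) = x^T A x / 2 - b^T x by Fletcher-Reeves conjugate gradient. */
void fr_minimise(double A[N][N], const double b[N], double x[N]) {
    double g[N], w[N], g2 = 0;
    /* initial gradient g = A x - b; first direction is the steepest descent */
    for (int i = 0; i < N; i++) {
        g[i] = -b[i];
        for (int j = 0; j < N; j++) g[i] += A[i][j] * x[j];
        w[i] = -g[i];
        g2 += g[i] * g[i];
    }
    for (int k = 0; k < 100 && g2 > 1e-16; k++) {
        /* exact line minimisation along w: step = g.g / (w.Aw) */
        double Aw[N], wAw = 0;
        for (int i = 0; i < N; i++) {
            Aw[i] = 0;
            for (int j = 0; j < N; j++) Aw[i] += A[i][j] * w[j];
            wAw += w[i] * Aw[i];
        }
        double step = g2 / wAw, g2_new = 0;
        for (int i = 0; i < N; i++) {
            x[i] += step * w[i];
            g[i] += step * Aw[i];          /* gradient at the new point */
            g2_new += g[i] * g[i];
        }
        double beta = g2_new / g2;          /* Fletcher-Reeves formula */
        for (int i = 0; i < N; i++) w[i] = -g[i] + beta * w[i];
        g2 = g2_new;
    }
}
```

For a quadratic in $N$ dimensions the conjugate directions make the method converge in at most $N$ iterations; for $-\ln(L)$, which is only approximately quadratic near the minimum, more iterations are needed.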

The accuracy of each line minimisation is controlled by the parameter `tol`,
meaning that a line minimisation along $w$ stops when:
$$
w \cdot \nabla f < \text{tol} \, \|w\| \, \|\nabla f\|
$$

The minimisation proved to be quite unstable, which forced the starting point
to be taken very close to the solution, namely:
$$
\alpha_{sp} = 0.79 \et
\beta_{sp} = 0.02 \et
\gamma_{sp} = -0.17
$$

The overall minimisation ends when the modulus of the gradient is smaller than
$10^{-4}$. The program took 25 of the above iterations to reach the result
shown in @eq:Like_res.

As regards the error estimation, the Cramér-Rao bound states that the covariance
matrix of parameters estimated by MLM is bounded below by the inverse of the
Hessian matrix of $-\ln(L)$ at the minimum. Thus, the Hessian matrix was
computed analytically and inverted by a Cholesky decomposition: every
positive definite symmetric matrix $H$ can be written as the
product of a lower triangular matrix $\Delta$ and its transpose $\Delta^T$:
$$
H = \Delta \cdot \Delta^T
$$

The Hessian matrix is indeed both symmetric and positive definite when
computed at a minimum. Once decomposed, the inverse is given by
$$
H^{-1} = (\Delta \cdot \Delta^T)^{-1} = (\Delta^{-1})^T \cdot \Delta^{-1}
$$

where the inverse of $\Delta$ is easily computed, since it is a triangular
matrix.

The GSL library provides the two functions `gsl_linalg_cholesky_decomp()` and
`gsl_linalg_cholesky_invert()` to decompose and invert a matrix in place.
The result is shown below:
$$
\bar{\alpha}_L = 0.6541 \et
\bar{\beta}_L = 0.06125 \et
\bar{\gamma}_L = -0.1763
$$ {#eq:Like_res}

$$
\text{cov}_L = \begin{pmatrix}
9.225 \cdot 10^{-6} & -1.785 \cdot 10^{-6} & 2.41 \cdot 10^{-7} \\
-1.785 \cdot 10^{-6} & 4.413 \cdot 10^{-6} & 1.58 \cdot 10^{-6} \\
2.41 \cdot 10^{-7} & 1.58 \cdot 10^{-6} & 2.602 \cdot 10^{-6}
\end{pmatrix}
$$ {#eq:Like_cov}

Hence:
\begin{align*}
\alpha_L &= 0.654 \pm 0.003 \\
\beta_L &= 0.0613 \pm 0.0021 \\
\gamma_L &= -0.1763 \pm 0.0016
\end{align*}

See @sec:res_comp for results compatibility.

### Least squares estimation

Another method that can be used to estimate {$\alpha$, $\beta$, $\gamma$} is
least squares (LSQ), a multidimensional minimisation specialised to the case
of a sum of squared functions, called residuals.
In general, the residuals can be anything, but they are usually interpreted as
the difference between an expected (fit) and an observed value. The least
squares estimate then corresponds to the expected values that best fit the
observations.

To apply the LSQ method, the data must be properly binned, meaning that each
bin should contain a significant number of events (say, greater than four or
five): in this case, 30 bins for $\theta$ and 60 for $\phi$ turned out
to be satisfactory. The expected value $E_{\theta, \phi}$ of each bin was given
by the product of $N$, $F$ computed at the geometric centre of the bin and the
solid angle enclosed by the bin itself, namely:
$$
E_{\theta, \phi} = N F(\theta, \phi) \, \Delta \theta \, \Delta \phi \,
\sin{\theta}
$$

Once the data are binned, the number of events in each bin plays the role of
the observed value. Then, the $\chi^2$, defined as follows, must be minimised
with respect to the three parameters to be estimated:
$$
\chi^2 = \sum_i f_i^2 = \|\vec f\|^2 \with f_i = \frac{O_i - E_i}{\sqrt{E_i}}
$$

where the sum runs over the bins, $O_i$ are the observed values and $E_i$ the
expected ones. Besides the more generic `gsl_multimin.h` library, GSL provides
a special one for least squares minimisation, called `gsl_multifit_nlinear.h`.
A trust region method was used to minimise the $\chi^2$.
In such methods, the $\chi^2$, its gradient and its Hessian matrix are computed
at a series of points in the parameter space until the gradient and the step
satisfy given convergence conditions, meaning that the minimum has been
approached to a precision set by those values.
If {$x_1 \dots x_p$} are the parameters to be estimated, $\chi^2$
is first computed at a point $\vec{x}_k$; then its value at a point
$\vec{x}_k + \vec{\delta}$ is modelled by a function $m_k (\vec{\delta})$,
the second-order Taylor expansion around the point $\vec{x}_k$, namely:
$$
\chi^2 (\vec{x}_k + \vec{\delta}) \sim m_k (\vec{\delta}) =
\chi^2 (\vec{x}_k) + \nabla_k^T \vec{\delta} + \frac{1}{2}
\vec{\delta}^T H_k \vec{\delta}
$$

where $\nabla_k$ and $H_k$ are the gradient and the Hessian matrix of $\chi^2$
at the point $\vec{x}_k$, and the superscript $T$ stands for the transpose. The
initial problem is thus reduced to the so-called trust-region subproblem (TRS):
the minimisation of $m_k(\vec{\delta})$ in a region where
$m_k(\vec{\delta})$ should be a good approximation of $\chi^2 (\vec{x}_k +
\vec{\delta})$, given by the condition:
$$
\|D_k \vec\delta\| < \Delta_k
$$

If $D_k = I$, the region is a hypersphere of radius $\Delta_k$ centred at
$\vec{x}_k$. In case one or more parameters have very different scales, the
region should be deformed according to a proper $D_k$.

Given an initial point $x_0$, the radius $\Delta_0$, the scale matrix $D_0$
and step $\vec\delta_0$, the LSQ algorithm consists of the following iteration:

1. construct the function $m_k(\vec\delta)$,
2. solve the TRS and find $x_k + \vec\delta_k$, which corresponds to the
   plausible minimum,
3. check whether the solution actually decreases $\chi^2$:
   1. if it does and the fit has converged (see below), stop;
   2. if it does but the fit has not converged, $m_k$ still approximates
      $\chi^2$ well in the trust region, which is therefore enlarged by
      increasing the radius, $\Delta_{k+1} = \alpha\Delta_k$ for a given
      $\alpha$, and the central point is shifted:
      $\vec x_{k+1} = \vec x_k + \vec \delta_k$;
   3. if it does not, $m_k$ does not approximate $\chi^2$ properly in the
      trust region, which is therefore shrunk by reducing the radius,
      $\Delta_{k+1} = \Delta_k/\beta$ for a given $\beta$;
4. repeat.

This method is advantageous compared to a general minimisation because
the TRS usually amounts to solving a linear system. There are many algorithms
to solve the problem; in this case, the Levenberg-Marquardt algorithm was used.
It is based on a theorem which proves the existence of a number $\mu_k$ such
that:
$$
\mu_k (\Delta_k - \|D_k \vec\delta_k\|) = 0 \et
(H_k + \mu_k D_k^TD_k) \vec\delta_k = -\nabla_k
$$

Using the approximation[^2] $H\approx J^TJ$, obtained by computing the Hessian
of the first-order Taylor expansion of $\chi^2$, $\vec\delta_k$ can
be found by solving, in the least squares sense, the system:
$$
\begin{cases}
J_k \vec{\delta_k} + \vec{f_k} = 0 \\
\sqrt{\mu_k} D_k \vec{\delta_k} = 0
\end{cases}
$$

For more information, see [@Lou05].

[^2]: Here $J_{ij} = \partial f_i/\partial x_j$ is the Jacobian matrix of the
vector-valued function $\vec f(\vec x)$.

The algorithm terminates if one of the following convergence conditions is
satisfied:

1. $|\delta_i| \leq \text{xtol} (|x_i| + \text{xtol})$ for every component
   of $\vec \delta$;
2. $\max_i |\nabla_i \cdot \max(x_i, 1)| \leq \text{gtol}\cdot \max(\chi^2, 1)$;
3. $\|\vec f(\vec x+ \vec\delta) - \vec f(\vec x)\| \leq \text{ftol}
   \cdot \max(\|\vec f(\vec x)\|, 1)$;

where xtol, gtol and ftol are tolerance values, all set to \num{1e-8}. The
program converged in 7 iterations, giving the results below. The covariance of
the parameters can again be estimated through the Hessian matrix at the
minimum. The following results were obtained:
$$
\bar{\alpha}_{\chi} = 0.6537 \et
\bar{\beta}_{\chi} = 0.05972 \et
\bar{\gamma}_{\chi} = -0.1736
$$ {#eq:lsq-res}

$$
\text{cov}_{\chi} = \begin{pmatrix}
9.244 \cdot 10^{-6} & -1.66 \cdot 10^{-6} & 2.383 \cdot 10^{-7} \\
-1.66 \cdot 10^{-6} & 4.495 \cdot 10^{-6} & 1.505 \cdot 10^{-6} \\
2.383 \cdot 10^{-7} & 1.505 \cdot 10^{-6} & 2.681 \cdot 10^{-6}
\end{pmatrix}
$$

Hence:
\begin{align*}
&\alpha_{\chi} = 0.654 \pm 0.003 \\
&\beta_{\chi} = 0.0597 \pm 0.0021 \\
&\gamma_{\chi} = -0.1736 \pm 0.0016
\end{align*}

See @sec:res_comp for results compatibility.

### Results compatibility {#sec:res_comp}

In order to compare the values $x_L$ and $x_{\chi}$ obtained from both methods
with the correct ones ({$\alpha_0$, $\beta_0$, $\gamma_0$}), the following
compatibility $t$-test was applied:
$$
p = 1 - \text{erf}\left(\frac{t}{\sqrt{2}}\right)\ \with
t = \frac{x_i - x_0}{\sqrt{\sigma_{x_i}^2 + \sigma_{x_0}^2}}
= \frac{x_i - x_0}{\sigma_{x_i}}
$$

where $i$ stands either for the MLM or LSQ parameters and $\sigma_{x_i}$ is the
uncertainty of the value $x_i$ (the reference values $x_0$ are exact, hence
$\sigma_{x_0} = 0$). At 95% confidence level, the values are
compatible if $p > 0.05$.

\vspace{30pt}

Likelihood results:

----------------------------
par          $p$-value
------------ ---------------
$α_L$        0.18

$β_L$        0.55

$γ_L$        0.023
----------------------------

Table: Likelihood results compatibility.

$\chi^2$ results:

----------------------------
par          $p$-value
------------ ---------------
$α_χ$        0.22

$β_χ$        0.89

$γ_χ$        0.0001
----------------------------

Table: $\chi^2$ results compatibility.

It can be concluded that, with both methods, the parameters $\alpha$ and
$\beta$ were recovered successfully, while $\gamma$ is incompatible. However,
the covariance was estimated using the Cramér-Rao bound, so the errors may be
underestimated, as appears to be the case for $\gamma$.

Since two different methods similarly underestimated the true value of
$\gamma$, the Monte Carlo simulation itself was suspected to be faulty. This
phenomenon was observed frequently when generating multiple samples, so it
can't be attributed to statistical fluctuations in that particular sample.
The issue remains unsolved, as no explanation was found.

## Isotropic hypothesis testing

What if the probability distribution function were isotropic?
Would this hypothesis be compatible with the observations?

If $F$ were isotropic, $\alpha_I$, $\beta_I$ and $\gamma_I$ would be $1/3$, 0
and 0 respectively, since this gives $F_I = 1/(4 \pi)$. The $t$-test gives a
$p$-value of approximately zero for all three parameters, meaning that there
is no compatibility at all with this hypothesis.