ex-3: review

This commit is contained in:
Giù Marcer 2020-06-03 10:02:34 +02:00 committed by rnhmjoj
parent 0d3498ff5c
commit a7d923620a

View File

@ -2,30 +2,28 @@
## Meson decay events generation ## Meson decay events generation
A number of $N = 50000$ points on the unit sphere, each representing a A number of $N = 50000$ points, each representing a particle detection event,
particle detection event, is to be generated according to then angular is to be generated on the unit sphere according to the angular probability
probability distribution function $F$: distribution function $F$:
\begin{align*} \begin{align*}
F (\theta, \phi) = &\frac{3}{4 \pi} \Bigg[ F (\theta, \phi) = &\frac{3}{4 \pi} \Bigg[
\frac{1}{2} (1 - \alpha) + \frac{1}{2} (3\alpha - 1) \cos^2(\theta) \\ \frac{1}{2} (1 - \alpha) + \frac{1}{2} (3\alpha - 1) \cos^2(\theta) \\
&- \beta \sin^2(\theta) \cos(2 \phi) &- \beta \sin^2(\theta) \cos(2 \phi)
- \gamma \sqrt{2} \sin(2 \theta) \cos(\phi) \Bigg] - \gamma \sqrt{2} \sin(2 \theta) \cos(\phi) \Bigg]
\end{align*} \end{align*}
where $\theta$ and $\phi$ are, respectively, the polar and azimuthal angles, and where $\theta$ and $\phi$ are, respectively, the polar and azimuthal angles, and
$$ $$
\alpha_0 = 0.65 \et \beta_0 = 0.06 \et \gamma_0 = -0.18 \alpha_0 = 0.65 \et \beta_0 = 0.06 \et \gamma_0 = -0.18
$$ $$
To generate the points, a *hit-miss* method was employed:
To generate the points a *hit-miss* method was employed:
1. generate a point $(\theta , \phi)$ uniformly on a unit sphere, 1. generate a point $(\theta , \phi)$ uniformly on a unit sphere,
2. generate a third value $Y$ uniformly in $[0, 1]$, 2. generate a third value $Y$ uniformly in $[0, 1]$,
3. if @eq:requirement is satisfied save the point, 3. if @eq:requirement is satisfied, save the point,
4. repeat from 1.
$$ $$
Y < F(\theta, \phi) Y < F(\theta, \phi)
$$ {#eq:requirement} $$ {#eq:requirement}
4. repeat from 1.
To increase the efficiency of the procedure, the $Y$ values were actually To increase the efficiency of the procedure, the $Y$ values were actually
generated in $[0, 0.2]$, since $\max F = 0.179$ for the given parameters generated in $[0, 0.2]$, since $\max F = 0.179$ for the given parameters
@ -37,7 +35,7 @@ While $\phi$ can simply be generated uniformly between 0 and $2 \pi$, for
$\theta$ one has to be more careful: by generating evenly distributed numbers $\theta$ one has to be more careful: by generating evenly distributed numbers
between 0 and $\pi$, the points turn out to cluster on the poles. The point between 0 and $\pi$, the points turn out to cluster on the poles. The point
$(\theta,\phi)$ is *not* uniform on the sphere but is uniform on a cylindrical $(\theta,\phi)$ is *not* uniform on the sphere but is uniform on a cylindrical
projection of it. (The difference can be appreciated in @fig:compare). projection of it (the difference can be appreciated in @fig:compare).
The proper distribution of $\theta$ is obtained by applying a transform The proper distribution of $\theta$ is obtained by applying a transform
to a variable $x$ with uniform distribution in $[0, 1)$, to a variable $x$ with uniform distribution in $[0, 1)$,
@ -79,7 +77,6 @@ The transformation can be found by imposing the angular PDF to be a constant:
\frac{1}{2} \sin(\theta) = \left. 1 \middle/ \, \left| \frac{1}{2} \sin(\theta) = \left. 1 \middle/ \, \left|
\frac{d \theta}{dx} \right| \right. \frac{d \theta}{dx} \right| \right.
\end{align*} \end{align*}
If $\theta$ is chosen to grew together with $x$, then the absolute value can be If $\theta$ is chosen to grew together with $x$, then the absolute value can be
omitted: omitted:
\begin{align*} \begin{align*}
@ -96,12 +93,9 @@ omitted:
&\Longrightarrow \hspace{20pt} &\Longrightarrow \hspace{20pt}
\theta = \text{acos} (1 -2x) \theta = \text{acos} (1 -2x)
\end{align*} \end{align*}
Since 1 is not in $[0 , 1)$, $\theta = \pi$ can't be generated. Being it just Since 1 is not in $[0 , 1)$, $\theta = \pi$ can't be generated. Being it just
a single point, the effect of this omission is negligible. a single point, the effect of this omission is negligible.
\vspace{30pt}
## Parameters estimation ## Parameters estimation
@ -119,7 +113,6 @@ defined in @eq:Like0, where the index $i$ runs over the sample:
$$ $$
L = \prod_i F(\theta_i, \phi_i) = \prod_i F_i L = \prod_i F(\theta_i, \phi_i) = \prod_i F_i
$$ {#eq:Like0} $$ {#eq:Like0}
The reason is that the points were generated according to the probability The reason is that the points were generated according to the probability
distribution function $F$ with parameters $\alpha_0$, $\beta_0$ and $\gamma_0$, distribution function $F$ with parameters $\alpha_0$, $\beta_0$ and $\gamma_0$,
thus the probability $P$ of sampling this special sample is, by definition, the thus the probability $P$ of sampling this special sample is, by definition, the
@ -133,12 +126,10 @@ and the logarithm simplifies the math:
$$ $$
L = \prod_i F_i \thus - \ln(L) = - \sum_{i} \ln(F_i) L = \prod_i F_i \thus - \ln(L) = - \sum_{i} \ln(F_i)
$$ {#eq:Like} $$ {#eq:Like}
The problem of multidimensional minimisation was tackled using the GSL library The problem of multidimensional minimisation was tackled using the GSL library
`gsl_multimin.h`, which implements a variety of methods, including the `gsl_multimin.h`, which implements a variety of methods, including the
conjugate gradient Fletcher-Reeves algorithm, used in the solution, which requires conjugate gradient Fletcher-Reeves algorithm, used in the solution, which requires
the implementation of $-\ln(L)$ and its gradient. the implementation of $-\ln(L)$ and its gradient. It works as follows.
To minimise a function $f$, given an initial guess point $x_0$, the gradient To minimise a function $f$, given an initial guess point $x_0$, the gradient
$\nabla f$ is used to find the initial direction $v_0$ (the steepest descent) $\nabla f$ is used to find the initial direction $v_0$ (the steepest descent)
@ -147,7 +138,6 @@ the directional derivative along $v_0$ is zero:
$$ $$
\frac{\partial f}{\partial v_0} = \nabla f \cdot v_0 = 0 \frac{\partial f}{\partial v_0} = \nabla f \cdot v_0 = 0
$$ $$
or, equivalently, at the point where the gradient is orthogonal to $v_0$. or, equivalently, at the point where the gradient is orthogonal to $v_0$.
After this first step, the following procedure is iterated: After this first step, the following procedure is iterated:
@ -162,13 +152,12 @@ given by the Fletcher-Reeves formula:
$$ $$
\beta_n = \frac{\|\nabla f(x_n)\|^2}{\|\nabla f(x_{n-1})\|^2} \beta_n = \frac{\|\nabla f(x_n)\|^2}{\|\nabla f(x_{n-1})\|^2}
$$ $$
The accuracy of each line minimisation is controlled with the parameter `tol`, The accuracy of each line minimisation is controlled with the parameter `tol`,
meaning: meaning:
$$ $$
w \cdot \nabla f < \text{tol} \, \|w\| \, \|\nabla f\| w \cdot \nabla f < \text{tol} \, \|w\| \, \|\nabla f\|
$$ $$
which was set to its default value 0.1.
The minimisation is quite unstable and this forced the starting point to be The minimisation is quite unstable and this forced the starting point to be
taken very close to the solution, namely: taken very close to the solution, namely:
$$ $$
@ -176,8 +165,7 @@ $$
\beta_{sp} = 0.02 \et \beta_{sp} = 0.02 \et
\gamma_{sp} = -0.17 \gamma_{sp} = -0.17
$$ $$
The overall minimisation ends when the gradient module is smaller than $10^{-5}$.
The overall minimisation ends when the gradient module is smaller than $10^{-4}$.
The program took 25 of the above iterations to reach the result shown in @eq:Like_res. The program took 25 of the above iterations to reach the result shown in @eq:Like_res.
As regards the error estimation, the Cramér-Rao bound states that the covariance As regards the error estimation, the Cramér-Rao bound states that the covariance
@ -189,13 +177,11 @@ product of a lower triangular matrix $\Delta$ and its transpose $\Delta^T$:
$$ $$
H = \Delta \cdot \Delta^T H = \Delta \cdot \Delta^T
$$ $$
The Hessian matrix is, indeed, both symmetric and positive definite, when The Hessian matrix is, indeed, both symmetric and positive definite, when
computed in a minimum. Once decomposed, the inverse is given by computed in a minimum. Once decomposed, the inverse is given by
$$ $$
H^{-1} = (\Delta \cdot \Delta^T)^{-1} = (\Delta^{-1})^T \cdot \Delta^{-1} H^{-1} = (\Delta \cdot \Delta^T)^{-1} = (\Delta^{-1})^T \cdot \Delta^{-1}
$$ $$
where the inverse of $\Delta$ is easily computed, since it's a triangular where the inverse of $\Delta$ is easily computed, since it's a triangular
matrix. matrix.
@ -207,7 +193,6 @@ $$
\bar{\beta_L} = 0.06125 \et \bar{\beta_L} = 0.06125 \et
\bar{\gamma_L} = -0.1763 \bar{\gamma_L} = -0.1763
$$ {#eq:Like_res} $$ {#eq:Like_res}
$$ $$
\text{cov}_L = \begin{pmatrix} \text{cov}_L = \begin{pmatrix}
9.225 \cdot 10^{-6} & -1.785 \cdot 10^{-6} & 2.41 \cdot 10^{-7} \\ 9.225 \cdot 10^{-6} & -1.785 \cdot 10^{-6} & 2.41 \cdot 10^{-7} \\
@ -215,14 +200,12 @@ $$
2.41 \cdot 10^{-7} & 1.58 \cdot 10^{-6} & 2.602 \cdot 10^{-6} 2.41 \cdot 10^{-7} & 1.58 \cdot 10^{-6} & 2.602 \cdot 10^{-6}
\end{pmatrix} \end{pmatrix}
$$ {#eq:Like_cov} $$ {#eq:Like_cov}
Hence: Hence:
\begin{align*} \begin{align*}
\alpha_L &= 0.654 \pm 0.003 \\ \alpha_L &= 0.654 \pm 0.003 \\
\beta_L &= 0.0613 \pm 0.0021 \\ \beta_L &= 0.0613 \pm 0.0021 \\
\gamma_L &= -0.1763 \pm 0.0016 \gamma_L &= -0.1763 \pm 0.0016
\end{align*} \end{align*}
See @sec:res_comp for results compatibility. See @sec:res_comp for results compatibility.
@ -241,19 +224,16 @@ five): in this case, 30 bins for $\theta$ and 60 for $\phi$ turned out
to be satisfactory. The expected values $E_{\theta, \phi}$ were given as the to be satisfactory. The expected values $E_{\theta, \phi}$ were given as the
product of $N$, $F$ computed in the geometric center of the bin and the solid product of $N$, $F$ computed in the geometric center of the bin and the solid
angle enclosed by the bin itself, namely: angle enclosed by the bin itself, namely:
$$ $$
E_{\theta, \phi} = N F(\theta, \phi) \, \Delta \theta \, \Delta \phi \, E_{\theta, \phi} = N F(\theta, \phi) \, \Delta \theta \, \Delta \phi \,
\sin{\theta} \sin{\theta}
$$ $$
Once the data are binned, the number of events in each bin plays the role of the Once the data are binned, the number of events in each bin plays the role of the
observed value. Then, the $\chi^2$, defined as follow, must be minimized with observed value. Then, the $\chi^2$, defined as follow, must be minimized with
respect to the three parameters to be estimated: respect to the three parameters to be estimated:
$$ $$
\chi^2 = \sum_i f_i^2 = \|\vec f\|^2 \with f_i = \frac{O_i - E_i}{\sqrt{E_i}} \chi^2 = \sum_i f_i^2 = \|\vec f\|^2 \with f_i = \frac{O_i - E_i}{\sqrt{E_i}}
$$ $$
where the sum runs over the bins, $O_i$ are the observed values and $E_i$ the where the sum runs over the bins, $O_i$ are the observed values and $E_i$ the
expected ones. Besides the more generic `gsl_multimin.h` library, GSL provides a expected ones. Besides the more generic `gsl_multimin.h` library, GSL provides a
special one for least square minimisation, called `gsl_multifit_nlinear.h`. A special one for least square minimisation, called `gsl_multifit_nlinear.h`. A
@ -261,7 +241,8 @@ trust region method was used to minimize the $\chi^2$.
In such methods, the $\chi^2$, its gradient and its Hessian matrix are computed In such methods, the $\chi^2$, its gradient and its Hessian matrix are computed
in a series of points in the parameters space till the gradient and the matrix in a series of points in the parameters space till the gradient and the matrix
reach given proper values, meaning that the minimum has been well approached, reach given proper values, meaning that the minimum has been well approached,
where "well" is related to those values. where "well" is related to those values. They work as follows.
If {$x_1 \dots x_p$} are the parameters which must be estimated, first $\chi^2$ If {$x_1 \dots x_p$} are the parameters which must be estimated, first $\chi^2$
is computed in a point $\vec{x_k}$, then its value in a point $\vec{x}_k + is computed in a point $\vec{x_k}$, then its value in a point $\vec{x}_k +
\vec{\delta}$ is modelled by a function $m_k (\vec{\delta})$ which is the \vec{\delta}$ is modelled by a function $m_k (\vec{\delta})$ which is the
@ -271,7 +252,6 @@ $$
\chi^2 (\vec{x}_k) + \nabla_k^T \vec{\delta} + \frac{1}{2} \chi^2 (\vec{x}_k) + \nabla_k^T \vec{\delta} + \frac{1}{2}
\vec{\delta}^T H_k \vec{\delta} \vec{\delta}^T H_k \vec{\delta}
$$ $$
where $\nabla_k$ and $H_k$ are the gradient and the Hessian matrix of $\chi^2$ where $\nabla_k$ and $H_k$ are the gradient and the Hessian matrix of $\chi^2$
in the point $\vec{x}_k$ and the superscript $T$ stands for the transpose. The in the point $\vec{x}_k$ and the superscript $T$ stands for the transpose. The
initial problem is reduced into the so called trust-region subproblem (TRS), initial problem is reduced into the so called trust-region subproblem (TRS),
@ -281,7 +261,6 @@ $m_k(\vec{\delta})$ should be a good approximation of $\chi^2 (\vec{x}_k +
$$ $$
\|D_k \vec\delta\| < \Delta_k \|D_k \vec\delta\| < \Delta_k
$$ $$
If $D_k = I$, the region is a hypersphere of radius $\Delta_k$ centered at If $D_k = I$, the region is a hypersphere of radius $\Delta_k$ centered at
$\vec{x}_k$. In case one or more parameters have very different scales, the $\vec{x}_k$. In case one or more parameters have very different scales, the
region should be deformed according to a proper $D_k$. region should be deformed according to a proper $D_k$.
@ -289,9 +268,9 @@ region should be deformed according to a proper $D_k$.
Given an initial point $x_0$, the radius $\Delta_0$, the scale matrix $D_0$ Given an initial point $x_0$, the radius $\Delta_0$, the scale matrix $D_0$
and step $\vec\delta_0$, the LSQ algorithm consist in the following iteration: and step $\vec\delta_0$, the LSQ algorithm consist in the following iteration:
1. construct the function $m_k(\vec\delta)$, 1. construct the function $m_k(\vec\delta)$;
2. solve the TRS and find $x_k + \vec\delta_k$ which corresponds to the 2. solve the TRS and find $x_k + \vec\delta_k$ which corresponds to the
plausible minimum, plausible minimum;
3. check whether the solution actually decreases $\chi^2$: 3. check whether the solution actually decreases $\chi^2$:
1. if positive and converged (see below), stop; 1. if positive and converged (see below), stop;
1. if positive but not converged, it means that $m_k$ still well 1. if positive but not converged, it means that $m_k$ still well
@ -312,7 +291,6 @@ $$
\mu_k (\Delta_k - \|D_k \vec\delta_k\|) = 0 \et \mu_k (\Delta_k - \|D_k \vec\delta_k\|) = 0 \et
(H_k + \mu_k D_k^TD_k) \vec\delta_k = -\nabla_k (H_k + \mu_k D_k^TD_k) \vec\delta_k = -\nabla_k
$$ $$
Using the approximation[^2] $H\approx J^TJ$, obtained by computing the Hessian Using the approximation[^2] $H\approx J^TJ$, obtained by computing the Hessian
of the first-order Taylor expansion of $\chi^2$, $\vec\delta_k$ can of the first-order Taylor expansion of $\chi^2$, $\vec\delta_k$ can
be found by solving the system: be found by solving the system:
@ -322,7 +300,6 @@ $$
\sqrt{\mu_k} D_k \vec{\delta_k} = 0 \sqrt{\mu_k} D_k \vec{\delta_k} = 0
\end{cases} \end{cases}
$$ $$
For more informations, see [@Lou05]. For more informations, see [@Lou05].
[^2]: Here $J_{ij} = \partial f_i/\partial x_j$ is the Jacobian matrix of the [^2]: Here $J_{ij} = \partial f_i/\partial x_j$ is the Jacobian matrix of the
@ -346,7 +323,6 @@ $$
\bar{\beta_{\chi}} = 0.05972 \et \bar{\beta_{\chi}} = 0.05972 \et
\bar{\gamma_{\chi}} = -0.1736 \bar{\gamma_{\chi}} = -0.1736
$$ {#eq:lsq-res} $$ {#eq:lsq-res}
$$ $$
\text{cov}_{\chi} = \begin{pmatrix} \text{cov}_{\chi} = \begin{pmatrix}
9.244 \cdot 10^{-6} & -1.66 \cdot 10^{-6} & 2.383 \cdot 10^{-7} \\ 9.244 \cdot 10^{-6} & -1.66 \cdot 10^{-6} & 2.383 \cdot 10^{-7} \\
@ -354,29 +330,25 @@ $$
2.383 \cdot 10^{-7} & 1.505 \cdot 10^{-6} & 2.681 \cdot 10^{-6} 2.383 \cdot 10^{-7} & 1.505 \cdot 10^{-6} & 2.681 \cdot 10^{-6}
\end{pmatrix} \end{pmatrix}
$$ $$
Hence: Hence:
\begin{align*} \begin{align*}
&\alpha_{\chi} = 0.654 \pm 0.003 \\ &\alpha_{\chi} = 0.654 \pm 0.003 \\
&\beta_{\chi} = 0.0597 \pm 0.0021 \\ &\beta_{\chi} = 0.0597 \pm 0.0021 \\
&\gamma_{\chi} = -0.1736 \pm 0.0016 &\gamma_{\chi} = -0.1736 \pm 0.0016
\end{align*} \end{align*}
See @sec:res_comp for results compatibility. See @sec:res_comp for results compatibility.
### Results compatibility {#sec:res_comp} ### Results compatibility {#sec:res_comp}
In order to compare the values $x_L$ and $x_{\chi}$ obtained from both methods In order to compare the MLM and the LSQ parameters with the correct ones
with the correct ones ({$\alpha_0$, $\beta_0$, $\gamma_0$}), the following ({$\alpha_0$, $\beta_0$, $\gamma_0$}), the following compatibility $t$-test was
compatibility $t$-test was applied: applied:
$$ $$
p = 1 - \text{erf}\left(\frac{t}{\sqrt{2}}\right)\ \with p = 1 - \text{erf}\left(\frac{t}{\sqrt{2}}\right)\ \with
t = \frac{x_i - x_0}{\sqrt{\sigma_{x_i}^2 + \sigma_{x_0}^2}} t = \frac{x_i - x_0}{\sqrt{\sigma_{x_i}^2 + \sigma_{x_0}^2}}
= \frac{x_i - x_0}{\sigma_{x_i}} = \frac{x_i - x_0}{\sigma_{x_i}}
$$ $$
where $i$ stands either for the MLM or LSQ parameters and $\sigma_{x_i}$ is the where $i$ stands either for the MLM or LSQ parameters and $\sigma_{x_i}$ is the
uncertainty of the value $x_i$. At 95% confidence level, the values are uncertainty of the value $x_i$. At 95% confidence level, the values are
compatible if $p > 0.05$. compatible if $p > 0.05$.
@ -418,8 +390,8 @@ underestimated, which must be the case for $\gamma$.
Since two different methods similarly underestimated the true value of Since two different methods similarly underestimated the true value of
$\gamma$, it was suspected the Monte Carlo simulation was faulty. This $\gamma$, it was suspected the Monte Carlo simulation was faulty. This
phenomenon was observed frequently when generating multiple samples, so it phenomenon was observed frequently when generating multiple samples, though, so
can't be attributed to statistical fluctuations in that particular sample. it can't be attributed to statistical fluctuations in that particular sample.
The issue remains unsolved as no explanation was found. The issue remains unsolved as no explanation was found.