From a7d923620a4f1733485d7c11eb0c8e3dbb817ff0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gi=C3=B9=20Marcer?= <giuliamarcer@yahoo.it>
Date: Wed, 3 Jun 2020 10:02:34 +0200
Subject: [PATCH] ex-3: review

---
 notes/sections/3.md | 86 +++++++++++++++------------------------------
 1 file changed, 29 insertions(+), 57 deletions(-)

diff --git a/notes/sections/3.md b/notes/sections/3.md
index dae1426..150a8b0 100644
--- a/notes/sections/3.md
+++ b/notes/sections/3.md
@@ -2,42 +2,40 @@
 
 ## Meson decay events generation
 
-A number of $N = 50000$ points on the unit sphere, each representing a
-particle detection event, is to be generated according to then angular
-probability distribution function $F$:
+A number of $N = 50000$ points, each representing a particle detection event,
+is to be generated on the unit sphere according to the angular probability
+distribution function $F$:
 \begin{align*}
-  F (\theta, \phi) = &\frac{3}{4 \pi} \Bigg[ 
+  F (\theta, \phi) = &\frac{3}{4 \pi} \Bigg[
   \frac{1}{2} (1 - \alpha) + \frac{1}{2} (3\alpha - 1) \cos^2(\theta) \\
   &- \beta \sin^2(\theta) \cos(2 \phi)
   - \gamma \sqrt{2} \sin(2 \theta) \cos(\phi) \Bigg]
 \end{align*}
-
 where $\theta$ and $\phi$ are, respectively, the polar and azimuthal angles, and
 $$
   \alpha_0 = 0.65 \et \beta_0 = 0.06 \et \gamma_0 = -0.18
 $$
-
-To generate the points a *hit-miss* method was employed:
+To generate the points, a *hit-miss* method was employed:
 
 1. generate a point $(\theta , \phi)$ uniformly on a unit sphere,
 2. generate a third value $Y$ uniformly in $[0, 1]$,
-3. if @eq:requirement is satisfied save the point,
+3. if @eq:requirement is satisfied, save the point,
+   $$
+     Y < F(\theta, \phi)
+   $$ {#eq:requirement}
 4. repeat from 1.
-$$
-  Y < F(\theta, \phi)
-$$ {#eq:requirement}
 
 To increase the efficiency of the procedure, the $Y$ values were actually
 generated in $[0, 0.2]$, since $\max F = 0.179$ for the given parameters
 (by generating the numbers in the range $[0 ; 1]$, all the points with
 $Y_{\theta, \phi} > 0.179$ would have been rejected with the same
-probability, namely 1).  
+probability, namely 1).
 
 While $\phi$ can simply be generated uniformly between 0 and $2 \pi$, for
 $\theta$ one has to be more careful: by generating evenly distributed numbers
 between 0 and $\pi$, the points turn out to cluster on the poles. The point
 $(\theta,\phi)$ is *not* uniform on the sphere but is uniform on a cylindrical
-projection of it. (The difference can be appreciated in @fig:compare).
+projection of it (the difference can be appreciated in @fig:compare).
 
 The proper distribution of $\theta$ is obtained by applying a transform
 to a variable $x$ with uniform distribution in $[0, 1)$,
@@ -73,13 +71,12 @@ The transformation can be found by imposing the angular PDF to be a constant:
   \theta = \theta (x)
   \hspace{20pt} &\Longrightarrow \hspace{20pt}
   \frac{dP}{d \theta} = \frac{dP}{dx} \cdot \left| \frac{dx}{d \theta} \right|
-  = \left. \frac{dP}{dx} \middle/ \, \left| \frac{d \theta}{dx} \right| \right. 
+  = \left. \frac{dP}{dx} \middle/ \, \left| \frac{d \theta}{dx} \right| \right.
   \\
   &\Longrightarrow \hspace{20pt}
   \frac{1}{2} \sin(\theta) = \left. 1 \middle/ \, \left|
-  \frac{d \theta}{dx} \right| \right. 
+  \frac{d \theta}{dx} \right| \right.
 \end{align*}
-
 If $\theta$ is chosen to grew together with $x$, then the absolute value can be
 omitted:
 \begin{align*}
@@ -96,12 +93,9 @@ omitted:
   &\Longrightarrow \hspace{20pt}
   \theta = \text{acos} (1 -2x)
 \end{align*}
-
 Since 1 is not in $[0 , 1)$, $\theta = \pi$ can't be generated. Being it just
 a single point, the effect of this omission is negligible.
 
-\vspace{30pt}
-
 
 ## Parameters estimation
 
@@ -119,7 +113,6 @@ defined in @eq:Like0, where the index $i$ runs over the sample:
 $$
   L = \prod_i F(\theta_i, \phi_i) = \prod_i F_i
 $$ {#eq:Like0}
-
 The reason is that the points were generated according to the probability
 distribution function $F$ with parameters $\alpha_0$, $\beta_0$ and $\gamma_0$,
 thus the probability $P$ of sampling this special sample is, by definition, the
@@ -133,12 +126,10 @@ and the logarithm simplifies the math:
 $$
   L = \prod_i F_i \thus - \ln(L) = - \sum_{i} \ln(F_i)
 $$ {#eq:Like}
-
-
 The problem of multidimensional minimisation was tackled using the GSL library
 `gsl_multimin.h`, which implements a variety of methods, including the
 conjugate gradient Fletcher-Reeves algorithm, used in the solution, which requires
-the implementation of $-\ln(L)$ and its gradient.
+the implementation of $-\ln(L)$ and its gradient. It works as follows.
 
 To minimise a function $f$, given an initial guess point $x_0$, the gradient
 $\nabla f$ is used to find the initial direction $v_0$ (the steepest descent)
@@ -147,7 +138,6 @@ the directional derivative along $v_0$ is zero:
 $$
   \frac{\partial f}{\partial v_0} = \nabla f \cdot v_0 = 0
 $$
-
 or, equivalently, at the point where the gradient is orthogonal to $v_0$.
 After this first step, the following procedure is iterated:
 
@@ -162,13 +152,12 @@ given by the Fletcher-Reeves formula:
 $$
   \beta_n = \frac{\|\nabla f(x_n)\|^2}{\|\nabla f(x_{n-1})\|^2}
 $$
-
 The accuracy of each line minimisation is controlled with the parameter `tol`,
 meaning:
 $$
   w \cdot \nabla f < \text{tol} \, \|w\| \, \|\nabla f\|
 $$
-
+which was set to its default value 0.1.  
 The minimisation is quite unstable and this forced the starting point to be
 taken very close to the solution, namely:
 $$
@@ -176,8 +165,7 @@ $$
   \beta_{sp} = 0.02 \et
   \gamma_{sp} = -0.17
 $$
-
-The overall minimisation ends when the gradient module is smaller than $10^{-4}$.
+The overall minimisation ends when the gradient module is smaller than $10^{-5}$.
 The program took 25 of the above iterations to reach the result shown in @eq:Like_res.
 
 As regards the error estimation, the Cramér-Rao bound states that the covariance
@@ -189,13 +177,11 @@ product of a lower triangular matrix $\Delta$ and its transpose $\Delta^T$:
 $$
   H = \Delta \cdot \Delta^T
 $$
-
 The Hessian matrix is, indeed, both symmetric and positive definite, when
 computed in a minimum. Once decomposed, the inverse is given by
 $$
   H^{-1} = (\Delta \cdot \Delta^T)^{-1} = (\Delta^{-1})^T \cdot \Delta^{-1}
 $$
-
 where the inverse of $\Delta$ is easily computed, since it's a triangular
 matrix.
 
@@ -207,7 +193,6 @@ $$
   \bar{\beta_L}  = 0.06125 \et
   \bar{\gamma_L} = -0.1763
 $$ {#eq:Like_res}
-
 $$
 \text{cov}_L = \begin{pmatrix}
  9.225 \cdot 10^{-6} & -1.785 \cdot 10^{-6} & 2.41  \cdot 10^{-7} \\
@@ -215,14 +200,12 @@ $$
   2.41 \cdot 10^{-7} &  1.58  \cdot 10^{-6} & 2.602 \cdot 10^{-6}
 \end{pmatrix}
 $$ {#eq:Like_cov}
-
 Hence:
 \begin{align*}
   \alpha_L &=  0.654  \pm 0.003  \\
   \beta_L  &=  0.0613 \pm 0.0021 \\
   \gamma_L &= -0.1763 \pm 0.0016
 \end{align*}
-
 See @sec:res_comp for results compatibility.
 
 
@@ -241,19 +224,16 @@ five): in this case, 30 bins for $\theta$ and 60 for $\phi$ turned out
 to be satisfactory. The expected values $E_{\theta, \phi}$ were given as the
 product of $N$, $F$ computed in the geometric center of the bin and the solid
 angle enclosed by the bin itself, namely:
-
 $$
   E_{\theta, \phi} = N F(\theta, \phi) \, \Delta \theta \, \Delta \phi \,
   \sin{\theta}
 $$
-
 Once the data are binned, the number of events in each bin plays the role of the
 observed value. Then, the $\chi^2$, defined as follow, must be minimized with
 respect to the three parameters to be estimated:
 $$
   \chi^2 = \sum_i f_i^2 = \|\vec f\|^2 \with f_i = \frac{O_i - E_i}{\sqrt{E_i}}
 $$
-
 where the sum runs over the bins, $O_i$ are the observed values and $E_i$ the
 expected ones. Besides the more generic `gsl_multimin.h` library, GSL provides a
 special one for least square minimisation, called `gsl_multifit_nlinear.h`. A
@@ -261,7 +241,8 @@ trust region method was used to minimize the $\chi^2$.
 In such methods, the $\chi^2$, its gradient and its Hessian matrix are computed
 in a series of points in the parameters space till the gradient and the matrix
 reach given proper values, meaning that the minimum has been well approached,
-where "well" is related to those values.  
+where "well" is related to those values. They work as follows.
+
 If {$x_1 \dots x_p$} are the parameters which must be estimated, first $\chi^2$
 is computed in a point $\vec{x_k}$, then its value in a point $\vec{x}_k +
 \vec{\delta}$ is modelled by a function $m_k (\vec{\delta})$ which is the
@@ -271,7 +252,6 @@ $$
   \chi^2 (\vec{x}_k) + \nabla_k^T  \vec{\delta} + \frac{1}{2}
   \vec{\delta}^T H_k \vec{\delta}
 $$
-
 where $\nabla_k$ and $H_k$ are the gradient and the Hessian matrix of $\chi^2$
 in the point $\vec{x}_k$ and the superscript $T$ stands for the transpose. The
 initial problem is reduced into the so called trust-region subproblem (TRS),
@@ -281,7 +261,6 @@ $m_k(\vec{\delta})$ should be a good approximation of $\chi^2 (\vec{x}_k +
 $$
   \|D_k \vec\delta\| < \Delta_k
 $$
-
 If $D_k = I$, the region is a hypersphere of radius $\Delta_k$ centered at
 $\vec{x}_k$. In case one or more parameters have very different scales, the
 region should be deformed according to a proper $D_k$.
@@ -289,9 +268,9 @@ region should be deformed according to a proper $D_k$.
 Given an initial point $x_0$, the radius $\Delta_0$, the scale matrix $D_0$
 and step $\vec\delta_0$, the LSQ algorithm consist in the following iteration:
 
-  1. construct the function $m_k(\vec\delta)$,
+  1. construct the function $m_k(\vec\delta)$;
   2. solve the TRS and find $x_k + \vec\delta_k$ which corresponds to the
-     plausible minimum,
+     plausible minimum;
   3. check whether the solution actually decreases $\chi^2$:
       1. if positive and converged (see below), stop;
       1. if positive but not converged, it means that $m_k$ still well
@@ -312,7 +291,6 @@ $$
    \mu_k (\Delta_k  - \|D_k \vec\delta_k\|) = 0 \et
    (H_k + \mu_k D_k^TD_k) \vec\delta_k = -\nabla_k
 $$
-
 Using the approximation[^2] $H\approx J^TJ$, obtained by computing the Hessian
 of the first-order Taylor expansion of $\chi^2$, $\vec\delta_k$ can
 be found by solving the system:
@@ -322,7 +300,6 @@ $$
   \sqrt{\mu_k} D_k \vec{\delta_k} = 0
 \end{cases}
 $$
-
 For more informations, see [@Lou05].
 
 [^2]: Here $J_{ij} = \partial f_i/\partial x_j$ is the Jacobian matrix of the
@@ -346,7 +323,6 @@ $$
   \bar{\beta_{\chi}} = 0.05972 \et
   \bar{\gamma_{\chi}} = -0.1736
 $$ {#eq:lsq-res}
-
 $$
 \text{cov}_{\chi} = \begin{pmatrix}
  9.244 \cdot 10^{-6} & -1.66  \cdot 10^{-6} & 2.383 \cdot 10^{-7} \\
@@ -354,29 +330,25 @@ $$
  2.383 \cdot 10^{-7} &  1.505 \cdot 10^{-6} & 2.681 \cdot 10^{-6}
 \end{pmatrix}
 $$
-
 Hence:
 \begin{align*}
   &\alpha_{\chi} = 0.654   \pm 0.003   \\
   &\beta_{\chi}  = 0.0597  \pm 0.0021  \\
   &\gamma_{\chi} = -0.1736 \pm 0.0016
 \end{align*}
-
 See @sec:res_comp for results compatibility.
 
 
 ### Results compatibility {#sec:res_comp}
 
-In order to compare the values $x_L$ and $x_{\chi}$ obtained from both methods
-with the correct ones ({$\alpha_0$, $\beta_0$, $\gamma_0$}), the following
-compatibility $t$-test was applied:
-
+In order to compare the MLM and the LSQ parameters with the correct ones
+({$\alpha_0$, $\beta_0$, $\gamma_0$}), the following compatibility $t$-test was
+applied:
 $$
   p = 1 - \text{erf}\left(\frac{t}{\sqrt{2}}\right)\ \with
   t = \frac{x_i - x_0}{\sqrt{\sigma_{x_i}^2 + \sigma_{x_0}^2}}
   = \frac{x_i - x_0}{\sigma_{x_i}}
 $$
-
 where $i$ stands either for the MLM or LSQ parameters and $\sigma_{x_i}$ is the
 uncertainty of the value $x_i$. At 95% confidence level, the values are
 compatible if $p > 0.05$.
@@ -389,9 +361,9 @@ Likelihood results:
  par         $p$-value
 ------------ ---------------
  $α_L$        0.18
- 
+
  $β_L$        0.55
- 
+
  $γ_L$        0.023
 ----------------------------
 
@@ -403,9 +375,9 @@ $\chi^2$ results:
  par         $p$-value
 ------------ ---------------
  $α_χ$        0.22
- 
+
  $β_χ$        0.89
- 
+
  $γ_χ$        0.0001
 ----------------------------
 
@@ -418,8 +390,8 @@ underestimated, which must be the case for $\gamma$.
 
 Since two different methods similarly underestimated the true value of
 $\gamma$, it was suspected the Monte Carlo simulation was faulty. This
-phenomenon was observed frequently when generating multiple samples, so it
-can't be attributed to statistical fluctuations in that particular sample.
+phenomenon was observed frequently when generating multiple samples, though, so
+it can't be attributed to statistical fluctuations in that particular sample.
 The issue remains unsolved as no explanation was found.