ex-2: revised and typo-fixed

2020-05-08 23:07:37 +02:00 · 2020-05-08 23:07:37 +02:00 · c6939be6d0
commit c6939be6d0
parent 1bd691a24b
4 changed files with 135 additions and 57 deletions
--- a/ex-2/fancier.c
+++ b/ex-2/fancier.c
@ -173,7 +173,7 @@ void mpq_log2(mpq_t rop, size_t digits) {
 * where 1<a<2 and n is the number of binary
 * digits of x.
 *
- * log(a) is the computed from
+ * log(a) is computed from
 *
 *   log((1+y)/(1-y)) = 2Σ_{k=0} y^(2k+1)/(2k+1)
 *
--- a/notes/docs/bibliography.bib
+++ b/notes/docs/bibliography.bib
@ -20,3 +20,72 @@
  year={2017},
  publisher={Multidisciplinary Digital Publishing Institute}
 }
+
+@article{marsaglia03,
+  title={Evaluating Kolmogorov’s distribution},
+  author={Marsaglia, George and Tsang, Wai Wan and Wang, Jingbo and others},
+  journal={Journal of Statistical Software},
+  volume={8},
+  number={18},
+  pages={1--4},
+  year={2003}
+}
+
+@article{robertson74,
+  title={An iterative procedure for estimating the mode},
+  author={Robertson, Tim and Cryer, Jonathan D},
+  journal={Journal of the American Statistical Association},
+  volume={69},
+  number={348},
+  pages={1012--1016},
+  year={1974},
+  publisher={Taylor \& Francis Group}
+}
+
+@misc{GSL,
+  title={Gnu Scientific Library Reference Manual (3rd Edition)},
+  author={M. Galassi et al},
+  ISBN={0954612078},
+  url={http://www.gnu.org/software/gsl/}
+}
+
+@book{silverman86,
+  title={Density estimation for statistics and data analysis},
+  author={Silverman, Bernard W},
+  volume={26},
+  year={1986},
+  publisher={CRC press}
+}
+
+@book{davis59,
+  title={Leonhard Euler's Integral: A Historical Profile of the Gamma Function},
+  author={Davis, P. J.},
+  year={1959},
+  journal={American Mathematical Monthly},
+  pages={849 -– 869},
+  doi={10.2307/2309786}
+}
+
+@book{bak91,
+  title={Complex analysis},
+  author={Bak, Joseph and Newman, Donald J and Newman, Donald J},
+  year={1991},
+  publisher={Springer},
+  pages={265 -- 268}
+}
+
+@misc{yee19,
+  title={Formulas and Algorithms},
+  author={Alexander Yee},
+  year={2019},
+  journal={Alex Yee Website},
+  url={http://www.numberworld.org/y-cruncher/internals/formulas.html}
+}
+
+@article{riddle08,
+  title={Approximating the Sum of a Convergent Series},
+  author={Riddle, Larry},
+  journal={AP{\textregistered} Calculus},
+  year={2008},
+  publisher={Citeseer}
+}
--- a/notes/sections/1.md
+++ b/notes/sections/1.md
@ -1,5 +1,6 @@
 # Exercise 1 {#sec:Landau}

+
 ## Random numbers following the Landau distribution

 The Landau distribution is a probability density function which can be defined
@ -24,6 +25,7 @@ function and plotted in a 100-bins histogram ranging from -20 to

 ## Randomness testing of the generated sample

+
 ### Kolmogorov-Smirnov test

 In order to compare the sample with the Landau distribution, the
--- a/notes/sections/2.md
+++ b/notes/sections/2.md
@ -1,10 +1,10 @@
 # Exercise 2

+
 ## Euler-Mascheroni constant

 The Euler-Mascheroni constant is defined as the limiting difference between the
 partial sums of the harmonic series and the natural logarithm:
-
 $$
  \gamma = \lim_{n \rightarrow +\infty} \left( \sum_{k=1}^{n} \frac{1}{k}
  - \ln(n) \right)
@ -12,7 +12,6 @@ $$ {#eq:gamma}

 and represents the limiting blue area in @fig:gamma. The first 30 digits of
 $\gamma$ are:
-
 $$
  \gamma = 0.57721\ 56649\ 01532\ 86060\ 65120\ 90082 \dots
 $$ {#eq:exact}
@ -29,16 +28,17 @@ efficiency of the methods lies on how quickly they converge to their limit.
 ![The area of the blue region converges to the Euler–Mascheroni
 constant.](images/gamma-area.png){#fig:gamma width=7cm}

+
 ## Computing the constant

+
 ### Definition

-First, in order to have a quantitative idea of how hard it is to reach a good
+First, in order to give a quantitative idea of how hard it is to reach a good
 estimation of $\gamma$, it was naively computed using the definition given in
@eq:gamma. The difference was computed for increasing value of $n$, with
 $n_{i+1} = 10 \cdot n_i$ and $n_1 = 20$, till the approximation starts getting
 worse, namely:
-
 $$
  | \gamma(n_{i+1}) - \gamma | > | \gamma(n_i) - \gamma|
 $$
@ -74,14 +74,14 @@ n           sum           $|\gamma(n)-\gamma|$
 Table: Partial results using the definition of $\gamma$ with double
 precision. {#tbl:1_results}

-The convergence is logarithmic: to fix the first $d$ decimal places about
-$10^d$  terms are needed. The double precision runs out at the
+The convergence is logarithmic: to fix the first $d$ decimal places, about
+$10^d$ terms are needed. The double precision runs out at the
 10\textsuperscript{th} place, $n=\SI{2e10}{}$.  
 Since all the number are given with double precision, there can be at best 15
 correct digits but only 10 were correctly computed: this means that when the
 terms of the series start being smaller than the smallest representable double,
-the sum of all the remaining terms give a number $\propto 10^{-11}$.
-
+the sum of all the remaining terms give a number $\propto 10^{-11}$.  
+Best result in @tbl:first.

 --------- -----------------------
 true:	     0.57721\ 56649\ 01533
@ -91,13 +91,14 @@ approx:    0.57721\ 56648\ 77325
 diff:	     0.00000\ 00000\ 24207
 --------- -----------------------

-Table: First method results. {#tbl:first}
+Table: First method best result. From the top down: true value, best estimation
+and difference between them. {#tbl:first}
+

 ### Alternative formula

 As a first alternative, the constant was computed through the identity which
-relates $\gamma$ to the $\Gamma$ function as follow:
-
+relates $\gamma$ to the $\Gamma$ function as follow [@davis59]:
 $$
  \gamma = \lim_{M \rightarrow + \infty} \sum_{k  = 1}^{M}
  \binom{M}{k} \frac{(-1)^k}{k} \ln(\Gamma(k + 1))
@ -105,7 +106,7 @@ $$

 Varying $M$ from 1 to 100, the best result was obtained for $M = 41$ (see
@tbl:second). It went sour: the convergence is worse than using the definition
-itself. Only two places were correctly computed.
+itself. Only two places were correctly computed (#@tbl:second).

 --------- -----------------------
 true:	     0.57721\ 56649\ 01533
@ -115,38 +116,35 @@ approx:    0.57225\ 72410\ 34058
 diff:	     0.00495\ 84238\ 67473
 --------- -----------------------

-Table: Second method results. {#tbl:second}
+Table: Best esitimation of $\gamma$ using
+the alternative formula. {#tbl:second}

 Here, the problem lies in the binomial term: computing the factorial of a
 number greater than 18 goes over 15 places and so cannot be correctly
-represented. Furthermore, the convergence (even if this is not a series
-and, consequently, it is not properly a "convergence") is slowed down by
-the logarithmic factor.
+represented. Furthermore, the convergence is slowed down by the logarithmic
+factor.
+

 ### Reciprocal $\Gamma$ function

 A better result was found using the well known reciprocal $\Gamma$ function
-formula:
-
+formula [@bak91]:
 $$
  \frac{1}{\Gamma(z)} = z e^{yz} \prod_{k = 1}^{+ \infty}
  \left( 1 + \frac{z}{k} \right) e^{-z/k}
 $$

 which gives:
-
 $$
  \gamma = - \frac{1}{z} \ln \left( z \Gamma(z)
  \prod_{k = 1}^{+ \infty} \left( 1 + \frac{z}{k} \right) e^{-z/k} \right)
 $$

 The execution stops when there is no difference between two consecutive therms
-of the infinite product (it happens for $k = 456565794$, meaning that for this
-value of $k$, the term of the product is equal to 1). Different values of $z$
-were checked, with $z_{i+1} = z_i + 0.01$, ranging from 0 to 20 and the best
-result was found for $z = 9$. As can be seen in @tbl:3_results, it's only by
-chance, since all $|\gamma(z) - \gamma |$ are of the same order of magnitude.
-The best one is compared with the exact value of $\gamma$ in @tbl:third.
+of the infinite product (it happens for $k = 456565794 \sim \SI{4.6e8}{}$,
+meaning that for this value of $k$ the term of the product is equal to 1 in
+terms of floating points). Different values of $z$ were checked, with $z_{i+1}
+= z_i + 0.01$ ranging from 0 to 20, and the best result was found for $z = 9$.

 ---------------------------------------------------------------
    z  $|\gamma(z) - \gamma |$      z   $|\gamma(z) - \gamma |$
@ -172,8 +170,15 @@ The best one is compared with the exact value of $\gamma$ in @tbl:third.
 19    \SI{10.084e-9}{}         9.04    \SI{9.419e-9}{}
 ---------------------------------------------------------------

-Table: Differences between the obtained values of $\gamma$ and the exact
-one. {#tbl:3_results}
+Table: Differences between some obtained values of $\gamma$ and
+the exact one found with the reciprocal $\Gamma$ function formula.
+The values on the left are shown to give an idea of the $z$
+large-scale behaviour; on the right, the values around the best
+one ($z = 9.00$) are listed. {#tbl:3_results}
+
+As can be seen in @tbl:3_results, the best value for $z$ is only by chance,
+since all $|\gamma(z) - \gamma |$ are of the same order of magnitude. The best
+one is compared with the exact value of $\gamma$ in @tbl:third.

 --------- -----------------------
 true:	     0.57721\ 56649\ 01533
@ -188,17 +193,15 @@ Table: Third method results for z = 9.00. {#tbl:third}
 This time, the convergence of the infinite product is fast enough to ensure the
 $8^{th}$ place.

+
 ### Fastest convergence formula

-The fastest known convergence belongs to the following formula:  
-(source: http://www.numberworld.org/y-cruncher/internals/formulas.html):
-
+The fastest known convergence belongs to the following formula [@yee19]:  
 $$
  \gamma = \frac{A(N)}{B(N)} -\frac{C(N)}{B^2(N)} - \ln(N)
 $$ {#eq:faster}

 with:
-
 \begin{align*}
  &A(N) = \sum_{k=1}^{+ \infty} \frac{N^k}{k!} \cdot H(k)
          \with H(k) = \sum_{j=1}^{k} \frac{1}{j}                 \\
@ -207,11 +210,15 @@ with:
          \frac{((2k)!)^3}{(k!)^4 \cdot (16k)^2k}         \\
 \end{align*}

-The series $A$ and $B$ are computed till there is no difference between two
-consecutive terms.  
-The number of desired correct decimals is given in input and $N$ is
-consequently computed through a formula given in the same article
-above-mentioned. Results are shown in @tbl:fourth.
+The series $A$ and $B$ were computed till there is no difference between two
+consecutive terms. The number of desired correct decimals $D$ was given in
+input and $N$ was consequently computed through the formula: 
+$$
+  N = \text{floor} \left( 2 + \frac{1}{4} \cdot \ln(10) \cdot D \right)
+$$
+
+given in [@yee19], where floor returns the highest integer smaller than its
+argument. Results are shown in @tbl:fourth.

 --------- ------------------------------
 true:	     0.57721\ 56649\ 01532\ 75452
@ -221,11 +228,12 @@ approx:    0.57721\ 56649\ 01532\ 86554
 diff:	     0.00000\ 00000\ 00000\ 11102
 --------- ------------------------------

-Table: Fourth method results. {#tbl:fourth}
+Table: $\gamma$ estimation with the fastest
+known convergence formula (@eq:faster). {#tbl:fourth}
+
+Due to roundoff errors, the best results was obtained for $N = 10$. Up to 15
+places were correctly computed.

-Due to roundoff errors, the best results is obtained for $N = 10$. Since up to
-15 places were correctly computed, an approximation of $\gamma$ better than the
-one reached with the definition in @eq:gamma was obtained.

 ### Arbitrary precision

@ -240,18 +248,18 @@ to compute an operation based on the size of the operands.

 The terms in @eq:faster can therefore be computed with arbitrarily large
 precision.  Thus, a program that computes the Euler-Mascheroni constant within
-a user controllable precision has been implemented. Unlike the previously
-mentioned programs, this one was more carefully optimized.
+a user controllable precision was implemented. Unlike the previously mentioned
+programs, this one was more carefully optimized.

 The $A$ and $B$ series are computed up to an arbitrary limit $k_{\text{max}}$.
 Different values of $k_{\text{max}}$ were tested but, obviously, they all
 eventually reach a point where the approximation cannot guarantee the
-requested precision; the solution turned out to be to let $k_{\text{max}}$ depends on
-$N$.  Consequently, $k_{\text{max}}$ was chosen to be $5N$, after it has been
-verified to produce the correct digits up to 500 decimal places.
+requested precision; the solution turned out to be to let $k_{\text{max}}$
+depends on $N$. Consequently, $k_{\text{max}}$ was chosen to be $5N$, after it
+was verified to produce the correct digits up to 500 decimal places.

 The GMP library offers functions to perform some operations such as addition,
-multiplication, division, etc. however, the logarithm function is not
+multiplication, division, etc. However, the logarithm function is not
 implemented. Thus, most of the code carries out the $\ln(N)$ computation.
 First, it should be noted that the logarithm of only some special numbers can
 be computed with arbitrary precision, namely the ones of which a converging
@ -261,13 +269,14 @@ $$
  N = N_0 \cdot b^e \thus \ln(N) = \ln(N_0) + e \cdot \ln(b)
 $$

-Since a fast converging series for $\ln(2)$ is known $b = 2$ was chosen.  As
+Since a fast converging series for $\ln(2)$ is known, $b = 2$ was chosen. As
 well as for the scientific notation, in order to get the mantissa $1 \leqslant
 N_0 < 2$, the number of binary digits of $N$ must be computed (conveniently, a
-dedicated function `mpz_sizeinbase()` can be found in GMP). If the digits are $n$:
+dedicated function `mpz_sizeinbase()` can be found in GMP). If the digits are
+$n$:

 $$
-  e = b - 1 \thus N_0 = \frac{N}{2^{n - 1}}
+  e = n - 1 \thus N_0 = \frac{N}{2^{n - 1}}
 $$

 Then, by defining:
@ -276,8 +285,7 @@ $$
  N_0 = \frac{1 + y}{1 - y} \thus y = \frac{N_0 - 1}{N_0 + 1} < 1
 $$

-and the following series (which is convergent for $y < 1$) can therefore be
-used:
+the following series, which is convergent for $y < 1$, can therefore be used:

 $$
  \ln \left( \frac{1 + y}{1 - y} \right) =
@ -286,12 +294,11 @@ $$

 But when to stop computing the series?  
 Given a partial sum $S_k$ of the series, it is possible to know when a digit is
-definitely correct, meaning that no matter how large $k$ can be,
-it will not affect that decimal place. The key lies in the following concept.
-Letting $S$ the value of the series:
+definitely correct. The key lies in the following concept. Letting $S$ the value
+of the series [@riddle08]:

 $$
-  S_k + \frac{a_{k+1}}{1 - \frac{a_{k+1}}{a_k}} < S < S_k + a_k \frac{L}{1 -L}
+  S_k + a_k \frac{L}{1 -L} < S < S_k + \frac{a_{k+1}}{1 - \frac{a_{k+1}}{a_k}}
 $$

 where $L$ is the limiting ratio of the series terms, which must be $< 1$ in
@ -308,4 +315,4 @@ $$
  \log(2) = \sum_{k=1}^{+ \infty} \frac{1}{k \cdot 2^k}
 $$

-In this case the ratio is $L = 1/2$.  
+In this case the ratio is $L = 1/2$.