# Statistical analysis

## Description

This repository is structured as follows:

- `lectures`: a summary of the lectures of the course
- `notes`: an explanation of the solutions of the exercises
- `ex-n`: programs written for each exercise

## Building the documents

The two documents `excercise.pdf` and `lectures.pdf` are written in Pandoc markdown. XeTeX (with some standard LaTeX packages), the [pandoc-crossref](https://github.com/lierdakil/pandoc-crossref) filter and a Make program are required to build them. Simply typing `make` in the respective directory will build the document, provided the above dependencies are met.

## Building the programs

The programs used to solve the exercises are written in standard C99 (with the only exception of the `#pragma once` clause) and require the following libraries to build:

- [GMP](https://gmplib.org/)
- [GSL](https://www.gnu.org/software/gsl/)
- [pkg-config](https://www.freedesktop.org/wiki/Software/pkg-config/) (build-time only)

Additionally, Python (version 3) with `numpy` and `matplotlib` is required to generate plots.

For convenience a `shell.nix` file is provided to set up the build environment. See this [guide](https://nixos.org/nix/manual/#chap-quick-start) if you have never used Nix before. Running `nix-shell` in the top-level directory will drop you into the development shell.

Once ready, invoke `make` with the program you wish to build. For example

    $ make ex-1/bin/main

or, to build every program of an exercise

    $ make ex-1

To clean up the build results run

    $ make clean

## Running the programs

Notes:

- Many programs generate random numbers using a PRNG that is seeded with a fixed value, for reproducibility. It's possible to test the program on different samples by changing the seed via the environment variable `GSL_RNG_SEED`.

### Exercise 1

`ex-1/bin/main` generates random numbers following the Landau distribution and runs a series of tests to check whether they really belong to such a distribution.
The size of the sample can be controlled with the argument `-n N`. The program outputs the result of a Kolmogorov-Smirnov test and of t-tests comparing the sample mode, FWHM and median, in this order.

`ex-1/bin/pdf` prints a list of x-y points of the Landau PDF to `stdout`. The output can be redirected to `ex-1/pdf-plot.py` to generate a plot.

### Exercise 2

Every program in `ex-2` computes the best available approximation (with a given method) to the Euler-Mascheroni γ constant and prints[1]:

1. the leading decimal digits of the approximate value found
2. the exact decimal digits of γ
3. the absolute difference between 1. and 2.

[1]: Some programs may also print additional debugging information.

`ex-2/bin/fancy` and `ex-2/bin/fancier` can compute γ to a variable precision and therefore take the required number of decimal places as their only argument. The exact γ digits (used in the comparison) are limited to 50 and 500 places, respectively.

### Exercise 3

`ex-3/bin/main` generates a sample of particle decay events and attempts to recover the distribution parameters via both an MLE and a χ² method. In both cases the best fit and the parameter covariance matrix are printed.

The program then performs a t-test to assert the compatibility of the data with two hypotheses and prints the results in a table.

To plot a 2D histogram of the generated sample do

    $ ex-3/bin/main -i | ex-3/plot.py

In addition, the program accepts a few more parameters to control the histogram and the number of events; run it with `-h` to see their usage.

Note: the histogram parameters affect the computation of the χ² and the relative parameter estimation.

### Exercise 6

`ex-6/bin/main` simulates a Fraunhofer diffraction experiment. The program prints to `stdout` the bin counts of the intensity as a function of the diffraction angle. To plot a histogram simply pipe the output to the program `ex-6/plot.py`.
The program convolves the original signal with a Gaussian kernel (`-s` to change the σ), optionally adds Poisson noise (`-m` to change the mean μ) and either performs a naive deconvolution by FFT (`-m fft` mode) or applies the Richardson-Lucy deconvolution algorithm (`-m rl` mode), which is expected to perform optimally in this case. The `-c` and `-d` options control whether the convolved or deconvolved histogram counts should be printed to `stdout`. For more options, run the program with `-h` to see the usage screen.
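
To give an idea of what the Richardson-Lucy mode does, here is a minimal pure-Python sketch of the iteration on a 1D histogram. It is not the C code in `ex-6`. The function names, the zero-padded boundaries, the flat starting estimate, and the toy two-peak signal are all assumptions made for illustration, and no noise is added.

```python
import math


def gaussian_kernel(sigma, half_width):
    """Discrete Gaussian kernel with 2*half_width + 1 bins, normalised to unit sum."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(signal, kernel):
    """Zero-padded convolution returning as many bins as `signal`.

    Assumes an odd-length kernel centred on its middle bin.
    """
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            m = i - (j - half)
            if 0 <= m < n:
                acc += signal[m] * weight
        out.append(acc)
    return out


def richardson_lucy(observed, kernel, iterations=50):
    """Iterate estimate <- estimate * ((observed / (estimate * K)) * K_mirrored)."""
    mirrored = kernel[::-1]
    flat = sum(observed) / len(observed)
    estimate = [flat] * len(observed)  # flat, strictly positive starting guess
    for _ in range(iterations):
        blurred = convolve_same(estimate, kernel)
        # Bins where the blurred estimate vanishes carry no information.
        ratio = [o / b if b > 0 else 0.0 for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, mirrored)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


def l1_distance(a, b):
    """Sum of absolute bin-by-bin differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy histogram: two sharp peaks, blurred by a Gaussian kernel with sigma = 2 bins.
truth = [0.0] * 41
truth[15], truth[25] = 100.0, 60.0
kernel = gaussian_kernel(sigma=2.0, half_width=6)
observed = convolve_same(truth, kernel)
restored = richardson_lucy(observed, kernel)

print("L1 distance to the original, blurred :", l1_distance(observed, truth))
print("L1 distance to the original, restored:", l1_distance(restored, truth))
```

The multiplicative update keeps every bin non-negative and preserves the total counts, which is why this scheme suits histogram data. More iterations sharpen the peaks further; on noisy data they also amplify the noise.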