Statistical analysis
Description
This repository is structured as follows:
- lectures: a summary of the lectures of the course
- notes: an explanation of the solutions of the exercises
- ex-n: programs written for each exercise
Building the documents
The two documents excercise.pdf and lectures.pdf are written in Pandoc
markdown. XeTeX (with some standard LaTeX packages), the pandoc-crossref
filter and a Make program are required to build. Simply typing make in the
respective directory will build the document, provided the above dependencies
are met.
Building the programs
The programs used to solve the exercises are written in standard C99 (the
only exception being the #pragma once directive) and require the following
libraries to build:
- pkg-config (build-time only)
Additionally, Python (version 3) with numpy and matplotlib is required to
generate plots.
For convenience, a shell.nix file is provided to set up the build environment.
See this guide if you have never used Nix before. Running nix-shell in the
top-level directory will drop you into the development shell.
Once ready, invoke make with the program you wish to build. For example:
$ make ex-1/bin/main
or, to build every program of an exercise:
$ make ex-1
To clean up the build results, run:
$ make clean
Running the programs
Notes:
- Many programs generate random numbers using a PRNG that is seeded with a
  fixed value, for reproducibility. It's possible to test the program on
  different samples by changing the seed via the environment variable
  GSL_RNG_SEED.
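As a minimal sketch of how a run with a different seed could be scripted, the helper below launches a command with GSL_RNG_SEED overridden. The function name run_with_seed is hypothetical and not part of this repository; it only assumes that the target program reads the seed from the environment, as described above.

```python
import os
import subprocess


def run_with_seed(command, seed):
    """Run `command` with GSL_RNG_SEED set to `seed` and return its stdout."""
    # Copy the current environment so that only the seed is overridden.
    env = dict(os.environ, GSL_RNG_SEED=str(seed))
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For instance, once ex-1/bin/main has been built, run_with_seed(["ex-1/bin/main", "-n", "1000"], 42) would repeat Exercise 1 on a different sample.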
Exercise 1
ex-1/bin/main generates random numbers following the Landau distribution and
runs a series of tests to check whether they really belong to such a
distribution. The size of the sample can be controlled with the argument -n N.
The program outputs the result of a Kolmogorov-Smirnov test and t-tests
comparing the sample mode, FWHM and median, in this order.
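To illustrate what a Kolmogorov-Smirnov test measures, here is a minimal sketch that computes the KS statistic of a sample against a reference CDF. It is not the code used in ex-1: the function names are hypothetical, and an exponential distribution stands in for the Landau one, whose CDF is not available in the Python standard library.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Return D = sup |F_n(x) - F(x)| for `sample` against the reference `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(x):
    """CDF of the unit-rate exponential distribution (stand-in for Landau)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


rng = random.Random(1)  # fixed seed, for reproducibility
sample = [rng.expovariate(1.0) for _ in range(1000)]
d = ks_statistic(sample, exponential_cdf)
# For large n the 5% critical value is approximately 1.36 / sqrt(n).
critical = 1.36 / math.sqrt(len(sample))
print(f"D = {d:.4f}, 5% critical value = {critical:.4f}")
```

A value of D below the critical value means the sample is compatible with the reference distribution at that significance level.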
ex-1/bin/pdf prints a list of x-y points of the Landau PDF to stdout.
The output can be piped to ex-1/pdf-plot.py to generate a plot.
Exercise 2
Every program in ex-2 computes the best available approximation (with a given
method) to the Euler-Mascheroni constant γ and prints[1]:
1. the leading decimal digits of the approximate value found
2. the exact decimal digits of γ
3. the absolute difference between 1. and 2.
[1]: Some programs may also print additional debugging information.
ex-2/bin/fancy and ex-2/bin/fancier can compute γ to a variable precision and
therefore take the required number of decimal places as their only argument.
The exact γ digits (used in the comparison) are limited to 50 and 500 places,
respectively.
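As a rough illustration of the kind of output described above, the sketch below approximates γ in double precision from its definition, H_n - ln(n), with and without the first Euler-Maclaurin correction terms. These are not necessarily the methods implemented in ex-2, and the function names are hypothetical.

```python
import math

# Reference value of γ, correct to double precision.
GAMMA_REF = 0.5772156649015329


def gamma_naive(n):
    """Approximate γ as H_n - ln(n); the error decays like 1/(2n)."""
    harmonic = math.fsum(1.0 / k for k in range(1, n + 1))
    return harmonic - math.log(n)


def gamma_corrected(n):
    """Same, minus the leading error terms: -1/(2n) + 1/(12 n^2)."""
    return gamma_naive(n) - 1.0 / (2 * n) + 1.0 / (12 * n * n)


for method in (gamma_naive, gamma_corrected):
    approx = method(1000)
    # Approximate value, reference value, absolute difference.
    print(f"{method.__name__}: {approx:.16f} {GAMMA_REF:.16f} "
          f"{abs(approx - GAMMA_REF):.1e}")
```

With n = 1000 the naive estimate is only good to about three decimal places, while the corrected one is limited mainly by floating-point precision, which is why arbitrary-precision arithmetic is needed to go further.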
Exercise 3
ex-3/bin/main generates a sample of particle decay events and attempts to
recover the distribution parameters via both an MLE and a χ² method. In both
cases the best fit and the parameter covariance matrix are printed.
The program then performs a t-test to assess the compatibility of the data with
two hypotheses and prints the results in a table.
To plot a 2D histogram of the generated sample, run:
$ ex-3/bin/main -i | ex-3/plot.py
In addition, the program accepts a few more parameters to control the histogram
and the number of events; run it with -h to see their usage.
Note: the histogram parameters affect the computation of the χ² and the resulting parameter estimates.
Exercise 6
ex-6/bin/main simulates a Fraunhofer diffraction experiment. The program
prints to stdout the bin counts of the intensity as a function of the
diffraction angle. To plot a histogram, simply pipe the output to the
program ex-6/plot.py.
The program convolves the original signal with a Gaussian kernel (-s to
change the σ), optionally adds Poisson noise (-m to change the mean μ) and
performs the deconvolution either naively via an FFT (-m fft mode) or with the
Richardson-Lucy algorithm (-m rl mode), which is expected to
perform optimally in this case.
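The Richardson-Lucy iteration can be sketched in a few lines. The following is a minimal, noise-free 1D illustration in pure Python, not the implementation found in ex-6: all function names are hypothetical, and the two-spike test signal is an arbitrary stand-in for the diffraction pattern.

```python
import math


def convolve_same(signal, kernel):
    """Zero-padded convolution aligned with `signal` (odd-length kernel)."""
    half = len(kernel) // 2
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for j, k in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < len(signal):
                out[i] += signal[idx] * k
    return out


def gaussian_kernel(sigma, half_width):
    """Normalised Gaussian kernel sampled on 2 * half_width + 1 points."""
    weights = [math.exp(-0.5 * (x / sigma) ** 2)
               for x in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def richardson_lucy(data, kernel, iterations):
    """Iterate estimate <- estimate * ((data / (estimate * K)) * K_flipped)."""
    flipped = kernel[::-1]
    estimate = [sum(data) / len(data)] * len(data)  # flat starting guess
    for _ in range(iterations):
        blurred = convolve_same(estimate, kernel)
        ratio = [d / b if b > 0 else 0.0 for d, b in zip(data, blurred)]
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Arbitrary test signal: two spikes, blurred with σ = 2 bins.
original = [0.0] * 40
original[12], original[25] = 100.0, 60.0
kernel = gaussian_kernel(sigma=2.0, half_width=6)
blurred = convolve_same(original, kernel)
restored = richardson_lucy(blurred, kernel, iterations=200)
print("error before:", squared_error(blurred, original))
print("error after: ", squared_error(restored, original))
```

Each iteration keeps the estimate non-negative and preserves the total counts, which is one reason the method suits histogram data.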
The -c and -d options control whether the convolved or deconvolved
histogram counts should be printed to stdout. For more options,
run the program with -h to see the usage screen.
Exercise 7
ex-7/bin/main generates a sample with two classes of 2D points (signal,
noise) and trains either a Fisher linear discriminant or a single perceptron to
classify them (-m argument to change mode). Alternatively, the weights can be
set manually via the -w argument. In either case the program then prints the
classified data in this order: signal then noise.
To plot the result of the linear classification, pipe the output to
ex-7/plot.py. The program generates two figures:
- a scatter plot showing the Fisher projection line and the cut line
- two histograms of the projected data and the cut line
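The Fisher projection line mentioned above is the direction w ∝ S_W⁻¹ (μ_signal − μ_noise), where S_W is the within-class scatter matrix. The sketch below computes it for 2D points in pure Python. It is only an illustration: the function names, the sample parameters and the choice of cut (the midpoint between the projected class means) are assumptions, not necessarily what ex-7 does.

```python
import random


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, mu):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix about `mu`."""
    sxx = sum((x - mu[0]) ** 2 for x, _ in points)
    sxy = sum((x - mu[0]) * (y - mu[1]) for x, y in points)
    syy = sum((y - mu[1]) ** 2 for _, y in points)
    return sxx, sxy, syy


def fisher_direction(signal, noise):
    """Return the unit vector along S_W^-1 (mu_signal - mu_noise)."""
    mu_s, mu_n = mean(signal), mean(noise)
    a1, b1, d1 = scatter(signal, mu_s)
    a2, b2, d2 = scatter(noise, mu_n)
    a, b, d = a1 + a2, b1 + b2, d1 + d2  # within-class scatter S_W
    det = a * d - b * b
    dx, dy = mu_s[0] - mu_n[0], mu_s[1] - mu_n[1]
    # Explicit inverse of the symmetric matrix [[a, b], [b, d]].
    wx = (d * dx - b * dy) / det
    wy = (a * dy - b * dx) / det
    norm = (wx * wx + wy * wy) ** 0.5
    return wx / norm, wy / norm


def project(w, p):
    return w[0] * p[0] + w[1] * p[1]


rng = random.Random(0)  # fixed seed, for reproducibility
signal = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(500)]
noise = [(rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)) for _ in range(500)]
w = fisher_direction(signal, noise)
# Assumed cut: midpoint between the projected class means.
cut = 0.5 * (project(w, mean(signal)) + project(w, mean(noise)))
accepted = sum(1 for p in signal if project(w, p) > cut)
print(f"w = ({w[0]:.3f}, {w[1]:.3f}), cut = {cut:.3f}, "
      f"signal accepted: {accepted}/{len(signal)}")
```

Because w points from the noise mean towards the signal mean, points whose projection exceeds the cut are classified as signal.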
ex-7/bin/test takes a model trained in ex-7/bin/main and tests it against
newly generated datasets (-i to set the number of test iterations). The
program prints the statistics of the number of false positives and false
negatives, and finally the purity and efficiency of the classification.
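For reference, purity and efficiency are commonly defined from the counts of true positives (TP), false positives (FP) and false negatives (FN) as purity = TP / (TP + FP) and efficiency = TP / (TP + FN). The sketch below assumes these usual definitions, which may differ in detail from those used by ex-7/bin/test; the function names and the toy data are hypothetical.

```python
def confusion_counts(labels, predictions):
    """Return (TP, FP, FN), where True marks a signal event."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)
    return tp, fp, fn


def purity_efficiency(tp, fp, fn):
    """Purity: fraction of selected events that are signal.
    Efficiency: fraction of signal events that are selected."""
    return tp / (tp + fp), tp / (tp + fn)


# Toy data: four signal events followed by four noise events.
labels = [True, True, True, True, False, False, False, False]
predictions = [True, True, True, False, True, False, False, False]
tp, fp, fn = confusion_counts(labels, predictions)
purity, efficiency = purity_efficiency(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} "
      f"purity={purity:.2f} efficiency={efficiency:.2f}")
```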