ex-7: went on writing the FLD

Giù Marcer 2020-04-01 23:39:19 +02:00 committed by rnhmjoj
parent 3ee0aec13e
commit 37e5bf0cbb


@ -21,22 +21,31 @@ $$
\end{cases}
$$
where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ are the standard
deviations in the $x$ and $y$ directions respectively and $\rho$ is the bivariate
correlation, hence:
$$
\sigma_{xy} = \rho \sigma_x \sigma_y
$$
where $\sigma_{xy}$ is the covariance of $x$ and $y$.
In the code, the default settings are $N_s = 800$ points for the signal and
$N_n = 1000$ points for the noise, but both can be changed from the
command line. Both samples were handled as matrices of dimension $n \times 2$,
where $n$ is the number of points in the sample. The `gsl_matrix` type provided
by GSL was employed for this purpose and the function
`gsl_ran_bivariate_gaussian()` was used for generating the points.
Assuming no knowledge of how the points were generated, a model of
classification must then be implemented in order to assign each point to the
class (signal or noise) to which it 'most probably' belongs. The point is how
'most probably' can be interpreted and implemented.
## Fisher linear discriminant
### The theory
The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction. It allows this 2D classification problem to be
reduced to a one-dimensional decision surface.
@ -46,12 +55,12 @@ simplest representation of a linear discriminant is obtained by taking a linear
function of a sampled 2D point $x$ so that:
$$
  \hat{x} = w^T x
$$
where $w$ is the so-called 'weight vector'. An input point $x$ is commonly
assigned to the first class if $\hat{x} \geqslant w_{th}$ and to the second one
otherwise, where $w_{th}$ is a threshold yet to be defined.
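The decision rule above can be sketched in a few lines of plain C. This is only an illustration: the function names `project()`, `classify()` and `count_first_class()` are hypothetical and the sample is assumed to be stored as an $n \times 2$ row-major array rather than a `gsl_matrix`.

```c
#include <stddef.h>

/* Projection of the 2D point (x, y) onto the weight vector w: w^T x. */
double project(const double w[2], double x, double y)
{
    return w[0] * x + w[1] * y;
}

/* Return 1 (first class) if the projection reaches the threshold w_th,
 * 2 (second class) otherwise. */
int classify(const double w[2], double w_th, double x, double y)
{
    return project(w, x, y) >= w_th ? 1 : 2;
}

/* Count how many of the n points stored row-major in data (n x 2)
 * are assigned to the first class. */
size_t count_first_class(const double *data, size_t n,
                         const double w[2], double w_th)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (classify(w, w_th, data[2 * i], data[2 * i + 1]) == 1)
            count++;
    return count;
}
```

Both $w$ and $w_{th}$ are taken as inputs here, since how they are chosen is a separate question.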
In general, the projection onto one dimension leads to a considerable loss of
information and classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
@ -71,7 +80,7 @@ The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:
$$
  \hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$
![The plot on the left shows samples from two classes along with the histograms
@ -105,3 +114,63 @@ by:
$$
  J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
Differentiating $J(w)$ with respect to $w$, it can be found that it is
maximized when:
$$
w = S_b^{-1} (m_2 - m_1)
$$
where $S_b$ is the within-classes covariance matrix, given by:
$$
S_b = S_1 + S_2
$$
where $S_1$ and $S_2$ are the covariance matrices of the two classes, each of
the form:
$$
\begin{pmatrix}
\sigma_x^2 & \sigma_{xy} \\
\sigma_{xy} & \sigma_y^2
\end{pmatrix}
$$
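As a sketch of the differentiation step, assume that the total within-class variance of the projected data can be written as $s^2 = w^T S_b w$ (an assumption made here, since the definition of $s^2$ is not part of this excerpt). Then:
$$
  J(w) = \frac{\left[ w^T (m_2 - m_1) \right]^2}{w^T S_b w}
$$
Setting the gradient with respect to $w$ to zero gives:
$$
  \left[ w^T (m_2 - m_1) \right] (w^T S_b w) \, (m_2 - m_1)
  = \left[ w^T (m_2 - m_1) \right]^2 S_b w
$$
so that $S_b w$ is proportional to $m_2 - m_1$. Since $J(w)$ does not change when $w$ is rescaled, the proportionality constant can be dropped, which leads to $w = S_b^{-1} (m_2 - m_1)$.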
This is not truly a discriminant but rather a specific choice of direction for
projection of the data down to one dimension: the projected data can then be
used to construct a discriminant by choosing a threshold for the
classification.
### The code
As stated above, the projection vector is given, up to an overall sign that
only sets the orientation of the projection axis, by
$$
  w = S_b^{-1} (\mu_1 - \mu_2)
$$
where $\mu_1$ and $\mu_2$ are the means of the two classes.
$$
r = \frac{N_s}{N_n}
$$
Compute $S_b$:
$S_b = S_1 + S_2$
$$
\mu_1 = (\mu_{1x}, \mu_{1y})
$$
The matrix $S_b$ is inverted with the Cholesky method, since it is symmetric
and positive-definite.
$$
diff = \mu_1 - \mu_2
$$
The matrix-vector product is then computed with the `gsl_blas_dgemv()` function
provided by GSL and the result is normalised with GSL functions.