# Exercise 7
## Generating points according to Gaussian distributions {#sec:sampling}
The first task of exercise 7 is to generate two sets of 2D points $(x, y)$
according to two bivariate Gaussian distributions with parameters:
$$
\text{signal} \quad
\begin{cases}
\mu = (0, 0) \\
\sigma_x = \sigma_y = 0.3 \\
\rho = 0.5
\end{cases}
\et
\text{noise} \quad
\begin{cases}
\mu = (4, 4) \\
\sigma_x = \sigma_y = 1 \\
\rho = 0.4
\end{cases}
$$
where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ are the standard
deviations in the $x$ and $y$ directions respectively and $\rho$ is the
bivariate correlation, hence:
$$
\sigma_{xy} = \rho \sigma_x \sigma_y
$$
where $\sigma_{xy}$ is the covariance of $x$ and $y$.
In the code, default settings are $N_s = 800$ points for the signal and $N_n =
1000$ points for the noise but can be changed from the command line. Both
samples were handled as matrices of dimension $n \times 2$, where $n$ is the number
of points in the sample. The library `gsl_matrix` provided by GSL was employed
for this purpose and the function `gsl_ran_bivariate_gaussian()` was used for
generating the points.
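
As a rough sketch of this step (function and variable names are illustrative,
not necessarily those of the actual code), the sampling could look like the
following; note that `gsl_ran_bivariate_gaussian()` generates pairs centred on
zero, so the mean has to be added by hand:

```c
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_matrix.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian with
 * mean (mu_x, mu_y). */
gsl_matrix *sample_class(gsl_rng *r, size_t n,
                         double mu_x, double mu_y,
                         double sigma_x, double sigma_y, double rho)
{
  gsl_matrix *points = gsl_matrix_alloc(n, 2);
  for (size_t i = 0; i < n; i++) {
    double dx, dy;
    gsl_ran_bivariate_gaussian(r, sigma_x, sigma_y, rho, &dx, &dy);
    gsl_matrix_set(points, i, 0, mu_x + dx);
    gsl_matrix_set(points, i, 1, mu_y + dy);
  }
  return points;
}

/* Usage with the default parameters:
 *   gsl_rng_env_setup();
 *   gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
 *   gsl_matrix *signal = sample_class(r, 800,  0, 0, 0.3, 0.3, 0.5);
 *   gsl_matrix *noise  = sample_class(r, 1000, 4, 4, 1.0, 1.0, 0.4);
 */
```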
Assuming no knowledge of how the points were generated, a classification model
must then be implemented in order to assign each point to the class (signal or
noise) to which it 'most probably' belongs. The question is how 'most probably'
is to be interpreted and implemented.
## Fisher linear discriminant
### The projection direction
The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction: it allows this 2D classification problem to be
reduced to a one-dimensional one, in which the decision surface is a single
threshold value.
Consider the case of two classes (in this case the signal and the noise): the
simplest representation of a linear discriminant is obtained by taking a linear
function of a sampled 2D point $x$ so that:
$$
\hat{x} = w^T x
$$
where $w$ is the so-called 'weight vector'. An input point $x$ is commonly
assigned to the first class if $\hat{x} \geqslant w_{th}$ and to the second one
otherwise, where $w_{th}$ is a suitably chosen threshold value.
In general, the projection onto one dimension leads to a considerable loss of
information and classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.
To begin with, consider $N_1$ points of class $C_1$ and $N_2$ points of class
$C_2$, so that the means $m_1$ and $m_2$ of the two classes are given by:
$$
m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n
\et
m_2 = \frac{1}{N_2} \sum_{n \in C_2} x_n
$$
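
In the code, each mean is simply the vector of column averages of the
corresponding $n \times 2$ matrix of points; a minimal sketch (the function
name is illustrative):

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>

/* Mean of a sample stored as an n x 2 gsl_matrix: the result is the 2D
 * vector of column averages. */
void sample_mean(const gsl_matrix *points, gsl_vector *mean)
{
  gsl_vector_set_zero(mean);
  for (size_t i = 0; i < points->size1; i++)
    for (size_t j = 0; j < 2; j++)
      gsl_vector_set(mean, j,
                     gsl_vector_get(mean, j) + gsl_matrix_get(points, i, j));
  gsl_vector_scale(mean, 1.0 / points->size1);
}
```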
The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:
$$
\hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$
This expression can be made arbitrarily large simply by increasing the magnitude
of $w$. To solve this problem, $w$ can be constrained to have unit length, so
that $\| w \|^2 = 1$. Using a Lagrange multiplier to perform the constrained
maximization, it can be found that $w \propto (m_2 - m_1)$.
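
For clarity, the constrained maximization can be carried out explicitly: with a
Lagrange multiplier $\lambda$, the function to maximize is

$$
L(w, \lambda) = w^T (m_2 - m_1) + \lambda \, (1 - w^T w)
$$

and requiring $\partial L / \partial w = (m_2 - m_1) - 2 \lambda w = 0$ gives
$w = (m_2 - m_1) / 2 \lambda$, hence $w \propto (m_2 - m_1)$.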
![The plot on the left shows samples from two classes along with the histograms
resulting from projection onto the line joining the class means: note that
there is considerable overlap in the projected space. The right plot shows the
corresponding projection based on the Fisher linear discriminant, showing the
greatly improved class separation.](images/fisher.png){#fig:overlap}
There is still a problem with this approach, however, as illustrated in
@fig:overlap: the two classes are well separated in the original 2D space but
have considerable overlap when projected onto the line joining their means.
The solution is to maximize a function that gives a large separation between
the projected class means while also giving a small variance within each
class, thereby minimizing the class overlap.
The within-classes variance of the transformed data of each class $k$ is given
by:
$$
s_k^2 = \sum_{n \in C_k} (\hat{x}_n - \hat{m}_k)^2
$$
The total within-classes variance for the whole data set can be simply defined
as $s^2 = s_1^2 + s_2^2$. The Fisher criterion is therefore defined to be the
ratio of the between-classes distance to the within-classes variance and is
given by:
$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
Differentiating $J(w)$ with respect to $w$, it can be found that it is
maximized when:
$$
w = S_b^{-1} (m_2 - m_1)
$$
where $S_b$ is the total within-classes covariance matrix, given by:
$$
S_b = S_1 + S_2
$$
where $S_1$ and $S_2$ are the covariance matrices of the two classes, each of
the form:
$$
\begin{pmatrix}
\sigma_x^2 & \sigma_{xy} \\
\sigma_{xy} & \sigma_y^2
\end{pmatrix}
$$
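
As a sketch of the differentiation step (up to the normalization of $S_1$ and
$S_2$, which does not affect the maximizing direction), the Fisher criterion
can be rewritten as:

$$
J(w) = \frac{\left[ w^T (m_2 - m_1) \right]^2}{w^T S_b \, w}
$$

Setting its derivative with respect to $w$ to zero gives $(w^T S_b \, w) \,
(m_2 - m_1) = \left[ w^T (m_2 - m_1) \right] S_b \, w$, namely $S_b \, w
\propto (m_2 - m_1)$, from which the result above follows.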
This is not truly a discriminant but rather a specific choice of direction for
projection of the data down to one dimension: the projected data can then be
used to construct a discriminant by choosing a threshold for the
classification.
When implemented, the parameters given in @sec:sampling were used to compute
the covariance matrices $S_1$ and $S_2$ of the two classes and their sum $S$.
Then $S$, being a symmetric and positive-definite matrix, was inverted with
the Cholesky method, already discussed in @sec:MLM.
Lastly, the matrix-vector product was computed with the `gsl_blas_dgemv()`
function provided by GSL.
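
A sketch of this computation, with illustrative names (the $2 \times 2$
covariance matrices are filled directly from the parameters of @sec:sampling):

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_blas.h>

/* Fill a 2x2 covariance matrix from sigma_x, sigma_y and rho. */
void set_cov(gsl_matrix *S, double sx, double sy, double rho)
{
  gsl_matrix_set(S, 0, 0, sx * sx);
  gsl_matrix_set(S, 1, 1, sy * sy);
  gsl_matrix_set(S, 0, 1, rho * sx * sy);
  gsl_matrix_set(S, 1, 0, rho * sx * sy);
}

/* w = (S1 + S2)^(-1) (m2 - m1), inverting the sum by Cholesky. */
void fisher_direction(const gsl_matrix *S1, const gsl_matrix *S2,
                      const gsl_vector *m1, const gsl_vector *m2,
                      gsl_vector *w)
{
  gsl_matrix *S = gsl_matrix_alloc(2, 2);
  gsl_vector *diff = gsl_vector_alloc(2);

  gsl_matrix_memcpy(S, S1);
  gsl_matrix_add(S, S2);            /* S = S1 + S2                      */
  gsl_linalg_cholesky_decomp(S);    /* S is symmetric positive-definite */
  gsl_linalg_cholesky_invert(S);    /* S now holds (S1 + S2)^(-1)       */

  gsl_vector_memcpy(diff, m2);
  gsl_vector_sub(diff, m1);         /* diff = m2 - m1                   */
  gsl_blas_dgemv(CblasNoTrans, 1., S, diff, 0., w);

  gsl_matrix_free(S);
  gsl_vector_free(diff);
}
```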
### The threshold
The cut was fixed by requiring the conditional probabilities of the two classes
to be equal:
$$
t_{\text{cut}} = x \, | \hspace{20pt}
\frac{P(c_1 | x)}{P(c_2 | x)} =
\frac{p(x | c_1) \, p(c_1)}{p(x | c_2) \, p(c_2)} = 1
$$
where $p(x | c_k)$ is the probability density, along the Fisher projection
line, of the point $x$ given the class $k$. If the classes are bivariate
Gaussian, as in the present case, then $p(x | c_k)$ is simply given by its
projected normal distribution $\mathscr{G} (\hat{\mu}, \hat{S})$. With a bit
of math, the solution is then:
$$
t = \frac{b}{a} + \sqrt{\left( \frac{b}{a} \right)^2 - \frac{c}{a}}
$$
where:
- $a = \hat{S}_1^2 - \hat{S}_2^2$
- $b = \hat{m}_2 \, \hat{S}_1^2 - \hat{m}_1 \, \hat{S}_2^2$
- $c = \hat{m}_2^2 \, \hat{S}_1^2 - \hat{m}_1^2 \, \hat{S}_2^2 - 2 \, \hat{S}_1^2 \, \hat{S}_2^2 \, \ln(\alpha)$
- $\hat{m}_k$ and $\hat{S}_k^2$ are the mean and the variance of class $k$ projected onto the Fisher line
- $\alpha = p(c_1) / p(c_2)$
The ratio of the prior probabilities $\alpha$ was computed as:
$$
\alpha = \frac{N_s}{N_n}
$$
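
Transcribed directly into C, with the projected means $\hat{m}_k$ and variances
$\hat{S}_k^2$ assumed to be already available (names are illustrative):

```c
#include <math.h>

/* Threshold from the expressions above: m1, m2 are the projected means,
 * var1, var2 the projected variances and alpha = N_s / N_n. */
double fisher_threshold(double m1, double m2,
                        double var1, double var2, double alpha)
{
  double a = var1 - var2;
  double b = m2 * var1 - m1 * var2;
  double c = m2 * m2 * var1 - m1 * m1 * var2
             - 2 * var1 * var2 * log(alpha);
  return b / a + sqrt((b / a) * (b / a) - c / a);
}
```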
The projection of the points was carried out with the function
`gsl_blas_ddot()`, which computes the dot product of two vectors, in this case
the weight vector and the position of the point to be projected.
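
A minimal sketch of this last step (whether the signal falls below or above the
cut depends on the orientation chosen for $w$, so the comparison sign is only
indicative):

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>

/* Project every point of an n x 2 sample onto the Fisher direction w and
 * compare the projection with the threshold t_cut: labels[i] is set to 1
 * for points classified as signal and to 0 otherwise. */
void classify(const gsl_matrix *points, const gsl_vector *w,
              double t_cut, int *labels)
{
  for (size_t i = 0; i < points->size1; i++) {
    gsl_vector_const_view row = gsl_matrix_const_row(points, i);
    double proj;
    gsl_blas_ddot(w, &row.vector, &proj);  /* proj = w . x */
    labels[i] = (proj < t_cut) ? 1 : 0;    /* convention-dependent side */
  }
}
```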