ex-7: Finished writing about perceptron

Giù Marcer 2020-04-06 23:16:56 +02:00 committed by rnhmjoj
parent 295b0ec625
commit 12fc0c406e

@@ -36,11 +36,11 @@ samples were handled as matrices of dimension $n$ x 2, where $n$ is the number
of points in the sample. The library `gsl_matrix` provided by GSL was employed
for this purpose and the function `gsl_ran_bivariate_gaussian()` was used for
generating the points.
-An example of the two samples is shown in @fig:fisher_points.
+An example of the two samples is shown in @fig:points.
![Example of points sorted according to two Gaussian with
the given parameters. Noise points in pink and signal points
-in yellow.](images/fisher-points.pdf){#fig:fisher_points}
+in yellow.](images/points.pdf){#fig:points}
Assuming not to know how the points were generated, a model of classification
must then be implemented in order to assign each point to the right class
@@ -154,6 +154,7 @@ the Cholesky method, already discussed in @sec:MLM.
Lastly, the matrix-vector product was computed with the `gsl_blas_dgemv()`
function provided by GSL.
### The threshold
The cut was fixed by the condition of conditional probability being the same
@@ -198,7 +199,8 @@ this case were the weight vector and the position of the point to be projected.
![Gaussian of the samples on the projection
line.](images/fisher-proj.pdf){height=5.7cm}
-Aeral and lateral views of the projection direction, in blue, and the cut, in red.
+Aeral and lateral views of the projection direction, in blue, and the cut, in
+red.
</div>
Results obtained for the same sample in @fig:fisher_points are shown in
@@ -212,3 +214,78 @@ and $t_{\text{cut}}$ is 1.323 far from the origin of the axes. Hence, as can be
seen, the vector $w$ turned out to be parallel to the line joining the means of
the two classes (reminded to be $(0, 0)$ and $(4, 4)$) which means that the
total covariance matrix $S$ is isotropic, proportional to the unit matrix.
## Perceptron
In machine learning, the perceptron is an algorithm for supervised learning of
linear binary classifiers.
Supervised learning is the machine learning task of inferring a function $f$
that maps an input $x$ to an output $f(x)$ based on a set of training
input-output pairs, each consisting of an input object and its expected output
value. The inferred function can then be used to map new examples: ideally, the
algorithm generalizes and correctly determines the class labels of unseen
instances.
The aim is to determine the threshold function $f(x)$ for the dot product
between the (in this case 2D) vector point $x$ and the weight vector $w$:
$$
f(x) = x \cdot w + b
$$ {#eq:perc}
where $b$ is called 'bias'. If $f(x) \geqslant 0$, then the point is
assigned to the class $C_1$, otherwise to $C_2$.
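
As a minimal sketch of the decision rule in @eq:perc, the two functions below
compute $f(x)$ for a 2D point and assign the class. The names
`perceptron_score` and `perceptron_class` are hypothetical and plain C arrays
are used instead of GSL vectors only to keep the example self-contained; this
is not necessarily how the actual program is organised.

```c
/* Linear score of a 2D point: f(x) = x . w + b.
 * (Hypothetical helper, not taken from the original code.) */
double perceptron_score(const double x[2], const double w[2], double b)
{
    return x[0] * w[0] + x[1] * w[1] + b;
}

/* Returns 1 if the point is assigned to C_1 (f(x) >= 0),
 * 2 if it is assigned to C_2 (f(x) < 0). */
int perceptron_class(const double x[2], const double w[2], double b)
{
    return perceptron_score(x, w, b) >= 0.0 ? 1 : 2;
}
```
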
The training was performed as follows. The idea is that the thresholded output
$\theta(f(x))$ must be 0 when the point $x$ belongs to the noise and 1 when it
belongs to the signal. The initial values were set to $w = (0,0)$ and $b = 0$
and, starting from these, the perceptron iteratively improves its estimates.
The whole sample was passed point by point through an iterative procedure
$N_c$ times: each time, the projection $w \cdot x$ of the point was computed
and then the variable $\Delta$ was defined as:
$$
  \Delta = r \left[ e - \theta(f(x)) \right]
$$
where:
- $r$ is the learning rate of the perceptron: it is between 0 and 1. The
  larger $r$, the more volatile the weight changes. In the code, it was set
  to $r = 0.8$;
- $e$ is the expected value, namely 0 if $x$ is noise and 1 if it is signal;
- $\theta$ is the Heaviside theta function;
- $f(x)$ is the function defined in @eq:perc, so that $\theta(f(x))$ is the
  observed value.
Then $b$ and $w$ must be updated as:
$$
b \longrightarrow b + \Delta
\et
w \longrightarrow w + x \Delta
$$
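
The training procedure described above can be sketched as follows. This is
only an illustration under stated assumptions, not the code used for the
results: the function name `perceptron_train`, the flat layout of the points
(`x0, y0, x1, y1, ...`) and the convention $\theta(0) = 1$ are choices made
here, the last one so as to agree with the condition $f(x) \geqslant 0$ for
$C_1$.

```c
#include <stddef.h>

/* Heaviside step function, assuming the convention theta(0) = 1. */
static int heaviside(double t)
{
    return t >= 0.0 ? 1 : 0;
}

/* Trains a 2D perceptron (hypothetical helper).
 *   points:   flat array of 2*n coordinates (x0, y0, x1, y1, ...)
 *   expected: n expected values, 0 for noise and 1 for signal
 *   rate:     learning rate r, between 0 and 1
 *   n_calls:  number of passes N_c over the whole sample
 * On return, w and *b hold the (not yet normalized) weights and bias. */
void perceptron_train(const double *points, const int *expected, size_t n,
                      double rate, size_t n_calls, double w[2], double *b)
{
    /* Initial values: w = (0, 0), b = 0. */
    w[0] = 0.0;
    w[1] = 0.0;
    *b = 0.0;

    for (size_t c = 0; c < n_calls; c++) {
        for (size_t i = 0; i < n; i++) {
            const double *x = &points[2 * i];

            /* f(x) = x . w + b */
            double f = x[0] * w[0] + x[1] * w[1] + *b;

            /* Delta = r * (e - theta(f(x))): zero if the point is
             * already classified correctly. */
            double delta = rate * (expected[i] - heaviside(f));

            /* b -> b + Delta,  w -> w + x * Delta */
            *b += delta;
            w[0] += x[0] * delta;
            w[1] += x[1] * delta;
        }
    }
}
```

Since $\Delta$ vanishes for every correctly classified point, $w$ and $b$ stop
changing once a full pass over the sample produces no misclassification.
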
<div id="fig:percep_proj">
![View from above of the samples.](images/percep-plane.pdf){height=5.7cm}
![Gaussian of the samples on the projection
line.](images/percep-proj.pdf){height=5.7cm}
Aerial and lateral views of the projection direction, in blue, and the cut, in
red.
</div>
It can be shown that this method converges to the desired function, provided
that the two classes are linearly separable.
As stated in the previous section, the weight vector must finally be normalized.
With $N_c = 5$, the values of $w$ and $t_{\text{cut}}$ are stable up to the
third digit. The following results were obtained:
$$
w = (0.654, 0.756) \et t_{\text{cut}} = 1.213
$$
where, once again, $t_{\text{cut}}$ is computed from the origin of the axes. In
this case, the projection line does not lie along the line joining the means of
the two samples. The plots are shown in @fig:percep_proj.
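
For reference, one way to relate the cut to the trained parameters is sketched
below. It assumes that $t_{\text{cut}}$ is the position of the decision
boundary $f(x) = 0$ along the normalized weight vector
$\hat{w} = w / \|w\|$; the text does not spell out this formula, so it should
be read as an assumption rather than as the exact computation used:

$$
  f(x) = 0
  \thus \hat{w} \cdot x = - \frac{b}{\|w\|}
  \thus t_{\text{cut}} = - \frac{b}{\|w\|}
$$

where $w$ and $b$ are the values obtained at the end of the training, before
the normalization, and `\thus` is taken to be an implication macro defined
alongside `\et` (also an assumption).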
## Efficiency test