ex-7: Finished writing about perceptron
@@ -36,11 +36,11 @@ samples were handled as matrices of dimension $n \times 2$, where $n$ is the number
of points in the sample. The library `gsl_matrix` provided by GSL was employed
for this purpose and the function `gsl_ran_bivariate_gaussian()` was used for
generating the points.
An example of the two samples is shown in @fig:fisher_points.

![Example of points sorted according to the two Gaussians with
the given parameters. Noise points in pink and signal points
in yellow.](images/fisher-points.pdf){#fig:fisher_points}

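A minimal sketch of how such a sample might be built with the GSL calls named above (the function name and parameters are illustrative, not taken from the report's code; the means are added by hand since `gsl_ran_bivariate_gaussian()` draws zero-mean pairs):

```c
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_matrix.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian
 * centred at (mx, my): sx, sy are the standard deviations and rho
 * the correlation of the two coordinates. */
gsl_matrix *generate_sample(gsl_rng *r, size_t n,
                            double mx, double my,
                            double sx, double sy, double rho)
{
    gsl_matrix *sample = gsl_matrix_alloc(n, 2);
    for (size_t i = 0; i < n; i++) {
        double x, y;
        /* gsl_ran_bivariate_gaussian() generates a zero-mean pair,
         * so the means are added afterwards. */
        gsl_ran_bivariate_gaussian(r, sx, sy, rho, &x, &y);
        gsl_matrix_set(sample, i, 0, x + mx);
        gsl_matrix_set(sample, i, 1, y + my);
    }
    return sample;
}
```
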
Assuming no knowledge of how the points were generated, a classification model
must then be implemented in order to assign each point to the right class
@@ -154,6 +154,7 @@ the Cholesky method, already discussed in @sec:MLM.
Lastly, the matrix-vector product was computed with the `gsl_blas_dgemv()`
function provided by GSL.

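A sketch of the GSL calls involved, assuming, as in the standard Fisher discriminant, that $w$ solves $S w = \mu_2 - \mu_1$ (variable and function names are illustrative):

```c
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_blas.h>

/* Solve S w = diff (with diff = mu2 - mu1) by the Cholesky method.
 * S must be symmetric positive definite; it is copied first because
 * gsl_linalg_cholesky_decomp() overwrites its argument. */
void solve_system(const gsl_matrix *S, const gsl_vector *diff, gsl_vector *w)
{
    gsl_matrix *chol = gsl_matrix_alloc(S->size1, S->size2);
    gsl_matrix_memcpy(chol, S);
    gsl_linalg_cholesky_decomp(chol);
    gsl_linalg_cholesky_solve(chol, diff, w);
    gsl_matrix_free(chol);
}

/* Matrix-vector product y = 1.0 * A x + 0.0 * y in a single call. */
void mat_vec(const gsl_matrix *A, const gsl_vector *x, gsl_vector *y)
{
    gsl_blas_dgemv(CblasNoTrans, 1.0, A, x, 0.0, y);
}
```
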
### The threshold

The cut was fixed by the condition of conditional probability being the same
@@ -198,7 +199,8 @@ this case were the weight vector and the position of the point to be projected.
![Gaussian of the samples on the projection
line.](images/fisher-proj.pdf){height=5.7cm}

Aerial and lateral views of the projection direction, in blue, and the cut, in
red.
</div>

Results obtained for the same sample in @fig:fisher_points are shown in
@@ -212,3 +214,78 @@ and $t_{\text{cut}}$ is 1.323 far from the origin of the axes. Hence, as can be
seen, the vector $w$ turned out to be parallel to the line joining the means of
the two classes (recall that they are $(0, 0)$ and $(4, 4)$), which means that the
total covariance matrix $S$ is isotropic, i.e. proportional to the unit matrix.


## Perceptron

In machine learning, the perceptron is an algorithm for supervised learning of
linear binary classifiers.
Supervised learning is the machine learning task of inferring a function $f$
that maps an input $x$ to an output $f(x)$ based on a set of training
input-output pairs. Each example is a pair consisting of an input object and an
output value. The inferred function can be used for mapping new examples: the
algorithm should generalize so as to correctly determine the class labels of
unseen instances.

The aim is to determine the threshold function $f(x)$ built from the dot product
between the (in this case 2D) point $x$ and the weight vector $w$:

$$
f(x) = x \cdot w + b
$$ {#eq:perc}

where $b$ is called the 'bias'. If $f(x) \geqslant 0$, then the point is
assigned to class $C_1$, and to $C_2$ otherwise.

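In code, @eq:perc and the assignment rule reduce to a dot product and a comparison; a minimal GSL sketch (the function names `f` and `classify` are illustrative, not taken from the report's code):

```c
#include <gsl/gsl_blas.h>

/* f(x) = x . w + b, as in @eq:perc. */
double f(const gsl_vector *x, const gsl_vector *w, double b)
{
    double dot;
    gsl_blas_ddot(x, w, &dot);
    return dot + b;
}

/* Assign the point to C_1 (returns 1) when f(x) >= 0,
 * to C_2 (returns 0) otherwise. */
int classify(const gsl_vector *x, const gsl_vector *w, double b)
{
    return f(x, w, b) >= 0 ? 1 : 0;
}
```
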
The training was performed as follows. The idea is that the function $f(x)$ must
return 0 when the point $x$ belongs to the noise and 1 when it belongs to the
signal. The initial values were set to $w = (0,0)$ and $b = 0$. From these, the
perceptron starts improving its estimates. The sample was passed point by
point through an iterative procedure a total of $N_c$ times: each time, the
projection $w \cdot x$ of the point was computed and then the variable $\Delta$
was defined as:

$$
\Delta = r \, (e - \theta(o))
$$

where:

- $r$ is the learning rate of the perceptron: it lies between 0 and 1. The
  larger $r$, the more volatile the weight changes. In the code, it was set
  to $r = 0.8$;
- $e$ is the expected value, namely 0 if $x$ is noise and 1 if it is signal;
- $\theta$ is the Heaviside theta function;
- $o$ is the observed value of $f(x)$ defined in @eq:perc.

Then $b$ and $w$ must be updated as:

$$
b \longrightarrow b + \Delta
\et
w \longrightarrow w + x \Delta
$$

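A sketch of a single training pass implementing this update rule, reusing the `f()` sketched above (the `expected` array of 0/1 labels and the function name are assumptions, not the report's actual code):

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_matrix.h>

/* One pass over the n x 2 sample: for each point compute Delta and
 * update the bias b and the weights w in place; expected[i] holds the
 * label e (0 for noise, 1 for signal) and r is the learning rate. */
void train_pass(const gsl_matrix *sample, const int *expected,
                double r, gsl_vector *w, double *b)
{
    for (size_t i = 0; i < sample->size1; i++) {
        gsl_vector_const_view x = gsl_matrix_const_row(sample, i);
        int th = f(&x.vector, w, *b) >= 0 ? 1 : 0;  /* theta(o), o = f(x)  */
        double delta = r * (expected[i] - th);      /* Delta = r (e - theta(o)) */
        *b += delta;                                /* b -> b + Delta      */
        gsl_blas_daxpy(delta, &x.vector, w);        /* w -> w + x Delta    */
    }
}
```
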
<div id="fig:percep_proj">
![View from above of the samples.](images/percep-plane.pdf){height=5.7cm}
![Gaussian of the samples on the projection
line.](images/percep-proj.pdf){height=5.7cm}

Aerial and lateral views of the projection direction, in blue, and the cut, in
red.
</div>

It can be shown that this method converges to the desired function.
As stated in the previous section, the weight vector must finally be normalized.

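Such a normalization could be done with GSL in a couple of calls (a sketch; the function name is illustrative):

```c
#include <gsl/gsl_blas.h>

/* Rescale w to unit Euclidean norm. */
void normalize(gsl_vector *w)
{
    double norm = gsl_blas_dnrm2(w);   /* ||w|| */
    gsl_vector_scale(w, 1.0 / norm);
}
```
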
With $N_c = 5$, the values of $w$ and $t_{\text{cut}}$ level off up to the third
digit. The following results were obtained:

$$
w = (0.654, 0.756) \et t_{\text{cut}} = 1.213
$$

where, once again, $t_{\text{cut}}$ is measured from the origin of the axes. In
this case, the projection line does not lie along the line joining the means of
the two samples. Plots are shown in @fig:percep_proj.

## Efficiency test