diff --git a/notes/sections/0.md b/notes/sections/0.md index cfbade3..49a4cb9 100644 --- a/notes/sections/0.md +++ b/notes/sections/0.md @@ -42,6 +42,10 @@ header-includes: | \DeclareMathOperator*{\et}{% \hspace{30pt} \wedge \hspace{30pt} } + %% "if" in formulas + \DeclareMathOperator*{\incase}{% + \hspace{20pt} \text{if} \hspace{20pt} + } \makeatletter \renewcommand\maketitle{ diff --git a/notes/sections/7.md b/notes/sections/7.md index 5ebb290..c977686 100644 --- a/notes/sections/7.md +++ b/notes/sections/7.md @@ -199,11 +199,11 @@ this case were the weight vector and the position of the point to be projected. ![Gaussian of the samples on the projection line.](images/fisher-proj.pdf){height=5.7cm} -Aeral and lateral views of the projection direction, in blue, and the cut, in +Aerial and lateral views of the projection direction, in blue, and the cut, in red. -Results obtained for the same sample in @fig:fisher_points are shown in +Results obtained for the same sample in @fig:points are shown in @fig:fisher_proj. The weight vector $w$ was found to be: $$ @@ -227,22 +227,21 @@ output value. The inferred function can be used for mapping new examples. The algorithm will be generalized to correctly determine the class labels for unseen instances. -The aim is to determine the threshold function $f(x)$ for the dot product -between the (in this case 2D) vector point $x$ and the weight vector $w$: +The aim is to determine the weight vector $w$ and the bias $b$ such that the threshold function $f(x)$ satisfies: $$ - f(x) = x \cdot w + b + f(x) = x \cdot w + b \hspace{20pt} + \begin{cases} + \geqslant 0 \incase x \in \text{signal} \\ + < 0 \incase x \in \text{noise} + \end{cases} $$ {#eq:perc} -where $b$ is called 'bias'. If $f(x) \geqslant 0$, than the point can be -assigned to the class $C_1$, to $C_2$ otherwise. - -The training was performed as follow. The idea is that the function $f(x)$ must -return 0 when the point $x$ belongs to the noise and 1 if it belongs to the -signal.
Initial values were set as $w = (0,0)$ and $b = 0$. From these, the -perceptron starts to improve their estimations. The sample was passed point by -point into a reiterative procedure a grand total of $N_c$ calls: each time, the -projection $w \cdot x$ of the point was computed and then the variable $\Delta$ was defined as: +The training was performed as follows. Initial values were set to $w = (0,0)$ and +$b = 0$; the perceptron then iteratively refined these estimates. The +whole sample was passed point by point through the training loop $N_c$ times in +total: each time, the projection $w \cdot x$ of the point was computed +and then the variable $\Delta$ was defined as: $$ - \Delta = r * (e - \theta (f(x)) + \Delta = r \, (e - \theta(f(x))) @@ -254,15 +253,15 @@ where: larger $r$, the more volatile the weight changes. In the code, it was set $r = 0.8$; - $e$ is the expected value, namely 0 if $x$ is noise and 1 if it is signal; - - $\theta$ is the Heavyside theta function; + - $\theta$ is the Heaviside theta function; - $o$ is the observed value of $f(x)$ defined in @eq:perc. Then $b$ and $w$ must be updated as: $$ - b \longrightarrow b + \Delta + b \to b + \Delta \et - w \longrightarrow w + x \Delta + w \to w + x \Delta $$
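The perceptron training loop described above can be sketched in Python as follows. This is an illustrative sketch, not the program used for these notes. It assumes:

- 2D points are given as `(x0, x1)` pairs;
- the expected values are $e = 1$ for signal and $e = 0$ for noise;
- $\theta(0) = 1$, so that $f(x) \geqslant 0$ means signal, as in @eq:perc.

The helper `normalize` is hypothetical. Since $x \cdot w + b \geqslant 0$ is equivalent to $x \cdot \hat{w} \geqslant -b/|w|$, it assumes the cut on the normalized projection is $t_{\text{cut}} = -b/|w|$.

```python
import math


def heaviside(z):
    """Step function theta(z): 1 if z >= 0, else 0 (f(x) >= 0 means 'signal')."""
    return 1 if z >= 0 else 0


def train_perceptron(points, labels, r=0.8, n_c=5):
    """Perceptron training on 2D points.

    points: sequence of (x0, x1) pairs; labels: expected value e (1 = signal, 0 = noise).
    r is the learning rate and n_c the number of passes over the whole sample.
    """
    w = [0.0, 0.0]  # initial weight vector w = (0, 0)
    b = 0.0         # initial bias b = 0
    for _ in range(n_c):
        for x, e in zip(points, labels):
            f = x[0] * w[0] + x[1] * w[1] + b  # f(x) = x . w + b
            delta = r * (e - heaviside(f))     # Delta = r (e - theta(f(x)))
            b += delta                         # b -> b + Delta
            w[0] += x[0] * delta               # w -> w + x Delta
            w[1] += x[1] * delta
    return w, b


def normalize(w, b):
    """Hypothetical helper: unit weight vector and the equivalent cut.

    x . w + b >= 0  <=>  x . w_hat >= -b / |w|, so this assumes
    t_cut = -b / |w| (w must not be the zero vector).
    """
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm], -b / norm
```

For instance, `train_perceptron([(2, 2), (-2, -2)], [1, 0])` updates the parameters only once, at the first noise point. It returns `w = [1.6, 1.6]` and `b = -0.8`, which classify both points correctly.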
@@ -270,12 +269,12 @@ $$ ![Gaussian of the samples on the projection line.](images/percep-proj.pdf){height=5.7cm} -Aeral and lateral views of the projection direction, in blue, and the cut, in +Aerial and lateral views of the projection direction, in blue, and the cut, in red.
-It can be shown that this method converges to the coveted function. +It can be shown that this method converges to the sought function, provided that the two classes are linearly separable. -As stated in the previous section, the weight vector must finally be normalzied. +As stated in the previous section, the weight vector must finally be normalized. With $N_c = 5$, the values of $w$ and $t_{\text{cut}}$ level off up to the third digit. The following results were obtained: @@ -289,3 +288,47 @@ this case, the projection line does not lies along the mains of the two samples. Plots in @fig:percep_proj. ## Efficiency test + +A program was implemented to test the performance of the two +aforementioned methods. +A number $N_t$ of test samples was generated and the +points were divided into the two classes according to the selected method. +At each iteration, false positives and false negatives were recorded using the +running statistics functions of GSL (`gsl_rstat`), which are suitable for +handling large datasets that would be inconvenient to store in memory all at +once. +For each sample, the numbers $N_{fn}$ and $N_{fp}$ of false negatives and false +positives were computed with the following procedure: + +Every noise point $x_n$ was checked as follows: the function $f(x_n)$ was computed +with the weight vector $w$ and the $t_{\text{cut}}$ given by the employed method, +then: + + - if $f(x_n) < 0 \thus$ $N_{fp} \to N_{fp}$ + - if $f(x_n) \geqslant 0 \thus$ $N_{fp} \to N_{fp} + 1$ + +Similarly for the signal points, for which $f(x) < 0$ increments $N_{fn}$.
+Finally, the mean and the standard deviation of $N_{fn}$ and +$N_{fp}$ over all the samples were computed in order to get the mean purity $\alpha$ +and efficiency $\beta$ of the employed method, $N_s$ and $N_n$ being the numbers of signal and noise points in each sample: + +$$ + \alpha = 1 - \frac{\text{mean}(N_{fn})}{N_s} \et + \beta = 1 - \frac{\text{mean}(N_{fp})}{N_n} +$$ + +Results for $N_t = 500$: + +------------------------------------------------------------------------------------------- + $\alpha$ $\sigma_{\alpha}$ $\beta$ $\sigma_{\beta}$ +----------- ------------------- ------------------- ------------------- ------------------- +Fisher 0.9999 0.33 0.9999 0.33 + +Perceptron 0.9999 0.28 0.9995 0.64 +------------------------------------------------------------------------------------------- + +Table: Results for the Fisher and perceptron methods. $\sigma_{\alpha}$ and + $\sigma_{\beta}$ stand for the standard deviations of the numbers of false + negatives and false positives, respectively. + +\textcolor{red}{MISSING COMMENTS ON RESULTS.}
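The counting and averaging steps of the efficiency test can be sketched in Python as follows. This is only an illustration under stated assumptions:

- Welford's online algorithm stands in for `gsl_rstat`. Like `gsl_rstat`, it keeps a running mean and variance without storing the samples.
- `f` is any classifier for which $f(x) \geqslant 0$ means signal.
- `make_sample` is a hypothetical callable that returns one test sample as `(signal, noise)` lists of fixed size.

```python
import math


class RunningStats:
    """Welford's online mean and variance (a stand-in for gsl_rstat)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def sd(self):
        """Sample standard deviation (n - 1 in the denominator)."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def count_errors(signal, noise, f):
    """Return (N_fn, N_fp) for one sample; f(x) >= 0 means 'classified as signal'."""
    n_fn = sum(1 for x in signal if f(x) < 0)  # signal points labelled as noise
    n_fp = sum(1 for x in noise if f(x) >= 0)  # noise points labelled as signal
    return n_fn, n_fp


def efficiency_test(make_sample, f, n_t):
    """Run n_t test samples; return (alpha, sd(N_fn), beta, sd(N_fp))."""
    if n_t < 1:
        raise ValueError("n_t must be at least 1")
    fn_stats, fp_stats = RunningStats(), RunningStats()
    for _ in range(n_t):
        signal, noise = make_sample()          # hypothetical sample generator
        n_s, n_n = len(signal), len(noise)     # assumed equal for every sample
        n_fn, n_fp = count_errors(signal, noise, f)
        fn_stats.add(n_fn)
        fp_stats.add(n_fp)
    alpha = 1 - fn_stats.mean / n_s
    beta = 1 - fp_stats.mean / n_n
    return alpha, fn_stats.sd(), beta, fp_stats.sd()
```

For the perceptron, `f` would be, for instance, `lambda x: x[0] * w[0] + x[1] * w[1] + b` with the trained `w` and `b`.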