From 01a5f06cc5c2a659b54309f5edf2d3486d33c2bb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gi=C3=B9=20Marcer?=
Date: Sun, 15 Mar 2020 21:42:41 +0100
Subject: [PATCH] ex-5: complete importance sampling

---
 notes/sections/5.md | 114 +++++++++++++++++++++++---------------------
 1 file changed, 59 insertions(+), 55 deletions(-)

diff --git a/notes/sections/5.md b/notes/sections/5.md
index be503c3..dcbe8c4 100644
--- a/notes/sections/5.md
+++ b/notes/sections/5.md
@@ -108,8 +108,7 @@
 $$
 
 $$
-  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( \frac{x_j - \bar{x}_i}{n_i}
-  \right)^2
+  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
   \thus {\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
 $$
 
@@ -225,11 +224,14 @@ diff, seems to seesaw around the correct value.
 
 ## Importance sampling
 
-In statistics, importance sampling is a technique for estimating properties of
-a given distribution, while only having samples generated from a different
-distribution than the distribution of interest.
-Consider a sample of $n$ points {$x_i$} generated according to a probability
-distribition function $P$ which gives thereby the following expected value:
+In statistics, importance sampling is a method in which points are sampled
+from a distribution shaped like the integrand $f$ itself, so that they
+cluster in the regions that make the largest contribution to the integral.
+
+Recall that $I = V \cdot \langle f \rangle$, hence only $\langle f \rangle$
+must be estimated. Consider, then, a sample of $n$ points {$x_i$} generated
+according to a probability distribution function $P$, which gives the
+following expected value:
 
 $$
   E [x, P] = \frac{1}{n} \sum_i x_i
@@ -238,21 +240,46 @@ $$
 
 with variance:
 
 $$
-  \sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
+  \sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
+  \with \sigma^2 [x, P] = \frac{1}{n - 1} \sum_i \left( x_i - E [x, P] \right)^2
 $$
 
-where $i$ runs over the sample and $\sigma^2 [x, P]$ is the variance of the
-sorted points.
-The idea is to sample them from a different distribution to lower the variance
-of $E[x, P]$. This is accomplished by choosing a random variable $y \geq 0$ such
-that $E[y ,P] = 1$. Then, a new probability $P^{(y)}$ is defined in order to
-satisfy:
+where $i$ runs over the sample.
+In plain MC, $\langle f \rangle$ is estimated as the expected value of the
+points {$f(x_i)$}, with the $x_i$ drawn uniformly in $\Omega$ (constant $P$).
+The idea is to sample the points from a different distribution in order to
+lower the variance of $E[x, P]$, which amounts to lowering $\sigma^2 [x, P]$.
+This is accomplished by choosing a random variable $y$ and defining a new
+probability $P^{(y)}$ which satisfies:
 $$
   E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
 $$
-This new estimate is better then former one if:
+which is to say:
+
+$$
+  I = \int \limits_{\Omega} dx \, f(x) =
+      \int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x) =
+      \int \limits_{\Omega} dx \, w(x) \, g(x)
+$$
+
+where $E \, \longleftrightarrow \, I$ and:
+
+$$
+  \begin{cases}
+    f(x) \, \longleftrightarrow \, x \\
+    1 \, \longleftrightarrow \, P
+  \end{cases}
+  \et
+  \begin{cases}
+    w(x) \, \longleftrightarrow \, \frac{x}{y} \\
+    g(x) \, \longleftrightarrow \, y = P^{(y)}
+  \end{cases}
+$$
+
+Here the symbol $\longleftrightarrow$ marks the correspondence between the
+variables.
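+
+As a rough numerical check of this rearrangement, the sketch below compares a
+plain MC estimate of $I$ with the importance-sampled one obtained by averaging
+the weights $w(x) = f(x)/g(x)$. The integrand $f(x) = e^x$ on $[0, 1]$, the
+sampling density $g(x) = 2(1 + x)/3$ and the use of the GSL random number
+generator are assumptions made only for this illustration and are not taken
+from the exercise:
+
+```c
+/* Compile with: gcc importance.c -lgsl -lgslcblas -lm */
+#include <math.h>
+#include <stdio.h>
+#include <gsl/gsl_rng.h>
+
+int main(void) {
+  const size_t n = 100000;
+  gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
+
+  double sp = 0, sp2 = 0;   /* plain MC: sums of f and f^2          */
+  double sw = 0, sw2 = 0;   /* importance sampling: sums of w, w^2  */
+
+  for (size_t i = 0; i < n; i++) {
+    /* Plain MC: x uniform in [0, 1], sample the integrand f(x) = e^x. */
+    double x  = gsl_rng_uniform(r);
+    double fx = exp(x);
+    sp  += fx;
+    sp2 += fx * fx;
+
+    /* Importance sampling: draw x from g(x) = 2(1 + x)/3, which roughly
+       follows f, by inverting its CDF G(x) = (2x + x^2)/3, and sample
+       the weight w(x) = f(x)/g(x) instead.                              */
+    double u  = gsl_rng_uniform(r);
+    double xg = sqrt(1.0 + 3.0 * u) - 1.0;
+    double w  = exp(xg) / (2.0 * (1.0 + xg) / 3.0);
+    sw  += w;
+    sw2 += w * w;
+  }
+
+  /* Both estimates converge to I = e - 1, but the importance-sampled
+     one comes with a visibly smaller standard error.                  */
+  double mp = sp / n, mw = sw / n;
+  printf("plain MC   : %.5f +- %.5f\n", mp, sqrt((sp2 / n - mp * mp) / n));
+  printf("importance : %.5f +- %.5f\n", mw, sqrt((sw2 / n - mw * mw) / n));
+  printf("exact      : %.5f\n", exp(1.0) - 1.0);
+
+  gsl_rng_free(r);
+  return 0;
+}
+```
+
+Since $g$ roughly follows the shape of $f$, the weights $w(x_i)$ fluctuate
+much less than the raw values $f(x_i)$: this is exactly the reduction of
+$\sigma^2 [x, P]$ that the change of probability is meant to achieve.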
+
+This new estimate is better than the former if:
 
 $$
   \sigma^2 \left[ \frac{x}{y}, P^{(y)} \right] < \sigma^2 [x, P]
 $$
@@ -261,55 +288,32 @@ The best variable $y$ would be:
 $$
-  y^{\star} = \frac{x}{E [x, P]} \thus \frac{x}{y^{\star}} = E [x, P]
+  y^{\star} = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
+  \thus \frac{x}{y^{\star}} = E [x, P]
 $$
 
-and a single sample under $P^{(y^{\star})}$ suffices to give its value.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+and even a single sample under $P^{(y^{\star})}$ would be sufficient to give
+its value. Obviously, this choice cannot be made in practice, since $E [x, P]$
+is not known a priori.
+However, it gives an insight into what importance sampling does. In fact,
+given that:
+
+$$
+  E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
+  a \, P(x \in [a, a + da])
+$$
+
+the best change of probability $P^{(y^{\star})}$ redistributes the law of $x$
+so that each value is sampled with a frequency proportional to its weight in
+$E [x, P]$, namely:
+
+$$
+  P^{(y^{\star})}(x \in [a, a + da]) =
+  \frac{1}{E [x, P]} \, a \, P (x \in [a, a + da])
+$$
 
 ---
 
-The logic underlying importance sampling lies in a simple rearrangement of terms
-in the integral to be computed:
-
-$$
-  I = \int \limits_{\Omega} dx f(x) =
-      \int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)=
-      \int \limits_{\Omega} dx \, w(x) \, g(x)
-$$
-
-where $w(x)$ is called 'importance function': a good importance function will be
-large when the integrand is large and small otherwise.
-
----
-
-
-For example, in some of these points the function value is lower compared to
-others and therefore contributes less to the whole integral.
 
 ### VEGAS
 
 \textcolor{red}{WIP}
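+
+While this section is still a work in progress, the sketch below shows how a
+VEGAS estimate can be obtained from the GSL Monte Carlo routines; the
+integrand $f(x) = e^x$ on $[0, 1]$ and the number of calls are illustrative
+assumptions and not the actual choices made in the exercise:
+
+```c
+/* Compile with: gcc vegas-sketch.c -lgsl -lgslcblas -lm */
+#include <math.h>
+#include <stdio.h>
+#include <gsl/gsl_rng.h>
+#include <gsl/gsl_monte_vegas.h>
+
+/* Integrand written in the form required by gsl_monte_function. */
+double integrand(double *x, size_t dim, void *params) {
+  (void)dim; (void)params;
+  return exp(x[0]);
+}
+
+int main(void) {
+  double xl[1] = {0.0};   /* lower integration limit */
+  double xu[1] = {1.0};   /* upper integration limit */
+  double res, err;
+  gsl_monte_function F = {&integrand, 1, NULL};
+
+  gsl_rng_env_setup();
+  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
+  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);
+
+  /* VEGAS iteratively adapts its sampling grid to the integrand,
+     concentrating the points where the integrand is largest.      */
+  gsl_monte_vegas_integrate(&F, xl, xu, 1, 100000, r, s, &res, &err);
+  printf("VEGAS : %.5f +- %.5f\n", res, err);
+  printf("exact : %.5f\n", exp(1.0) - 1.0);
+
+  gsl_monte_vegas_free(s);
+  gsl_rng_free(r);
+  return 0;
+}
+```
+
+Here the adaptively refined grid plays the role of the importance function
+$g$ discussed above, built automatically from the sampled values of $f$.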