Inverse transform sampling is a basic method for pseudo-random number generation. It generates random samples from a non-uniform probability distribution when the distribution's cumulative distribution function (CDF) is known.
The main idea of inverse transform sampling is quite simple: draw a point $u$ from the uniform distribution $U[0,1]$ and map it through the inverse CDF of the target distribution. The result $F^{-1}(u)$ is then a sample from that non-uniform distribution.
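As a concrete illustration, here is a minimal sketch of the idea for the exponential distribution, whose CDF $F(x) = 1 - e^{-\lambda x}$ has the closed-form inverse $F^{-1}(u) = -\ln(1-u)/\lambda$ (the function name and parameters are my own, not from any particular library):

```python
import math
import random

def sample_exponential(lam, n):
    """Draw n samples from Exp(lam) via inverse transform sampling.

    The CDF of Exp(lam) is F(x) = 1 - exp(-lam * x), so the inverse
    CDF is F^{-1}(u) = -ln(1 - u) / lam.
    """
    samples = []
    for _ in range(n):
        u = random.random()  # u ~ U[0, 1)
        samples.append(-math.log(1.0 - u) / lam)
    return samples

random.seed(0)
xs = sample_exponential(lam=2.0, n=100_000)
# The sample mean should be close to the true mean 1/lam = 0.5
print(sum(xs) / len(xs))
```

The same recipe works for any distribution with an invertible CDF; when no closed-form inverse exists, a numerical root-finder can be substituted.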

In this post, we will derive the Maximum A Posteriori (MAP) estimate based on Rasmussen's book Gaussian Processes for Machine Learning (GPML), referring to equations 2.7 and 2.8 in the book. This post is not intended to explain what MAP estimation is or why it works; for those interested in more detail, chapter 4.5 of the PML book is a good start.
The Maximum A Posteriori (MAP) estimate is an estimate of the unknown parameter $\mathbf{w}$.
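To make the idea concrete before diving into the GPML derivation, here is a toy sketch of a MAP estimate in the simplest conjugate setting: a Gaussian likelihood with known variance and a Gaussian prior on the mean. This is my own illustrative example, not equations 2.7 and 2.8 from the book; the function name and parameters are hypothetical.

```python
import random

def map_gaussian_mean(xs, sigma2, mu0, tau2):
    """MAP estimate of the mean w of a Gaussian likelihood N(w, sigma2),
    given a Gaussian prior w ~ N(mu0, tau2).

    Because the posterior is also Gaussian, its mode equals its mean:
        w_map = (tau2 * sum(xs) + sigma2 * mu0) / (n * tau2 + sigma2)
    """
    n = len(xs)
    return (tau2 * sum(xs) + sigma2 * mu0) / (n * tau2 + sigma2)

random.seed(1)
data = [random.gauss(3.0, 1.0) for _ in range(1000)]
w_map = map_gaussian_mean(data, sigma2=1.0, mu0=0.0, tau2=1.0)
# The estimate sits near the sample mean (about 3.0), pulled
# slightly toward the prior mean 0 by the prior.
print(w_map)
```

Note how the prior acts as a regularizer: with little data the estimate leans toward $\mu_0$, and as $n$ grows it converges to the maximum-likelihood estimate (the sample mean).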

In this post we will explore the Kullback-Leibler (KL) divergence, its properties, and how it relates to Jensen's inequality. KL divergence measures the difference between two probability distributions by taking the expected value, under $q(x)$, of the log-ratio between $q(x)$ and $p(x)$. The formula of KL divergence is given by:
$$ \begin{aligned} KL(q||p) &= \mathbb{E}_{q(x)} \log \frac{q(x)}{p(x)} \\ KL(q||p) &= \int q(x)\log \frac{q(x)}{p(x)}dx \end{aligned} $$ Why do we need KL divergence? Why don't we just use a distance metric to measure the difference?
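The formula above can be sketched directly for discrete distributions, where the integral becomes a sum. The snippet below also hints at one answer to the question: KL divergence is not symmetric, so it is not a distance in the metric sense (the function name is my own):

```python
import math

def kl_divergence(q, p):
    """KL(q || p) = sum_x q(x) * log(q(x) / p(x)) for discrete
    distributions given as lists of probabilities over the same support.
    Terms with q(x) = 0 contribute 0 by convention."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]

# Both directions are non-negative, but they generally differ:
# KL is asymmetric, unlike a true distance metric.
print(kl_divergence(q, p))
print(kl_divergence(p, q))
```

Non-negativity of both values is no accident; it follows from Jensen's inequality applied to the concave logarithm, which is exactly the connection explored next.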

Jensen's inequality is a mathematical inequality concerning concave/convex functions. A function is concave if the line segment between any two points on its graph lies on or below the graph. Mathematically, we can write:
A function $f(x)$ is concave if, for any $a, b, \alpha$,
$$ f(\alpha a+(1-\alpha)b) \geq \alpha f(a) + (1-\alpha) f(b) \tag{1} $$
where $0 \leq \alpha \leq 1$ and $a, b \in X$. Here, $X$ is a convex set.
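Inequality (1) can be checked numerically for a familiar concave function, $f(x) = \log x$: for every mixing weight $\alpha$, the function of the mixture dominates the mixture of the function values.

```python
import math

# Concave function f(x) = log(x): inequality (1) says
# f(alpha*a + (1-alpha)*b) >= alpha*f(a) + (1-alpha)*f(b)
# for alpha in [0, 1], with equality at alpha = 0 and alpha = 1.
f = math.log
a, b = 1.0, 10.0

for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
    lhs = f(alpha * a + (1 - alpha) * b)       # f of the mixture
    rhs = alpha * f(a) + (1 - alpha) * f(b)    # mixture of f values
    assert lhs >= rhs - 1e-12
    print(f"alpha={alpha:.2f}: f(mix)={lhs:.4f} >= mix of f={rhs:.4f}")
```

For a convex function the inequality flips; replacing `math.log` with, say, `lambda x: x * x` and reversing the comparison gives the convex version of (1).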