Probability

Consider a die. It has a sample space $\Omega = \{1,2,3,4,5,6\}$. Elements of the sample space are called elementary outcomes. An event is a subset of the sample space, $E \subseteq \Omega$. For example, $E = \{2,4,6\}$ is an event.

Random experiments

Consider flipping two coins. This is a random experiment. The sample space for a single flip is $\{H, T\}$. For two coins, this would be $\{HH, HT, TT, TH\}$, or more concisely, $\{H, T\}^2$.

The event for "first coin flipped is a head" is $E_1 = \{HH, HT\}$.
The event for "first coin equals second" is $E_2 = \{HH, TT\}$.

Another example is rolling three dice. The sample space is $\Omega = \{1,2,3,4,5,6\}^3$.

Another example is a hand of 5 cards. A single card has both a suit and a rank so the set of cards is $C = \{H,D,C,S\} \times \{A,2,3,...,J,Q,K\}$. The sample space for 5 cards is $\{A \subseteq C: |A| = 5\}$. $C^5$ is not correct as once a card is dealt it can't be dealt again.
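
As a rough illustration (a Python sketch, not part of the original notes), these sample spaces can be enumerated directly; the suit letters and rank labels below are just one way to encode the cards.

```python
from itertools import combinations, product
from math import comb

two_coins = list(product("HT", repeat=2))          # {H, T}^2
three_dice = list(product(range(1, 7), repeat=3))  # {1,...,6}^3

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
cards = [(suit, rank) for suit in "HDCS" for rank in ranks]  # C = suits x ranks

print(len(two_coins))                          # 4
print(len(three_dice))                         # 216
print(len(cards))                              # 52
print(sum(1 for _ in combinations(cards, 5)))  # 2598960 five-card hands
print(comb(52, 5))                             # 2598960, the same count
```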

Combinatorics

$|A \times B| = |A| \times |B|$.
A set of cardinality $n$ has $n!$ permutations.

An arrangement is an ordered selection of $k$ distinct elements from a set, i.e. a permutation of length $k$. There are $n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$ arrangements of length $k$ from a set of size $n$.

A combination is a selection of $k$ elements where the order does not matter. Where arrangements would count AB and BA separately, the combination counts them once as AB. Each combination of size $k$ corresponds to $k!$ arrangements, so the number of combinations of $k$ elements from a set of size $n$ is the number of arrangements divided by $k!$, which is $\frac{n!}{k!(n-k)!}$.
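
A small sketch checking these counting formulas with Python's `math` module (`math.perm` and `math.comb` require Python 3.8 or later); the values $n = 6$, $k = 3$ are arbitrary.

```python
from math import comb, factorial, perm

n, k = 6, 3
print(factorial(n))                                       # permutations of a set of size 6: 720
print(perm(n, k))                                         # arrangements of length 3: 6*5*4 = 120
print(perm(n, k) == factorial(n) // factorial(n - k))     # True
print(comb(n, k))                                         # combinations of size 3: 20
print(comb(n, k) == perm(n, k) // factorial(k))           # True
```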

Assigning probabilities

The probability of an event $E$ is denoted $P(E)$.
A probability space is an ordered triple $(\Omega, F, P)$ where $\Omega$ is the sample space and $F$ is the collection of all events. Since an event is a subset of $\Omega$, $F$ is given by $2^\Omega$, the set of all subsets of $\Omega$. $P$ is the probability measure: a function from events to real numbers between 0 and 1 inclusive, with $P(\Omega) = 1$.

A fair coin would have sample space $\{H, T\}$ with $P_H = 0.5$ and $P_T = 0.5$.
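
As a hedged sketch, a probability measure on a finite sample space can be represented as a map from elementary outcomes to probabilities, with $P(E)$ obtained by summing over the event; the fair die below is just an illustrative choice.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
p = {w: Fraction(1, 6) for w in omega}   # fair die: each outcome has probability 1/6

def P(event):
    """Probability of an event, i.e. a subset of omega."""
    return sum(p[w] for w in event)

print(P(omega))        # 1
print(P({2, 4, 6}))    # 1/2
print(P(set()))        # 0
```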

Events

Probability of union

$P(A_1 \cup A_2 \cup ... \cup A_k) = \sum\limits^k_{i=1} P(A_i)$ if the events are pairwise disjoint, i.e. $A_i \cap A_j = \emptyset$ for all $i \neq j$.
$P(A \cup B) = P(A) + P(B)$ if $P(A \cap B) = 0$.
$P(A \cup B) = P(A) + P(\overline{A} \cap B)$.

Union bound

The union bound for two events is $P(A \cup B) \leq P(A) + P(B)$ and more generally $P(\bigcup^n_{i=1} A_i) \leq \sum^n_{i=1} P(A_i)$.
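
A quick numerical check of additivity for disjoint events and of the union bound, using a fair six-sided die (an illustrative choice, not from the notes):

```python
from fractions import Fraction

def P(event):
    return Fraction(len(event), 6)   # uniform measure on {1,...,6}

evens, low = {2, 4, 6}, {1, 2}
print(P({1, 3} | {2, 4}) == P({1, 3}) + P({2, 4}))  # disjoint events: equality holds
print(P(evens | low) <= P(evens) + P(low))          # union bound: True
print(P(evens | low), P(evens) + P(low))            # 2/3 vs 5/6 (events overlap at 2)
```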

Probability of complement and product

$P(\overline{A}) = 1 - P(A)$, because $A \cup \overline{A} = \Omega$ and $A \cap \overline{A} = \emptyset$, so $1 = P(\Omega) = P(A \cup \overline{A}) = P(A) + P(\overline{A})$, hence $P(\overline{A}) = 1 - P(A)$.

In general, $P(A \cap B) \neq P(A) \times P(B)$. When they are equal, the events are independent.

Conditional probability

Given a probability space $(\Omega, 2^\Omega, P)$, an event $B \in 2^\Omega$ with $P(B) > 0$, and an event $A \in 2^\Omega$, the probability of $A$ given $B$ is defined as $P(A|B) = \frac{P(A \cap B)}{P(B)}$.
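
For example, with two fair dice (an illustrative setup), conditional probability can be computed directly from the definition:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # {1,...,6}^2

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] + w[1] == 7}   # the two dice sum to 7
B = {w for w in omega if w[0] == 3}          # the first die shows 3

print(P(A))               # 1/6
print(P(A & B) / P(B))    # P(A | B) = 1/6: knowing the first die is 3 does not change it
```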

Law of total probability

$P(A) = \sum\limits^k_{i=1} P(A|B_i) \times P(B_i)$ where $B_1, ... ,B_k$ are mutually exclusive events whose union is the whole sample space (a partition of $\Omega$), each with $P(B_i) > 0$, and $A$ is any event in the sample space.

Bayes' theorem

$P(B_i|A) = \frac{P(A|B_i) \times P(B_i)}{P(A)}$
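
A small worked example (an assumed setup, not from the notes): a coin is chosen uniformly from two coins, coin 1 landing heads with probability $\frac{9}{10}$ and coin 2 with probability $\frac{1}{2}$; the law of total probability gives the probability of heads, and Bayes' theorem gives the probability that coin 1 was chosen given that a head was seen.

```python
from fractions import Fraction

P_B = {1: Fraction(1, 2), 2: Fraction(1, 2)}            # P(B_i): which coin was picked
P_A_given_B = {1: Fraction(9, 10), 2: Fraction(1, 2)}   # P(A | B_i): heads given that coin

# Law of total probability: P(A) = sum_i P(A | B_i) * P(B_i)
P_A = sum(P_A_given_B[i] * P_B[i] for i in P_B)
print(P_A)                                  # 7/10

# Bayes' theorem: P(B_1 | A) = P(A | B_1) * P(B_1) / P(A)
print(P_A_given_B[1] * P_B[1] / P_A)        # 9/14
```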

Events are independent if $P(A \cap B) = P(A) \times P(B)$.

Random variables

A random variable is a function from the sample space to the real numbers.

Bernoulli distribution

$$ X = \begin{cases} 1 & \text{if success}\\ 0 & \text{if failure} \end{cases} $$

For example, a coin flip has Bernoulli distribution with parameter $p$ where

$$ X(\omega) = \begin{cases} 1 & \text{if } \omega = t\\ 0 & \text{if } \omega = h \end{cases} $$

Binomial distribution

The binomial distribution can model $n$ coin flips. The outcome of each flip is independent and $P(t) = p$.

$$ X_i = \begin{cases} 1 & \text{if } i^\text{th} \text{ flip gives tails}\\ 0 & \text{otherwise} \end{cases} $$

This means $X_i$ has Bernoulli distribution with parameter $p$. The Bernoulli distribution is the binomial distribution with $n=1$.
Let $X = X_1 + ... + X_n$. $X$ has binomial distribution with parameters $n$ and $p$, where $n$ is the number of trials and $p$ is the probability of success.

Given the Bernoulli distribution $X \sim{} \operatorname{Bi}(1, p)$:

| $x$ | $0$ | $1$ |
| --- | --- | --- |
| $P(X=x)$ | $1-p$ | $p$ |

Given the binomial distribution $X \sim \operatorname{Bi}(3, p)$:

| $x$ | $0$ | $1$ | $2$ | $3$ |
| --- | --- | --- | --- | --- |
| $P(X=x)$ | $(1-p)^3$ | $3(1-p)^2p$ | $3(1-p)p^2$ | $p^3$ |
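
As a check (with an illustrative $p = 0.3$), the table entries agree with the general binomial pmf $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$:

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.3
table = [(1 - p)**3, 3 * (1 - p)**2 * p, 3 * (1 - p) * p**2, p**3]
pmf = [binomial_pmf(3, p, k) for k in range(4)]
print(all(abs(a - b) < 1e-12 for a, b in zip(table, pmf)))  # True: table matches the pmf
print(abs(sum(pmf) - 1) < 1e-12)                            # True: probabilities sum to 1
```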

Expectation

Take a random variable $X$ which only takes values in $\{0, 1, 2, 3\}$. The expectation of $X$ is defined as $\sum\limits_{i=0}^3\left(i \times P(X=i)\right)$, or $0 \times P(X=0) + P(X=1) + 2P(X=2) + 3P(X=3)$. If $X$ is uniformly distributed on this set, $P(X=i) = \frac{1}{4}$, so the value of the expectation is $\frac{1}{4} + 2\left(\frac{1}{4}\right) + 3\left(\frac{1}{4}\right) = \frac{0 + 1 + 2 + 3}{4} = 1.5$.

In general, if the range of $X$ is $\{a_1, a_2, ..., a_n\} \subseteq \mathbb{R}$ then the expectation of $X$ is denoted
$EX = \sum\limits_{i=1}^n \left( a_i \times P(X=a_i)\right)$. This definition works if the range of $X$ is finite.
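
A direct computation of the example above (uniform on $\{0, 1, 2, 3\}$) from this definition, as a sketch:

```python
from fractions import Fraction

values = [0, 1, 2, 3]
probs = {a: Fraction(1, 4) for a in values}   # uniform distribution on {0, 1, 2, 3}

EX = sum(a * probs[a] for a in values)
print(EX)          # 3/2
print(float(EX))   # 1.5
```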

Consider a roulette wheel with the numbers 0 to 36: 18 black, 18 red, and the single 0. If you bet on red or black and are correct you get double your bet, if you are wrong you get nothing, and if the result is 0 you get half of your bet back. This can be represented with a sample space $\Omega = \{r, b, z\}$, with $P(r) = P(b) = \frac{18}{37}$ and $P(z) = \frac{1}{37}$. The returns can be modelled with a random variable $R$. Suppose £1 is bet on black. This will give

$$ R = \begin{cases} 0 & \text{if } r \\ 2 & \text{if } b \\ 0.5 & \text{if } z \end{cases} $$

Therefore the expectation, $ER$, for this scenario is $P(r)R(r) + P(b)R(b) + P(z)R(z) = 2\left(\frac{18}{37}\right)+\frac{1}{37}\times\frac{1}{2} = \frac{73}{74} \approx 0.986$, so the return is approximately £0.99.
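
The same calculation done exactly with fractions (a sketch, using the probabilities and returns defined above):

```python
from fractions import Fraction

P = {"r": Fraction(18, 37), "b": Fraction(18, 37), "z": Fraction(1, 37)}
R = {"r": Fraction(0), "b": Fraction(2), "z": Fraction(1, 2)}   # £1 bet on black

ER = sum(P[w] * R[w] for w in P)
print(ER)          # 73/74
print(float(ER))   # ~0.986: on average you get back less than the £1 staked
```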

The expectation of the uniform distribution on $\{1, 2, ..., n\}$ is $\sum\limits_{k=1}^n\left(k \times \frac{1}{n}\right) = \frac{1}{n} \sum\limits_{k=1}^n k = \frac{1}{n} \times \frac{n(n+1)}{2} = \frac{n+1}{2}$.
The expectation of the Bernoulli distribution is $0 \times (1-p) + 1 \times p = p$.

Expectation of sum and linearity of expectation

Given two random variables, $X$ and $Y$, the expectation of their sum, $E(X+Y)$, is the sum of their expectations, $E(X)+E(Y)$.

Expectation is linear: $E(X+Y) = E(X) + E(Y)$ and $E(\alpha X) = \alpha E(X)$ for $\alpha \in \mathbb{R}$. This is called the linearity of expectation, and it holds whether or not $X$ and $Y$ are independent.

Expectation of product

If random variables are independent then $E(X \times Y) = E(X) \times E(Y)$.

Let $X$ be a random variable with binomial distribution, so $X = X_1 + X_2 + ... + X_n$ where each $X_i$ is a Bernoulli random variable with parameter $p$. Using the linearity of expectation, $EX = EX_1 + ... + EX_n = np$, so the expectation of a binomial random variable is $np$, where $n$ is the number of trials and $p$ is the probability of success.
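
A quick check (with an illustrative $n = 10$, $p = 0.3$) that summing $k \times P(X=k)$ over the binomial pmf gives $np$:

```python
from math import comb

def binomial_expectation(n, p):
    """E(X) for X ~ Bi(n, p), computed directly from the pmf."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

n, p = 10, 0.3
print(binomial_expectation(n, p))                       # 3.0 (up to floating-point error)
print(abs(binomial_expectation(n, p) - n * p) < 1e-12)  # True: matches np
```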

Functions of random variables and variance

The expectation of $g(X)$ (where $X$ is a random variable) is given by $\sum\limits_{k}\left(k \times P(g(X) = k)\right)$ where $k$ ranges over every value taken by $g(X)$. This can be rewritten as $E(g(X)) = \sum\limits_a \left(g(a) \times P(X=a) \right)$ where $a$ ranges over every value taken by $X$.

The expectation of $X^k$ is called the $k^\text{th}$ moment of $X$.

Let $m = EX$ and $g(x) = (x-m)^2$.
$E(g(X)) = E((X-m)^2) = E((X-EX)^2)$, which is the variance of $X$. The variance measures how spread out the values of a random variable are around its expectation. The square root of the variance gives the standard deviation. Using linearity of expectation it can be shown that the variance is equal to $E(X^2) - (EX)^2$.
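
A sketch computing the variance of a fair die both ways, confirming $E((X-EX)^2) = E(X^2) - (EX)^2$ (the fair die is an illustrative choice):

```python
from fractions import Fraction

values = range(1, 7)
p = Fraction(1, 6)                            # fair die: each value has probability 1/6

EX = sum(x * p for x in values)               # 7/2
var1 = sum((x - EX)**2 * p for x in values)   # E((X - EX)^2)
var2 = sum(x**2 * p for x in values) - EX**2  # E(X^2) - (EX)^2
print(EX, var1, var2, var1 == var2)           # 7/2 35/12 35/12 True
```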

Markov's inequality

Let $X$ be a random variable, $X \geq 0$. Markov's inequality states that $\forall a \in \mathbb{R}_{>0}$ $P(X \geq a) \leq \frac{EX}{a}$.
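
A small check of Markov's inequality for a fair die with $a = 5$ (an illustrative choice):

```python
from fractions import Fraction

values = range(1, 7)
p = Fraction(1, 6)                               # fair die

EX = sum(x * p for x in values)                  # 7/2
a = 5
P_at_least_a = sum(p for x in values if x >= a)  # P(X >= 5) = 1/3
print(P_at_least_a, EX / a)                      # 1/3 and 7/10
print(P_at_least_a <= EX / a)                    # True, as Markov's inequality requires
```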

Events and infinite sample spaces

When the sample space is infinite, the set of events in a probability space can be a proper subset of $2^\Omega$ rather than all of it.