the language of statistics. See distributions.
convergence
3 types
from strongest to weakest:
- almost sure: convergence happens with probability 1
    - $P(\lim_{n \to \infty} X_n = X) = 1$
- in probability: the probability of being within any fixed $\epsilon$ of the limit tends to 1
    - $P(|X_n - X| > \epsilon) \to 0$ for all $\epsilon > 0$
- in distribution: the distribution of the variable approaches the distribution of the limit
    - $F_{X_n}(x) \to F_X(x)$ for all $x$ where $F_X$ is continuous
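A quick numerical sketch of convergence in distribution (assumes numpy/scipy; the classic Binomial$(n, \lambda/n) \to$ Poisson$(\lambda)$ example is an illustration, not from these notes):

```python
import numpy as np
from scipy import stats

# Binomial(n, lam/n) converges in distribution to Poisson(lam):
# the max CDF gap over a grid of points shrinks as n grows.
lam = 3.0
ks = np.arange(0, 20)  # evaluation points for both CDFs
for n in [10, 100, 1000, 10000]:
    gap = np.max(np.abs(stats.binom.cdf(ks, n, lam / n) - stats.poisson.cdf(ks, lam)))
    print(f"n={n:>5}  max CDF gap = {gap:.5f}")
```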
sequences
If $X_n$ and $Y_n$ are sequences of random variables (RVs) such that $X_n \to X$ and $Y_n \to Y$ in probability, then:
- $X_n + Y_n \to X + Y$ in probability
- $X_n Y_n \to X Y$ in probability
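A minimal simulation of the product rule (assumes numpy; the normal means 2 and 3 are arbitrary illustrations):

```python
import numpy as np

# Algebra of in-probability limits, empirically: with Xbar_n -> 2 and
# Ybar_n -> 3 in probability, Xbar_n * Ybar_n concentrates at 6.
rng = np.random.default_rng(8)
reps, eps = 2000, 0.1
for n in [10, 100, 1000, 10000]:
    xbar = rng.normal(2.0, 1.0, size=(reps, n)).mean(axis=1)
    ybar = rng.normal(3.0, 1.0, size=(reps, n)).mean(axis=1)
    p = np.mean(np.abs(xbar * ybar - 6.0) > eps)
    print(f"n={n:>5}  P(|Xbar*Ybar - 6| > {eps}) ~ {p:.3f}")
```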
continuous mapping theorem
If $g$ is a continuous function and $X_n \to X$ (almost surely, in probability, or in distribution), then $g(X_n) \to g(X)$ in the same sense.
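A sketch of the continuous mapping theorem in action, assuming numpy; $g = \exp$ and Exponential(1) samples are arbitrary choices:

```python
import numpy as np

# Continuous mapping, empirically: the sample mean -> mu in probability,
# so exp(sample mean) -> exp(mu). Estimate P(|exp(Xbar_n) - exp(mu)| > eps).
rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.1, 2000
for n in [10, 100, 1000, 10000]:
    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    p = np.mean(np.abs(np.exp(xbar) - np.exp(mu)) > eps)
    print(f"n={n:>5}  P(|exp(Xbar) - exp(mu)| > {eps}) ~ {p:.3f}")
```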
LLN
For i.i.d. $X_i$ with mean $\mu$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ converges to $\mu$:
- strong: almost sure convergence
- weak: convergence in probability
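A minimal LLN sketch (assumes numpy; Uniform(0, 1) with $\mu = 0.5$ is an arbitrary choice):

```python
import numpy as np

# LLN, empirically: the running sample mean of one long i.i.d. sequence
# settles at the true mean (mu = 0.5 for Uniform(0, 1)).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 100, 1000, 100_000]:
    print(f"n={n:>6}  running mean = {running_mean[n - 1]:.4f}")
```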
CLT
For i.i.d. $X_i$ with mean $\mu$ and variance $\sigma^2 < \infty$: $\sqrt{n}(\bar{X}_n - \mu) \to N(0, \sigma^2)$ in distribution, regardless of the distribution of the $X_i$.
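A CLT sketch (assumes numpy/scipy; Exponential(1) is chosen just because it is skewed):

```python
import numpy as np
from scipy import stats

# CLT, empirically: standardized means of Exponential(1) samples
# match standard normal probabilities despite the skewed base distribution.
rng = np.random.default_rng(2)
n, reps = 500, 20_000
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 1.0) / 1.0  # mu = sigma = 1 for Exponential(1)
for t in [-1.0, 0.0, 1.0]:
    print(f"P(Z <= {t:+.0f}): empirical {np.mean(z <= t):.3f}  normal {stats.norm.cdf(t):.3f}")
```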
Law of total variance
$$\operatorname{Var}(X) = E[\operatorname{Var}(X \mid Y)] + \operatorname{Var}(E[X \mid Y])$$
This is useful for separating uncertainty into systematic variation (2nd term: variability explained by Y) and random noise (1st term: variability unexplained by Y).
Intuition:
- 1st term: average variance within the groups created by Y
    - given Y, what is the residual variance of X?
- 2nd term: variance between group means
    - how much does the expected value of X change as Y varies?
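A numerical check of the decomposition, assuming numpy; the two-group normal mixture is a made-up illustration:

```python
import numpy as np

# Law of total variance, empirically: Y ~ Bernoulli(0.3) picks a group,
# X | Y=0 ~ N(0, 1), X | Y=1 ~ N(5, 4).
# Check Var(X) = E[Var(X|Y)] + Var(E[X|Y]).
rng = np.random.default_rng(3)
n = 1_000_000
y = rng.random(n) < 0.3
x = np.where(y, rng.normal(5.0, 2.0, n), rng.normal(0.0, 1.0, n))
p = y.mean()  # empirical P(Y = 1)
within = (1 - p) * x[~y].var() + p * x[y].var()  # E[Var(X|Y)]
between = (1 - p) * (x[~y].mean() - x.mean()) ** 2 + p * (x[y].mean() - x.mean()) ** 2  # Var(E[X|Y])
print(f"Var(X) = {x.var():.4f}   within + between = {within + between:.4f}")
```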
Law of total expectation
$$E[X] = E[E[X \mid Y]]$$
For continuous Y: $E[X] = \int E[X \mid Y = y] \, f_Y(y) \, dy$
For discrete Y: $E[X] = \sum_y E[X \mid Y = y] \, P(Y = y)$
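A tiny worked example of the discrete case (the coin setup is hypothetical):

```python
# Law of total expectation, discrete case: pick a fair coin (Y=0) or a
# biased coin (Y=1) with probability 1/2 each; X = heads in 2 flips.
p_y = {0: 0.5, 1: 0.5}
e_x_given_y = {0: 2 * 0.5, 1: 2 * 0.8}  # E[X|Y] = 2p for Binomial(2, p)
e_x = sum(e_x_given_y[y] * p_y[y] for y in p_y)
print(e_x)  # 0.5 * 1.0 + 0.5 * 1.6 = 1.3
```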
Inequalities
Cauchy-Schwarz
Recall the most common form: $\left(\sum_i a_i b_i\right)^2 \le \left(\sum_i a_i^2\right)\left(\sum_i b_i^2\right)$. And the integral form: $\left(\int fg \, dx\right)^2 \le \int f^2 \, dx \int g^2 \, dx$.
The probabilistic form is: $E[XY]^2 \le E[X^2] \, E[Y^2]$
Upper bound: this becomes an equality if there is perfect correlation, i.e. $|\rho(X, Y)| = 1$.
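A quick check of the probabilistic form, assuming numpy; note the near-equality when $Y = 3X$:

```python
import numpy as np

# Cauchy-Schwarz, empirically: E[XY]^2 <= E[X^2] E[Y^2],
# with equality when Y is an exact linear multiple of X.
rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
for y, label in [(rng.normal(size=x.size), "independent"), (3.0 * x, "y = 3x")]:
    lhs = np.mean(x * y) ** 2
    rhs = np.mean(x ** 2) * np.mean(y ** 2)
    print(f"{label:>11}: E[XY]^2 = {lhs:.4f} <= E[X^2]E[Y^2] = {rhs:.4f}")
```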
Markov
$P(X \ge a) \le \frac{E[X]}{a}$ for $X \ge 0$ and $a > 0$
Use case: when you only know the mean and need an upper bound on how often a random variable gets large
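A sketch with Exponential(1), where $P(X \ge a) = e^{-a}$ can be compared to the bound (assumes numpy):

```python
import numpy as np

# Markov's inequality, empirically: for Exponential(1) (mean 1),
# P(X >= a) sits well under the Markov bound E[X]/a = 1/a.
rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=1_000_000)
for a in [2.0, 5.0, 10.0]:
    print(f"a={a:>4}  P(X >= a) ~ {np.mean(x >= a):.4f}  bound 1/a = {1 / a:.4f}")
```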
Chebyshev
$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$ for $k > 0$
Use case: when you know the mean and variance and need an upper bound on the spread of the distribution
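A sketch with a standard normal (assumes numpy); the bound is loose but valid:

```python
import numpy as np

# Chebyshev's inequality, empirically: for standard normal X (mu=0, sigma=1),
# P(|X - mu| >= k*sigma) is far below the bound 1/k^2.
rng = np.random.default_rng(6)
x = rng.normal(size=1_000_000)
for k in [1.5, 2.0, 3.0]:
    print(f"k={k}  P(|X| >= k) ~ {np.mean(np.abs(x) >= k):.4f}  bound 1/k^2 = {1 / k**2:.4f}")
```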
Jensen
$E[g(X)] \ge g(E[X])$ for $g$ convex; reverse the inequality if $g$ is concave
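A sketch with $g = \exp$ (convex) and $g = \log$ (concave), assuming numpy:

```python
import numpy as np

# Jensen's inequality, empirically: for convex g(x) = exp(x),
# E[g(X)] >= g(E[X]); for concave g(x) = log(x), it reverses.
rng = np.random.default_rng(7)
x = rng.normal(size=1_000_000)
print(f"E[exp(X)] = {np.mean(np.exp(x)):.4f} >= exp(E[X]) = {np.exp(x.mean()):.4f}")
u = rng.uniform(1.0, 2.0, size=1_000_000)  # positive support for log
print(f"E[log(U)] = {np.mean(np.log(u)):.4f} <= log(E[U]) = {np.log(u.mean()):.4f}")
```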