Discrete Distributions
Bernoulli Distribution
- Models: Single binary outcome (success/failure).
- Example: Flipping a coin.
- PMF: $P(X = k) = p^k (1 - p)^{1 - k}$ for $k \in \{0, 1\}$
Binomial Distribution
X ~ Bin(n, p) = sum of n i.i.d. Bern(p) RVs
- Models: Number of successes in n independent Bernoulli trials.
- Example: Number of heads in coin flips.
- PMF: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$ for $k = 0, 1, \dots, n$
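As a sanity check, the binomial PMF $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$ can be computed directly with Python's standard library (a minimal sketch; the parameter values are illustrative):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: number of heads in 10 fair coin flips.
pmf = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print(f"P(X = 5) = {pmf[5]:.4f}")      # → 0.2461, the most likely outcome
print(f"sum of PMF = {sum(pmf):.4f}")  # → 1.0000
```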
Multinomial Distribution
- Models: Generalization of the binomial distribution for more than two outcomes.
- Example: Rolling a die 10 times and counting the occurrences of each face.
- PMF: $P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}$
where $\sum_i x_i = n$ and $\sum_i p_i = 1$.
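The die example can be evaluated numerically; a minimal stdlib sketch (the counts below are one illustrative outcome of 10 rolls, not a special case):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(X1 = x1, ..., Xk = xk) for n = sum(counts) trials."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)  # multinomial coefficient n! / (x1! ... xk!)
    prob = 1.0
    for x, p in zip(counts, probs):
        prob *= p**x
    return coef * prob

# Rolling a fair die 10 times: probability each face appears the
# given number of times (counts sum to n = 10, probs sum to 1).
counts = [2, 2, 2, 2, 1, 1]
print(multinomial_pmf(counts, [1/6] * 6))
```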
Hypergeometric Distribution
- Models: Number of successes in n draws (without replacement) from a finite population.
- Example: Drawing 5 cards from a deck and counting the number of aces.
- PMF: $P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}$
where $N$ is the population size, $K$ is the number of successes in the population, and $n$ is the sample size.
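The card example can be computed directly from this PMF; a minimal sketch using Python's standard library:

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement
    from a population of N containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Probability of exactly 1 ace when drawing 5 cards from a
# 52-card deck (K = 4 aces in the population).
print(f"{hypergeom_pmf(1, 52, 4, 5):.4f}")  # → 0.2995
```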
Negative Hypergeometric Distribution
- Models: Number of draws (without replacement) to achieve r successes
- Example: Number of cards that must be drawn to collect 4 aces.
- PMF: $P(X = k) = \frac{\binom{k - 1}{r - 1} \binom{N - k}{K - r}}{\binom{N}{K}}$ for $k = r, r + 1, \dots, N - K + r$,
where $N$ is the population size and $K$ is the number of successes in it.
Geometric Distribution
- Models: Number of trials until the first success.
- Example: Number of flips until first heads.
- PMF: $P(X = k) = (1 - p)^{k - 1} p$ for $k = 1, 2, \dots$
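A quick numerical check that the mean of this PMF matches the textbook value 1/p; truncating the infinite sum is an approximation (the cutoff of 200 is arbitrary but leaves a negligible tail for p = 0.5):

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): first success occurs on trial k."""
    return (1 - p)**(k - 1) * p

p = 0.5  # fair coin: number of flips until first heads
# Truncated expectation sum; the tail beyond k = 200 is negligible here.
mean = sum(k * geometric_pmf(k, p) for k in range(1, 200))
print(f"E[X] ≈ {mean:.4f}  (theory: 1/p = {1/p})")
```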
Negative Binomial Distribution
X ~ NBin(r, p) = sum of r i.i.d. Geom(p) RVs
- Models: Number of trials needed to achieve r successes (inclusive of the trial on which the r-th success occurs).
- Example: Number of coin flips required to get 3 heads.
- PMF: $P(X = k) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}$ for $k = r, r + 1, \dots$,
where $r$ is the number of successes.
Poisson Distribution
X ~ Pois(λ) models the number of events that occur in a unit of space or time; λ is the expected number of events.
Derivation
Suppose you have n trials, each with probability λ/n of success. Then the probability of r successes can be modelled by the binomial distribution:
$P(X = r) = \binom{n}{r} \left(\frac{\lambda}{n}\right)^r \left(1 - \frac{\lambda}{n}\right)^{n - r}$
As n tends to infinity, we have
$\binom{n}{r} \frac{1}{n^r} \to \frac{1}{r!} \quad \text{and} \quad \left(1 - \frac{\lambda}{n}\right)^{n - r} \to e^{-\lambda}$
This gives rise to the Poisson PMF:
$P(X = r) = \frac{\lambda^r e^{-\lambda}}{r!}$
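The limit can be verified numerically: as n grows, Bin(n, λ/n) probabilities approach the Poisson PMF (λ = 3 and r = 4 below are illustrative choices):

```python
from math import comb, exp, factorial

lam, r = 3.0, 4  # rate λ and event count r (illustrative values)

def binom_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

poisson = lam**r * exp(-lam) / factorial(r)
for n in (10, 100, 10000):
    print(f"n = {n:>6}: Bin(n, λ/n) gives {binom_pmf(r, n, lam / n):.6f}")
print(f"Poisson limit:       {poisson:.6f}")
```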
Discrete Uniform Distribution
- Models: All outcomes in a finite set are equally likely.
- Example: Rolling a fair die.
- PMF: $P(X = k) = \frac{1}{n}$ for each of the $n$ equally likely outcomes.
Continuous Distributions
Continuous Uniform Distribution
- Models: An outcome equally likely to fall anywhere in an interval [a, b].
- Example: A random angle between 0 and 2π.
- PDF: $f(x) = \frac{1}{b - a}$ for $a \le x \le b$.
Exponential Distribution
- Models: Time between events in a Poisson process.
- Example: Time between incoming calls.
- PDF: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, where $\lambda$ is the rate.
Gamma Distribution
X ~ Gamma(n, λ) models the amount of time until n events occur, e.g. the time until the n-th earthquake.
- Gamma(n, λ) = sum of n i.i.d. Expo(λ) RVs
- Gamma(1, λ) ∼ Expo(λ)
Shape-Rate Parameterization (α, β): the preferred parameterization for Bayesian stats
$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}$ for $x > 0$,
where $\Gamma(\alpha)$ is the Gamma function.
Shape-Scale Parameterization (k, θ): models the waiting time until the $k$-th event when each event occurs on average every $\theta$ units of time.
$f(x) = \frac{1}{\Gamma(k)\, \theta^k} x^{k - 1} e^{-x/\theta}$ for $x > 0$.
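The "sum of exponentials" characterization can be checked by simulation; a sketch using Python's standard library (the seed and sample count are arbitrary):

```python
import random

random.seed(0)
n, lam = 5, 2.0  # shape n, rate λ (illustrative values)

# Gamma(n, λ) arises as a sum of n i.i.d. Expo(λ) waiting times.
samples = [sum(random.expovariate(lam) for _ in range(n))
           for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(f"sample mean ≈ {mean:.3f}  (theory: n/λ = {n / lam})")
```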
Weibull Distribution
- Models: Lifetimes of objects.
- Example: Time to failure of a machine.
- PDF: $f(x) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k - 1} e^{-(x/\lambda)^k}$ for $x \ge 0$, where $k$ is the shape and $\lambda$ the scale parameter.
Pareto Distribution
- Models: Heavy-tailed distributions, often used to model situations where a small number of occurrences account for the majority of the effect.
- Example: The distribution of wealth in a population, where a small percentage of people hold most of the wealth.
- PDF: $f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha + 1}}$ for $x \ge x_m$,
where $x_m$ is the scale parameter (minimum value) and $\alpha$ is the shape parameter.
Normal (Gaussian) Distribution
- PDF: $f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma^2$ the variance.
Bivariate Normal Distribution
Multivariate Normal Distribution
- PDF: $f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^k |\boldsymbol\Sigma|}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol\mu)^\top \boldsymbol\Sigma^{-1} (\mathbf{x} - \boldsymbol\mu)\right)$
where $\mathbf{x}$ is a k-dimensional vector, $\boldsymbol\mu$ is the mean vector, and $\boldsymbol\Sigma$ is the covariance matrix.
Log-Normal Distribution
- Models: Multiplicative processes.
- Example: Stock prices.
- PDF: $f(x) = \frac{1}{x \sigma \sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$ for $x > 0$.
Chi-Square Distribution
- Models: Sum of squares of normal variables.
- Example: Goodness-of-fit tests.
- PDF: $f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2}$ for $x > 0$, where $k$ is the degrees of freedom.
F-Distribution
- Models: Ratio of two scaled chi-square distributions.
- Example: ANOVA testing.
- PDF: $f(x) = \frac{1}{B(d_1/2,\, d_2/2)} \left(\frac{d_1}{d_2}\right)^{d_1/2} \frac{x^{d_1/2 - 1}}{\left(1 + \frac{d_1}{d_2} x\right)^{(d_1 + d_2)/2}}$ for $x > 0$,
where $d_1$ and $d_2$ are the degrees of freedom.
For $d_2 > 2$: $E[X] = \frac{d_2}{d_2 - 2}$
For $d_2 > 4$: $\mathrm{Var}(X) = \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$
The variance is undefined for $d_2 \le 4$.
Beta Distribution
- Models: Distribution of probabilities.
- Example: Distribution of success rates.
- PDF: $f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}$ for $0 \le x \le 1$,
where $B(\alpha, \beta) = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}$ or, equivalently, $B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\, dt$.
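The Gamma-function identity B(α, β) = Γ(α)Γ(β)/Γ(α+β) lets the PDF be evaluated with the standard library, and a crude midpoint-rule integral checks that it is normalized (a sketch; the parameter values and step count are arbitrary):

```python
from math import gamma

def beta_fn(a: float, b: float) -> float:
    """B(a, b) via the Gamma-function identity."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_pdf(x: float, a: float, b: float) -> float:
    return x**(a - 1) * (1 - x)**(b - 1) / beta_fn(a, b)

# Midpoint-rule check that the PDF integrates to 1 over [0, 1].
a, b, steps = 2.0, 5.0, 10_000
total = sum(beta_pdf((i + 0.5) / steps, a, b) for i in range(steps)) / steps
print(f"∫ f(x) dx ≈ {total:.4f}")
```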
Dirichlet Distribution
- Models: Probabilities of outcomes in a multinomial distribution.
- Example: Proportion of time spent on different activities during a day.
- PDF: $f(x_1, \dots, x_K) = \frac{1}{B(\boldsymbol\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$
where $B(\boldsymbol\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)}$ is the multivariate Beta function, and $x_i \ge 0$ with $\sum_{i=1}^{K} x_i = 1$.
t-Distribution
- Models: Distribution of sample means when population variance is unknown.
- Example: Testing hypotheses about means.
- PDF: $f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
where $\nu$ is the degrees of freedom.
For $\nu > 1$: $E[X] = 0$
For $\nu > 2$: $\mathrm{Var}(X) = \frac{\nu}{\nu - 2}$
The variance is infinite for $1 < \nu \le 2$ and undefined for $\nu \le 1$.
Cauchy Distribution
- Models: Distributions with heavy tails.
- Example: Resonance behavior.
- PDF: $f(x) = \frac{1}{\pi \gamma \left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}$, where $x_0$ is the location and $\gamma$ the scale parameter.
Statistical Distance
Measures how different two probability distributions P and Q are from each other.
- Asymmetric measures:
  - Kullback-Leibler Divergence: $D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$
  - MLE can be seen as minimizing the KL divergence between the empirical distribution and the model.
- Symmetric measures:
  - Total variation distance: $\delta(P, Q) = \frac{1}{2} \sum_x |P(x) - Q(x)|$
  - Hellinger distance: $H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_x \left(\sqrt{P(x)} - \sqrt{Q(x)}\right)^2}$
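These measures are easy to compute for small discrete distributions; a minimal sketch (P and Q below are arbitrary example distributions over three outcomes):

```python
from math import log, sqrt

P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]

# KL divergence (asymmetric): terms with P(x) = 0 contribute nothing.
kl = sum(p * log(p / q) for p, q in zip(P, Q) if p > 0)
# Total variation distance (symmetric).
tv = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))
# Hellinger distance (symmetric, bounded in [0, 1]).
hellinger = sqrt(sum((sqrt(p) - sqrt(q))**2 for p, q in zip(P, Q))) / sqrt(2)

print(f"KL(P‖Q)  = {kl:.4f}   (KL(Q‖P) generally differs)")
print(f"TV(P, Q) = {tv:.4f}")
print(f"H(P, Q)  = {hellinger:.4f}")
```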