The likelihood function is defined as the probability of the observed data $x$ given parameters $\theta$:

$$L(\theta) = P(x \mid \theta)$$
Note: we can think of probabilities and likelihoods as duals. They reflect the relationship between data and parameters from opposite perspectives:
- Probability is used for prediction and inference about data.
- Likelihood is used for estimation and inference about parameters.
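
To make the two perspectives concrete, here is a minimal sketch using a binomial model. The model, the helper name `binom_pmf`, and the numbers (10 trials, 7 successes) are assumptions chosen for illustration and are not part of the notes above. The same formula is read two ways: as a function of the data with the parameter fixed, and as a function of the parameter with the data fixed.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k successes in n trials | success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10  # illustrative number of trials

# Probability view: fix the parameter p, vary the data k.
# As a function of k this is a distribution, so it sums to 1.
prob_over_data = [binom_pmf(k, n, 0.5) for k in range(n + 1)]
total_probability = sum(prob_over_data)

# Likelihood view: fix the observed data (k = 7), vary the parameter p.
# As a function of p this is not a distribution over p.
grid = [i / 1000 for i in range(1001)]
likelihood_over_params = [binom_pmf(7, n, p) for p in grid]
best_p = grid[likelihood_over_params.index(max(likelihood_over_params))]

print(total_probability)  # close to 1
print(best_p)             # grid point with the highest likelihood
```

Scanning a grid is only meant to show the shape of the idea; it is not how one would estimate a parameter in practice.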
log likelihood
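If the data $x_1, \dots, x_n$ are i.i.d., the likelihood function is the product of the probabilities (or densities) of the individual data points:

$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$

It’s often easier to work with the log likelihood function, as the logarithm turns the product into a sum:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$$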
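
Besides being easier to differentiate, the sum is also friendlier numerically. The sketch below assumes a standard normal model and simulated data (both are illustrative choices, as are the helper names `normal_pdf` and `normal_logpdf`). It checks that the log of the product matches the sum of the logs on a small sample, then shows the raw product underflowing on a larger one.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x, computed without exponentiating."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]  # simulated i.i.d. sample

# On a handful of points, log(product) and sum(log) agree.
small = data[:5]
log_of_product = math.log(math.prod(normal_pdf(x, 0.0, 1.0) for x in small))
sum_of_logs = sum(normal_logpdf(x, 0.0, 1.0) for x in small)

# On the full sample, the product of 2000 small numbers underflows to 0.0,
# while the sum of log densities stays finite.
likelihood = math.prod(normal_pdf(x, 0.0, 1.0) for x in data)
log_likelihood = sum(normal_logpdf(x, 0.0, 1.0) for x in data)

print(math.isclose(log_of_product, sum_of_logs))
print(likelihood, log_likelihood)
```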
score
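The score function is the derivative of the log-likelihood $\ell(\theta)$ w.r.t. the parameter $\theta$:

$$s(\theta) = \frac{\partial \ell(\theta)}{\partial \theta}$$

Solving $s(\theta) = 0$ gives the maximum likelihood estimator $\hat{\theta}$ of $\theta$.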
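
As a small worked case, consider i.i.d. Bernoulli data with success probability $p$ (an assumed example, not taken from the notes). With $k$ successes in $n$ trials the score is $k/p - (n-k)/(1-p)$, and setting it to zero gives $\hat{p} = k/n$. The sketch below finds the same root numerically by bisection; the function names and the sample are hypothetical, and it assumes $0 < k < n$ so that the score changes sign on $(0, 1)$.

```python
def bernoulli_score(p, data):
    """Derivative of the Bernoulli log-likelihood with respect to p."""
    k = sum(data)   # number of successes
    n = len(data)   # number of trials
    return k / p - (n - k) / (1 - p)

def solve_score(data, lo=1e-9, hi=1 - 1e-9, tol=1e-12):
    """Find the root of the score by bisection.

    Assumes 0 < k < n, so the score is positive near 0, negative near 1,
    and decreasing in between.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bernoulli_score(mid, data) > 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return 0.5 * (lo + hi)

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # illustrative sample: 7 successes in 10
p_hat = solve_score(data)
closed_form = sum(data) / len(data)

print(p_hat, closed_form)  # the two should agree to within the tolerance
```

In general a root of the score is only a stationary point of the log-likelihood; here the log-likelihood is concave in $p$, so the root is the maximum.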
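At the true value of the parameter, denoted $\theta_0$, the expected value of the score is 0 (under standard regularity conditions):

$$\mathbb{E}_{x \sim p(\cdot \mid \theta_0)}\left[ \left. \frac{\partial}{\partial \theta} \log p(x \mid \theta) \right|_{\theta = \theta_0} \right] = 0$$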
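
This property can be checked exactly for a single Bernoulli observation, since the expectation is just a two-term sum over $x \in \{0, 1\}$. The sketch below is an illustration under that assumed model; `score_one`, `expected_score`, and the value 0.3 are hypothetical choices. It evaluates the expected score at the true parameter and, for contrast, at a different parameter value.

```python
def score_one(x, p):
    """Score of a single Bernoulli observation x in {0, 1} at parameter p."""
    return x / p - (1 - x) / (1 - p)

def expected_score(p_eval, p_true):
    """E[score evaluated at p_eval] when x ~ Bernoulli(p_true)."""
    return p_true * score_one(1, p_eval) + (1 - p_true) * score_one(0, p_eval)

p_true = 0.3  # illustrative true parameter

at_truth = expected_score(p_true, p_true)  # score evaluated at the true value
away = expected_score(0.5, p_true)         # score evaluated elsewhere

print(at_truth)  # zero up to floating-point rounding
print(away)      # nonzero: the expectation vanishes only at the true value here
```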