OLS (ordinary least squares)
Least squares: method of fitting the linear model $f(x) = \beta_0 + \sum_{j=1}^{p} x_j \beta_j$, or equivalently $f(x) = x^T \beta$ (with the intercept absorbed into $\beta$ via a constant input), to a set of training data $(x_1, y_1), \dots, (x_N, y_N)$.
If $N > p$, then we can pick the coefficients $\beta$ to minimize the residual sum of squares (RSS)
$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \left( y_i - x_i^T \beta \right)^2.$$
In matrix notation:
$$\mathrm{RSS}(\beta) = (y - X\beta)^T (y - X\beta)$$
Differentiating w.r.t. $\beta$ we get the normal equations
$$X^T (y - X\beta) = 0$$
If $X^T X$ is non-singular, then the unique solution is given by
$$\hat{\beta} = (X^T X)^{-1} X^T y$$
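As a quick check of the closed form, here is a minimal NumPy sketch (the data is synthetic, purely for illustration) that solves the normal equations directly and compares the result against `np.linalg.lstsq`, which solves the same problem via an SVD-based routine:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # intercept column
beta_true = np.array([2.0, 1.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=N)

# Normal equations: solve (X^T X) beta = X^T y
# (solve() avoids forming the explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based solver -- preferred when X^T X is ill-conditioned
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)    # close to beta_true
print(beta_lstsq)  # same solution
```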
Regularized LS estimate
The regularized least squares estimate minimizes the following objective:
$$\hat{\beta} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda R(\beta)$$
Where:
- $y$: observed values.
- $X$: input features.
- $\beta$: coefficients/parameters.
- $\lambda \ge 0$: regularization parameter (controls the trade-off between fit and penalty).
- $R(\beta)$: regularization term applied to the coefficients.
Ridge Regression (L2 Regularization):
- Regularization term: $R(\beta) = \|\beta\|_2^2 = \sum_{j} \beta_j^2$.
- Objective: $\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$
- Tends to shrink coefficients evenly, but does not force any coefficients to zero.
- Use when you suspect many small/medium-sized effects.
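Unlike the lasso, ridge regression retains a closed-form solution: the penalized normal equations give $\hat{\beta}^{\text{ridge}} = (X^T X + \lambda I)^{-1} X^T y$. A minimal NumPy sketch (synthetic data; note the intercept is conventionally left unpenalized and is omitted here for brevity):

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Penalized normal equations: (X^T X + lam * I) beta = X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -0.5, 0.0, 0.2]) + rng.normal(scale=0.1, size=50)

print(ridge_fit(X, y, lam=0.0))   # lam = 0 recovers OLS
print(ridge_fit(X, y, lam=10.0))  # coefficients shrink toward zero, none exactly zero
```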
Lasso Regression (L1 Regularization):
- Regularization term: $R(\beta) = \|\beta\|_1 = \sum_{j} |\beta_j|$.
- Objective: $\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$
- Can drive some coefficients exactly to zero, effectively performing feature selection.
- Use when you expect only a few variables to have significant effects.
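The lasso objective has no closed form (the $\ell_1$ term is non-differentiable at zero), but one standard solver is proximal gradient descent (ISTA), where a soft-thresholding step is exactly what drives small coefficients to zero. A minimal sketch, assuming the objective as written above; in practice a library solver such as scikit-learn's `Lasso` would be used instead:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero, clipping at 0
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    # ISTA for: min_beta ||y - X beta||_2^2 + lam * ||beta||_1
    L = 2.0 * np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y)    # gradient of the squared-error term
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -0.5, 0.0, 0.2]) + rng.normal(scale=0.1, size=50)
print(lasso_ista(X, y, lam=5.0))  # some coefficients land exactly at zero
```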
WLS (weighted least squares)
A generalization of OLS, used when the error variances are not equal (heteroskedastic errors).
The difficulty is estimating the error variances $\sigma_i^2$, which are rarely known exactly, so the weights are based on the estimated variances.
Let the weights be
$$w_i = \frac{1}{\sigma_i^2}, \qquad W = \mathrm{diag}(w_1, \dots, w_N).$$
Then the estimator is given by
$$\hat{\beta}^{\text{WLS}} = (X^T W X)^{-1} X^T W y$$
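A minimal NumPy sketch of the weighted estimator, assuming the variances have already been estimated (here they are simply given, for illustration). With equal weights it reduces to OLS:

```python
import numpy as np

def wls_fit(X, y, sigma2):
    # Weighted normal equations: (X^T W X) beta = X^T W y, with W = diag(1/sigma_i^2)
    w = 1.0 / sigma2
    XtW = X.T * w                      # scales column i of X^T by w_i, i.e. X^T W
    return np.linalg.solve(XtW @ X, XtW @ y)

rng = np.random.default_rng(0)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
sigma2 = rng.uniform(0.1, 4.0, size=N)             # heteroskedastic error variances
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N) * np.sqrt(sigma2)

print(wls_fit(X, y, sigma2))                       # close to [1.0, 2.0]
print(np.linalg.solve(X.T @ X, X.T @ y))           # OLS: still unbiased, less efficient
```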