OLS (ordinary least squares)

Least squares: method of fitting the linear model

$$f(x) = \beta_0 + \sum_{j=1}^{p} x_j \beta_j,$$

or, absorbing the intercept into the coefficient vector, $f(x) = x^T \beta$,

to a set of training data.

Given training data $(x_1, y_1), \ldots, (x_N, y_N)$, we can pick coefficients $\beta$ to minimize the residual sum of squares (RSS)

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \left( y_i - x_i^T \beta \right)^2.$$

In matrix notation:

$$\mathrm{RSS}(\beta) = (y - X\beta)^T (y - X\beta),$$

where $X$ is the $N \times p$ matrix of inputs and $y$ the $N$-vector of outputs.

Differentiating w.r.t. $\beta$ we get the normal equations

$$X^T (y - X\beta) = 0, \quad \text{i.e.} \quad X^T X \beta = X^T y.$$

If $X^T X$ is non-singular, then the unique solution is given by

$$\hat{\beta} = (X^T X)^{-1} X^T y.$$
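As a quick sanity check of the closed-form solution, here is a minimal NumPy sketch on synthetic data (the data, dimensions, and coefficient values are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N = 100 observations, p = 3 features, known true beta.
N, p = 100, 3
X = rng.normal(size=(N, p))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=N)

# OLS via the normal equations: solve X^T X beta = X^T y.
# np.linalg.solve is preferred over forming (X^T X)^{-1} explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to true_beta
```

In practice `np.linalg.lstsq(X, y, rcond=None)` is the more numerically stable route, since it avoids forming $X^T X$ at all.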

Regularized LS estimate

The regularized least squares estimate minimizes the following objective (a code sketch of it follows the list below):

$$\hat{\beta} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda R(\beta)$$

Where:

  • $y$: observed values.
  • $X$: input features.
  • $\beta$: coefficients/parameters.
  • $\lambda$: regularization parameter (controls the trade-off between fit and penalty).
  • $R(\beta)$: regularization term applied to the coefficients.
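To make the roles of $\lambda$ and $R(\beta)$ concrete, here is a minimal sketch of the objective as a plain function, with the penalty passed in as a callable (the function and variable names are my own, chosen for illustration):

```python
import numpy as np

def regularized_ls_objective(beta, X, y, lam, penalty):
    """Value of ||y - X beta||^2 + lam * penalty(beta)."""
    residual = y - X @ beta
    return residual @ residual + lam * penalty(beta)

# The two penalties discussed below:
l2_penalty = lambda b: np.sum(b ** 2)      # ridge
l1_penalty = lambda b: np.sum(np.abs(b))   # lasso
```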

Ridge Regression (L2 Regularization):

  • Regularization term: $R(\beta) = \|\beta\|_2^2 = \sum_j \beta_j^2$.
  • Objective: $\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$ (closed-form solution sketched after this list).
  • Tends to shrink coefficients evenly, but does not force any coefficients to zero.
  • Use when you suspect many small/medium-sized effects.
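Unlike the lasso, ridge retains a closed-form solution: the penalty simply adds $\lambda I$ to $X^T X$, giving $\hat{\beta}^{\mathrm{ridge}} = (X^T X + \lambda I)^{-1} X^T y$. A minimal NumPy sketch, reusing `X` and `y` from the OLS example above ($\lambda = 1$ is an arbitrary choice here):

```python
# Ridge closed form: beta_hat = (X^T X + lam * I)^{-1} X^T y.
lam = 1.0
p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)  # shrunk toward zero relative to the OLS estimate
```

A common convention is to leave the intercept unpenalized (and to standardize the features first); this sketch glosses over both details.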

Lasso Regression (L1 Regularization):

  • Regularization term: $R(\beta) = \|\beta\|_1 = \sum_j |\beta_j|$.
  • Objective: $\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$ (an iterative solver is sketched after this list).
  • Can drive some coefficients exactly to zero, effectively performing feature selection.
  • Use when you expect only a few variables to have significant effects.
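The L1 objective has no closed form, so the lasso is solved iteratively; coordinate descent with soft-thresholding is one standard approach. Below is a minimal sketch for the objective $\tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1$ (equivalent to the one above up to a rescaling of $\lambda$); the iteration count and $\lambda$ are arbitrary illustrative choices, and `X`, `y` are reused from the OLS example:

```python
import numpy as np

def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0): the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    """Coordinate descent for (1/2) ||y - X beta||^2 + lam * ||beta||_1."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            # Partial residual with feature j's contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
    return beta

beta_lasso = lasso_cd(X, y, lam=60.0)
print(beta_lasso)  # for large enough lam, some entries are exactly zero
```

The soft-thresholding step is what drives coefficients exactly to zero: whenever $|\rho_j| \le \lambda$, the update sets $\beta_j = 0$.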

WLS (weighted least squares)

A generalization of OLS, used when the error variances are not equal across observations (heteroskedasticity).

The difficulty is estimating the error variances $\sigma_i^2$, since they are rarely known exactly; the weights are based on the estimated variances.

Let the weights be

$$w_i = \frac{1}{\sigma_i^2}, \qquad W = \mathrm{diag}(w_1, \ldots, w_N).$$

Then the estimator is given by

$$\hat{\beta} = (X^T W X)^{-1} X^T W y.$$
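A minimal NumPy sketch of the estimator on synthetic heteroskedastic data (the variance model here is invented purely for illustration; in practice $\sigma_i^2$ would itself be estimated):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic heteroskedastic data: noise variance grows with |x_1|.
N = 200
X = rng.normal(size=(N, 2))
sigma2 = 0.1 + np.abs(X[:, 0])              # assumed known/estimated variances
y = X @ np.array([1.5, -0.5]) + rng.normal(size=N) * np.sqrt(sigma2)

# WLS: beta_hat = (X^T W X)^{-1} X^T W y with W = diag(1 / sigma_i^2).
w = 1.0 / sigma2
XtW = X.T * w                               # X^T W without forming the N x N matrix
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)
print(beta_wls)
```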