see also: least squares

assumptions

  • Linearity: the mean of the response is a linear function of the predictors.
  • Independence: the error terms are independent of one another.
  • Normality of error terms: the errors are normally distributed (needed for exact inference).
  • Equal variance (homoscedasticity): the errors have constant variance across all predictor values.
  • No perfect multicollinearity: no predictor is an exact linear combination of the others.
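
Data simulated as below satisfies all of these assumptions by construction; a minimal sketch (the coefficients and noise level are made up):

```python
import numpy as np

# sketch: simulate data satisfying the OLS assumptions by construction
# (the true coefficients 2.0, 3.0 and sigma = 1.5 are made up for illustration)
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, 1.5, size=n)  # independent, normal, constant variance
y = 2.0 + 3.0 * x + eps           # mean of y is linear in x
```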

types

  • simple regression: 1 predictor, 1 response variable
  • multiple regression: multiple predictors, 1 response variable
  • multivariate regression: multiple predictors, multiple response variables

simple regression

Residuals are the observed estimates of the unobservable error terms: $e_i = y_i - \hat{y}_i$.

By the Gauss–Markov theorem, the least squares estimators are BLUE: Best Linear Unbiased Estimators.

The fitted regression line is given by

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, \quad \hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$

It always passes through $(\bar{x}, \bar{y})$.
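
A minimal numpy sketch of this closed-form fit on synthetic data (the true line and noise level are made up):

```python
import numpy as np

# sketch: closed-form simple regression fit
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=50)  # made-up true line

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar  # intercept; forces the line through (x_bar, y_bar)
y_hat = b0 + b1 * x      # fitted values
```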

ANOVA (analysis of variance)

  • sum of squares: $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$
    • SST (total): total variability in the response, $\sum_i (y_i - \bar{y})^2$.
    • SSR (regression): variability explained by the model, $\sum_i (\hat{y}_i - \bar{y})^2$.
    • SSE (error): variability unexplained by the model, $\sum_i (y_i - \hat{y}_i)^2$.
      • quantity to minimize!
  • mean squares
    • MSR (regression): average variability explained by the model; in simple regression $\mathrm{MSR} = \mathrm{SSR}/1$.
    • MSE (error): average variability unexplained by the model; in simple regression $\mathrm{MSE} = \mathrm{SSE}/(n-2)$.
  • F-statistic: $F = \mathrm{MSR}/\mathrm{MSE}$ tests overall significance of the model (see the sketch after this list).
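
A sketch of the decomposition and F-test, assuming numpy and scipy are available (np.polyfit stands in for the closed-form fit above; the data are synthetic):

```python
import numpy as np
from scipy import stats

# sketch: ANOVA decomposition for a simple regression fit
rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(0, 1.0, size=n)  # made-up data

b1, b0 = np.polyfit(x, y, deg=1)  # slope, intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained by the model
sse = np.sum((y - y_hat) ** 2)         # unexplained; minimized by OLS
assert np.isclose(sst, ssr + sse)      # SST = SSR + SSE

msr = ssr / 1        # 1 model degree of freedom in simple regression
mse = sse / (n - 2)  # n - 2 error degrees of freedom
f_stat = msr / mse
p_value = stats.f.sf(f_stat, 1, n - 2)  # H0: slope = 0
```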

evaluation

  • coefficient of determination, or R-squared: proportion of variance in $y$ explained by the model, $R^2 = \mathrm{SSR}/\mathrm{SST} = 1 - \mathrm{SSE}/\mathrm{SST}$.
    • in simple regression, $R^2 = r^2$, the squared sample correlation between $x$ and $y$.
  • adjusted R-squared: adjusts for the number of predictors $p$, $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ (both computed in the sketch after this list).
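
A sketch of both quantities computed by hand on synthetic data, reusing the ANOVA pieces above:

```python
import numpy as np

# sketch: R-squared and adjusted R-squared by hand
rng = np.random.default_rng(2)
n, p = 80, 1  # one predictor: simple regression
x = rng.uniform(0, 5, size=n)
y = 4.0 - 1.2 * x + rng.normal(0, 0.8, size=n)  # made-up data

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst                             # = SSR / SST
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes extra predictors

# in simple regression, R^2 is the squared sample correlation
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(r2, r ** 2)
```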

diagnostics

  • Residual Analysis: Check residual plots for patterns (indicating violations of assumptions).
  • Multicollinearity: High correlation among predictors.
    • Detected via Variance Inflation Factor (VIF), $\mathrm{VIF}_j = 1/(1 - R_j^2)$; see the sketch after this list.
  • Heteroscedasticity: Non-constant variance of residuals.
    • Detected via residual plots; corrected by transformation or weighted least squares.
  • Autocorrelation: Correlation of residuals with themselves over time.
    • Detected via Durbin-Watson test.
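
A numpy-only sketch computing VIF and the Durbin-Watson statistic by hand on deliberately collinear synthetic data (libraries such as statsmodels also provide both, but the formulas are simple enough to spell out):

```python
import numpy as np

# sketch: multicollinearity (VIF) and autocorrelation (Durbin-Watson) by hand
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # deliberately collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + x1 + x2 + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

def vif(preds, j):
    """VIF_j = 1 / (1 - R_j^2), from regressing predictor j on the others."""
    others = np.column_stack([np.ones(len(preds)), np.delete(preds, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, preds[:, j], rcond=None)
    resid_j = preds[:, j] - others @ coef
    r2_j = 1 - np.sum(resid_j ** 2) / np.sum((preds[:, j] - preds[:, j].mean()) ** 2)
    return 1 / (1 - r2_j)

preds = np.column_stack([x1, x2])
vifs = [vif(preds, j) for j in range(preds.shape[1])]  # large => multicollinearity

dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)  # ~2 means no autocorrelation
```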

extensions

Polynomial Regression: Model non-linear relationships.

Interaction Terms: Capture combined effects of predictors.
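
Both reduce to ordinary least squares on an augmented design matrix; a sketch with illustrative columns:

```python
import numpy as np

# sketch: polynomial and interaction terms are extra design-matrix columns,
# then ordinary least squares as usual (data and coefficients are made up)
rng = np.random.default_rng(5)
n = 80
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.uniform(-2, 2, size=n)
y = 1.0 + x1 + 0.5 * x1**2 + 2.0 * x1 * x2 + rng.normal(0, 0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2, x1**2, x1 * x2])  # poly + interaction
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```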

Ridge Regression: Adds $\lambda \sum_{j=1}^p \beta_j^2$ penalty.

Lasso Regression: Adds $\lambda \sum_{j=1}^p |\beta_j|$ penalty.
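
Ridge has a closed form, $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$; a minimal sketch (lasso has no closed form and is usually solved iteratively, e.g. by coordinate descent, so it is omitted here):

```python
import numpy as np

# sketch: ridge regression closed form on centered data
# (centering removes the intercept, which is conventionally left unpenalized)
rng = np.random.default_rng(4)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)  # made-up coefficients

Xc = X - X.mean(axis=0)
yc = y - y.mean()

lam = 1.0  # penalty strength, illustrative
beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
```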