see also: least squares
assumptions
- Linearity: the relationship between the predictors and the response is linear.
- Independence: the observations (and hence the error terms) are independent of one another.
- Normality of error terms: the errors are normally distributed with mean zero.
- Equal variance (homoscedasticity): the errors have constant variance across all levels of the predictors.
- No perfect multicollinearity: no predictor is an exact linear combination of the others.
types
- simple regression: 1 predictor, 1 response variable
- multiple regression: multiple predictors, 1 response variable
- multivariate regression: multiple predictors, multiple response variables
simple regression
Residuals are the observed estimates of the error terms: $e_i = y_i - \hat{y}_i$.
Under the assumptions above, the least squares estimators are BLUE: Best Linear Unbiased Estimators (Gauss-Markov theorem).
The fitted regression line is given by $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, where $\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
It always passes through the point of means $(\bar{x}, \bar{y})$.
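A minimal NumPy sketch of these closed-form estimates; the `x` and `y` values are made up for illustration, not from the note:

```python
import numpy as np

# Toy data (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least squares estimates for slope and intercept
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

# The fitted line passes through the point of means (x_bar, y_bar)
assert np.isclose(beta0 + beta1 * x_bar, y_bar)

print(f"y_hat = {beta0:.3f} + {beta1:.3f} x")
```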
ANOVA (analysis of variance)
- sum of squares: $\text{SST} = \text{SSR} + \text{SSE}$
  - SST (total): total variability in the response. $\text{SST} = \sum_i (y_i - \bar{y})^2$
  - SSR (regression): variability explained by the model. $\text{SSR} = \sum_i (\hat{y}_i - \bar{y})^2$
  - SSE (error): variability unexplained by the model. $\text{SSE} = \sum_i (y_i - \hat{y}_i)^2$
    - quantity to minimize!
- mean squares
  - MSR (regression): average variability explained by the model. $\text{MSR} = \text{SSR}/k$ for $k$ predictors.
  - MSE (error): average variability unexplained by the model. $\text{MSE} = \text{SSE}/(n - k - 1)$
- F-statistic: $F = \text{MSR}/\text{MSE}$ tests overall significance of the model, as sketched below.
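A sketch of the ANOVA decomposition on the same toy data as above (values are illustrative); it checks the identity $\text{SST} = \text{SSR} + \text{SSE}$ and forms the F-statistic:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n, k = len(y), 1  # k = number of predictors

# Fit by least squares (same closed form as above)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained by the model
sse = np.sum((y - y_hat) ** 2)         # unexplained (residual)
assert np.isclose(sst, ssr + sse)      # the ANOVA identity

msr = ssr / k            # df = k
mse = sse / (n - k - 1)  # df = n - k - 1
f_stat = msr / mse
print(f"SST={sst:.3f} SSR={ssr:.3f} SSE={sse:.3f} F={f_stat:.3f}")
```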
evaluation
- coefficient of determination, or R-squared: proportion of variance in $y$ explained by the model. $R^2 = \text{SSR}/\text{SST} = 1 - \text{SSE}/\text{SST}$
  - in simple regression, $R^2 = r^2$, the squared Pearson correlation between $x$ and $y$.
- adjusted R-squared: adjusts for the number of predictors $p$. $\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$
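A short sketch computing $R^2$ and adjusted $R^2$ on the same illustrative toy data, and checking that $R^2 = r^2$ in the simple-regression case:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n, p = len(y), 1  # p = number of predictors

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)

r2 = 1 - sse / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# In simple regression, R^2 equals the squared Pearson correlation
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(r2, r ** 2)
print(f"R^2={r2:.4f}, adjusted R^2={r2_adj:.4f}")
```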
diagnostics
- Residual Analysis: Check residual plots for patterns (indicating violations of assumptions).
- Multicollinearity: High correlation among predictors.
  - Detected via the Variance Inflation Factor, $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the others (see the sketch after this list).
- Heteroscedasticity: Non-constant variance of residuals.
  - Detected via residual plots; corrected by transformation or weighted least squares.
- Autocorrelation: Correlation of residuals with themselves over time.
  - Detected via the Durbin-Watson test; values near 2 suggest no first-order autocorrelation (see the sketch after this list).
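A hedged sketch of two of these checks in NumPy; the data, the helper name `vif`, and the correlation strength between the predictors are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design: two deliberately correlated predictors plus an intercept
n = 100
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)  # strongly correlated with x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on the others."""
    others = np.delete(X, j, axis=1)
    beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ beta
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

print("VIF x1:", vif(X, 1), "VIF x2:", vif(X, 2))  # both large here

# Durbin-Watson statistic on the residuals; near 2 means
# no first-order autocorrelation
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print("Durbin-Watson:", dw)
```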
extensions
- Polynomial Regression: Model non-linear relationships.
- Interaction Terms: Capture combined effects of predictors.
- Ridge Regression: Adds a $\lambda \sum_{j=1}^p \beta_j^2$ (L2) penalty to the least squares objective.
- Lasso Regression: Adds a $\lambda \sum_{j=1}^p |\beta_j|$ (L1) penalty to the least squares objective.
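A sketch of ridge regression via its closed form $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$, on made-up data with an illustrative penalty strength `lam`; predictors are assumed centered/standardized, and the intercept is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

lam = 1.0  # illustrative penalty strength lambda

# Ridge closed form: beta = (X^T X + lambda * I)^{-1} X^T y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Compare against plain OLS: ridge coefficients shrink toward zero
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

The lasso has no closed-form solution because the L1 penalty is not differentiable at zero; it is typically fit by coordinate descent, so it is not sketched here.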