Linear Regression#

Linear Regression is a technique for leveraging the linear correlation that exists between two variables; it uses the information available in one variable to help predict the value of the other. The variable used to predict is called the predictor variable and is typically denoted x_i. The variable whose value is being predicted is called the response variable and is typically denoted y_i.

The objective of Linear Regression is to fit a model \hat{y} that describes the dependence of y_i on x_i. The model \hat{y} is used to make predictions for given values of x_i. The predicted value of y_i given x_i is denoted \hat{y}_i.

Note

\hat{y} is a linear function. \hat{y}_i is a point on the line \hat{y}. (If one were committed to the purity of symbols, \hat{y}_i would represent the y-value of a point; but implicit in the subscript notation is its mapping to a corresponding x_i value.)

In order to find a good model, the concept of model error is captured in the definition of a residual,

\varepsilon_i = y_i - \hat{y}_i

This quantity will provide a metric for validating models against observed data.
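
As a concrete sketch of the residual computation in Python, assuming a hypothetical sample and already-fitted coefficients (all values below are illustrative only):

```python
import numpy as np

# Hypothetical observed sample of (x_i, y_i) pairs.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Assumed, already-fitted linear model y_hat = B0 + B1 * x.
B0, B1 = 0.1, 1.95

y_hat = B0 + B1 * x    # predicted values, one per observation
residuals = y - y_hat  # epsilon_i = y_i - y_hat_i

print(residuals)
```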

TODO

Regression Model#

TODO

The Linear Regression Model is specified by two equations. The first parameterizes the predicted value of y_i given x_i, \hat{y}_i. The second describes the distribution of the error terms, the differences between actual and predicted values.
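
Using the coefficient notation \mathcal{B}_0 and \mathcal{B}_1 introduced under Model Estimation below, the first equation takes the familiar linear form,

\hat{y}_i = \mathcal{B}_0 + \mathcal{B}_1 x_i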

The term \varepsilon_i is a normally distributed error term centered around 0 with variance equal to the mean squared error (MSE) of the model,

\varepsilon_i \sim \mathcal{N}(0, \text{MSE})
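
A short simulation of this generative model in Python; the coefficients and error scale below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed true coefficients and error standard deviation (sqrt of the MSE).
B0, B1 = 1.0, 2.0
sigma = 0.5

x = np.linspace(0.0, 10.0, 50)
epsilon = rng.normal(0.0, sigma, size=x.size)  # epsilon_i ~ N(0, sigma^2)
y = B0 + B1 * x + epsilon                      # simulated observations
```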

TODO

Mean Squared Error#

The term \hat{y}_i is not the observed value of y_i in the bivariate sample of data that was used to calibrate the model. It is the predicted value of y given the observed value of x. This is an extremely important point when talking about regression. The model equation is a prediction, and the prediction is not exact. Each predicted value \hat{y}_i will deviate from the observed value y_i. These deviations, if the model is a good fit, should be normally distributed around 0.

TODO

Sum Squared Error#

TODO

\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

TODO

MSE: Mean Squared Error#

TODO

\text{MSE} = \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2} = \frac{\text{SSE}}{n-2}

TODO

The divisor is n - 2 rather than n because two parameters, \mathcal{B}_0 and \mathcal{B}_1, are estimated from the data, leaving n - 2 degrees of freedom for the error.
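
As a minimal sketch of both computations in Python, assuming a small set of hypothetical observed and predicted values:

```python
import numpy as np

# Hypothetical observed and predicted values.
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
n = len(y)

sse = np.sum((y - y_hat) ** 2)  # Sum Squared Error
mse = sse / (n - 2)             # Mean Squared Error; n - 2 degrees of freedom

print(sse, mse)
```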

Model Estimation#

Model-fitting in the context of Linear Regression can be understood as the task of finding the values of the model coefficients, \mathcal{B}_0 and \mathcal{B}_1, most appropriate to use in the Regression Equation, \hat{y}.

Least Squares Estimation#

One of the most common and most easily understood methods for estimating the model coefficients is known as Least Squares Estimation. The name reflects the quantity being minimized: with this method, the Regression Model is estimated by finding the values of \mathcal{B}_0 and \mathcal{B}_1 that minimize the sum of the squared residuals, the SSE (and hence, equivalently, the MSE) of the model.

The formulae that result from the application of this process are given directly in the following cards for reference. The logic and derivation of these formulae are the topics of discussion in the next section.

The slope estimate,

\mathcal{B}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

The intercept estimate,

\mathcal{B}_0 = \bar{y} - \mathcal{B}_1 \bar{x}
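
The following is a minimal Python sketch of these closed-form estimates, using a hypothetical sample; np.polyfit serves as a cross-check:

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least squares estimates of B0 and B1."""
    x_bar, y_bar = x.mean(), y.mean()
    B1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    B0 = y_bar - B1 * x_bar
    return B0, B1

# Hypothetical sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

B0, B1 = least_squares_fit(x, y)
print(B0, B1)

# Cross-check: np.polyfit with degree 1 returns [B1, B0].
print(np.polyfit(x, y, 1))
```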

Assessing Model Fit#

Regression is not a one-stop shop; it is important to bear in mind the limitations of Regression. If the model assumptions are not met, the conclusions drawn from the fitted model may be unreliable.

Residual Analysis#

If the model is a good fit, the residuals \varepsilon_i should be approximately normally distributed around 0 and show no systematic pattern when plotted against the predictor. Examining the distribution of the residuals is therefore the standard check on the normality assumption.
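
A minimal sketch of such a check in Python, reusing the hypothetical sample and the closed-form fit from Model Estimation (all data values are illustrative):

```python
import numpy as np

# Hypothetical sample and least squares fit (see Model Estimation above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
B1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
B0 = y.mean() - B1 * x.mean()

residuals = y - (B0 + B1 * x)

# For a good fit the residuals center on 0 with no trend against x;
# a histogram or normal Q-Q plot of `residuals` makes the check visual.
print("mean:", residuals.mean())      # ~0 by construction of least squares
print("std:", residuals.std(ddof=2))  # estimates sqrt(MSE)
```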

Error Reduction#

TODO

TODO

TODO

TODO

Coefficient of Determination#
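
Assuming the standard definition, the coefficient of determination, R^2, measures the proportion of the variation in y explained by the regression model,

R^2 = 1 - \frac{\text{SSE}}{\text{SST}}

where \text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 is the total sum of squares of y around its mean.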

TODO

TODO