Correlation#

Correlation is a measure of the strength and direction of the linear relationship between two observed variables.

Introduction#

Preliminaries#

Before we begin our study of correlation, let’s make some preliminary definitions that will help keep everything clear and precise.

Univariate Statistics#

To distinguish between the statistics relating to the x and y variables, we introduce some notation.

\bar{x} and \bar{y} are defined as the univariate sample means of the x and y variables. In other words, \bar{y} is the sample mean of the y variable, as if we were observing the y variable in isolation. Similarly for \bar{x}.

s_x and s_y are defined as the univariate sample standard deviations of the x and y variables. In other words, s_x is the standard deviation of the x variable, as if we were observing the x variable in isolation. Similarly for s_y. Their squares, the sample variances, are given by:

s_{x}^2 = \frac{1}{n-1} \cdot \sum_{i=1}^{n} (x_i - \bar{x})^2

s_{y}^2 = \frac{1}{n-1} \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2
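The univariate statistics defined above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `sample_mean` and `sample_std` and the data in `x` and `y` are assumptions made up for this example.

```python
import math


def sample_mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation, using the n - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sample_mean(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical paired observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Each variable is summarized on its own, ignoring the other.
x_bar, y_bar = sample_mean(x), sample_mean(y)
s_x, s_y = sample_std(x), sample_std(y)
```

For this made-up data the sample variances are s_x^2 = 2.5 and s_y^2 = 1.5. Python's standard library offers `statistics.mean` and `statistics.stdev`, which compute the same quantities.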

Assessing Correlation#

TODO

Figure: scatter plot showing positive correlation.

TODO

Figure: scatter plot showing negative correlation.

TODO

Figure: scatter plot showing no correlation.

TODO

Definition#

Version 1#

TODO: justification. make some plots.

Version 2#

TODO: shortcut for version 2

Version 3#

TODO: justification, again.

r_{xy} = \frac{1}{n-1} \cdot \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \cdot \left( \frac{y_i - \bar{y}}{s_y} \right)
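The formula above can be translated almost term by term into code. The sketch below is one possible reading of it, assuming plain Python lists as input. The function name `correlation` and the sample data are hypothetical and chosen only for illustration.

```python
import math


def correlation(xs, ys):
    """Sample correlation r_xy: the sum of products of standardized
    deviations, divided by n - 1."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two paired observations")

    # Univariate sample means.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Univariate sample standard deviations (n - 1 denominator).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    # Multiply the standardized deviations pairwise, sum, and scale.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical paired observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
print(round(r, 4))  # 0.7746
```

Because each deviation is divided by its own standard deviation, the result is unitless and lies between -1 and 1. A perfectly increasing linear relationship such as y = 2x + 1 gives a value of 1, up to floating-point rounding.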