~Coefficient of Determination~




~Coefficient of determination = r². It is defined as the proportional reduction in total squared error obtained by using the regression line to predict the observed y-values, instead of predicting each y-value by the average of the observed y-values. (That is why it is referred to as the proportion of the total variation in the observed y-values that is EXPLAINED by the regression line.) For a least squares line, r² lies between 0 and 1, and the closer r² is to 1, the more useful the regression equation is for prediction.

~Technically, r² = 1 - [∑(y-yL)²]/[∑(y-yA)²], where yL is the y-value predicted from the regression line and yA is the average of the observed y-values.

~In words, r² = [Explained Variation] divided by [Total Variation]
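
~As a quick illustration, the formula r² = 1 - [∑(y-yL)²]/[∑(y-yA)²] can be evaluated directly. The sketch below is only an example: the data values and the helper names (fit_line, r_squared) are made up for illustration, and the line is fitted with the usual least squares formulas for slope and intercept.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for the line yL = a + b*x."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    sxy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
    sxx = sum((x - x_avg) ** 2 for x in xs)
    b = sxy / sxx
    a = y_avg - b * x_avg
    return a, b


def r_squared(xs, ys):
    """r^2 = 1 - sum((y - yL)^2) / sum((y - yA)^2)."""
    a, b = fit_line(xs, ys)
    y_avg = sum(ys) / len(ys)  # yA in the text
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - y_avg) ** 2 for y in ys)
    return 1 - unexplained / total


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 4))  # prints 0.6
```

Here the unexplained variation is 2.4 and the total variation is 6, so r² = 1 - 2.4/6 = 0.6.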

~The following derivation gives the motivation for this definition.

~Since y-yA = (yL-yA) + (y-yL), we can square both sides to get:

(y-yA)² = [(yL-yA) + (y-yL)]², or

(y-yA)² = (yL-yA)² + 2(yL-yA)(y-yL) + (y-yL)²

~Summing over all the observations and using Sigma (Sum) properties, we get:
∑(y-yA)² = ∑(yL-yA)² + 2∑(yL-yA)(y-yL) + ∑(y-yL)²

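~The middle term on the right side can be shown to equal 0 in the derivation process of the "least squares line" (using the two equations in a & b; see link on Correlation & Regression). In brief, for a line yL = a + bx those two equations say that ∑(y-yL) = 0 and ∑x(y-yL) = 0, and since yL-yA is a constant plus a multiple of x, the sum ∑(yL-yA)(y-yL) is a combination of those two zero sums.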

~We then have ∑(y-yA)² = ∑(yL-yA)² + ∑(y-yL)²

whose three sums are, in order, the quantities in

~Total Variation = Explained Variation + Unexplained Variation, or

~Explained Variation = Total Variation - Unexplained Variation
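
~The decomposition above can be checked numerically. The following is a minimal sketch on made-up data (the values and variable names are assumptions for illustration only); it computes the three sums and the middle (cross) term for a least squares line.

```python
# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_avg = sum(xs) / n
y_avg = sum(ys) / n  # yA in the text

# Least squares line yL = a + b*x.
sxy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
sxx = sum((x - x_avg) ** 2 for x in xs)
b = sxy / sxx
a = y_avg - b * x_avg
y_line = [a + b * x for x in xs]  # yL in the text

total = sum((y - y_avg) ** 2 for y in ys)                      # sum of (y-yA)^2
explained = sum((yl - y_avg) ** 2 for yl in y_line)            # sum of (yL-yA)^2
unexplained = sum((y - yl) ** 2 for y, yl in zip(ys, y_line))  # sum of (y-yL)^2
cross = sum((yl - y_avg) * (y - yl) for y, yl in zip(ys, y_line))

print("total      :", round(total, 6))
print("expl+unexpl:", round(explained + unexplained, 6))
print("cross term :", round(cross, 6))
```

For this data the total variation is 6, the explained variation is 3.6, and the unexplained variation is 2.4, while the cross term is 0 up to floating-point rounding.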

~Dividing both sides of this last equation (Explained Variation = Total Variation - Unexplained Variation) by the Total Variation, we get:

~[Explained Variation]/[Total Variation] = 1 - [Unexplained Variation]/[Total Variation] = r²
(the equation for r² given initially)
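
~As a sanity check on the two forms of r², the sketch below computes both ratios on made-up data and also compares them with the square of the linear correlation coefficient r, which agrees with them for a least squares line in one variable. The data and variable names are assumptions for illustration only.

```python
import math

# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_avg = sum(xs) / n
y_avg = sum(ys) / n  # yA in the text

sxy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
sxx = sum((x - x_avg) ** 2 for x in xs)
syy = sum((y - y_avg) ** 2 for y in ys)  # Total Variation

# Least squares line yL = a + b*x.
b = sxy / sxx
a = y_avg - b * x_avg
y_line = [a + b * x for x in xs]

explained = sum((yl - y_avg) ** 2 for yl in y_line)
unexplained = sum((y - yl) ** 2 for y, yl in zip(ys, y_line))

ratio_explained = explained / syy         # [Explained]/[Total]
ratio_complement = 1 - unexplained / syy  # 1 - [Unexplained]/[Total]
pearson_r = sxy / math.sqrt(sxx * syy)    # linear correlation coefficient r

print(round(ratio_explained, 4), round(ratio_complement, 4), round(pearson_r ** 2, 4))
```

All three printed values come out as 0.6 for this data.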

~The Unexplained Variation, ∑(y-yL)², can be due to other variables in our problem that were not considered, as well as to random scatter in the data.