~Coefficient of Determination~
~Coefficient of determination = r^2. This is defined as
the proportional reduction in the total squared error obtained by using the regression line to
predict the observed y-values, instead of predicting each y-value by the
average of the observed y-values. (That is why it is referred to as the proportion of the
total variation in the observed y-values that is EXPLAINED by the
regression line.) The closer r^2 is to 1, the more useful the
regression equation is for prediction.
~Technically, r^2 = 1 - [∑(y - y_L)^2] / [∑(y - y_A)^2], where
y_L is the y-value predicted from the regression line and
y_A is the average of the observed y-values.
~In words,
r^2 = [Explained Variation] divided by [Total Variation]
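
As a rough illustration of the formula above, the following Python sketch fits a least-squares line to a small made-up data set and computes r^2 from the definition. The function names fit_line and r_squared and the data values are assumptions chosen for this example only; they do not come from the original notes.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y_L = b0 + b1*x."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    sxy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
    sxx = sum((x - x_avg) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_avg - b1 * x_avg
    return b0, b1


def r_squared(xs, ys):
    """r^2 = 1 - sum((y - y_L)^2) / sum((y - y_A)^2)."""
    b0, b1 = fit_line(xs, ys)
    y_avg = sum(ys) / len(ys)
    # Squared error when predicting with the regression line.
    unexplained = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Squared error when predicting every y by the average.
    total = sum((y - y_avg) ** 2 for y in ys)
    return 1 - unexplained / total


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 4))  # 0.6
```

For this data the line is y_L = 2.2 + 0.6x, the squared error around the line is 2.4 and the squared error around the average is 6, so r^2 = 1 - 2.4/6 = 0.6.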
~The following will give you the motivation for this definition:
~Since y - y_A = (y_L - y_A) + (y - y_L), we can square both sides to get:
(y - y_A)^2 = [(y_L - y_A) + (y - y_L)]^2,
or
(y - y_A)^2 = (y_L - y_A)^2 + 2(y_L - y_A)(y - y_L) + (y - y_L)^2
~Using Sigma (Sum) properties, that is, summing over all the data points, we get:
∑(y - y_A)^2 = ∑(y_L - y_A)^2 + 2∑(y_L - y_A)(y - y_L) + ∑(y - y_L)^2
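~The middle term on the right side can be shown to equal 0 in the derivation process of the "least squares line" (using the two equations in a & b). Briefly, the least-squares line satisfies ∑(y - y_L) = 0 and ∑x(y - y_L) = 0, and y_L - y_A is a linear function of x, so the sum vanishes.
(see link on Correlation & Regression)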
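
The vanishing of the middle term can also be checked numerically. The sketch below is only an illustration on made-up data, with variable names chosen for this example. It fits the least-squares line and evaluates the two residual sums and the cross term. It does not replace the algebraic argument in the linked derivation.

```python
# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_avg = sum(xs) / n
y_avg = sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
sxx = sum((x - x_avg) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_avg - b1 * x_avg

# Predicted values y_L and residuals y - y_L.
y_line = [b0 + b1 * x for x in xs]
residuals = [y - yl for y, yl in zip(ys, y_line)]

# The two least-squares conditions: both sums should be 0.
sum_resid = sum(residuals)
sum_x_resid = sum(x * e for x, e in zip(xs, residuals))

# The middle (cross) term: sum of (y_L - y_A)(y - y_L).
cross_term = sum((yl - y_avg) * e for yl, e in zip(y_line, residuals))

# All three are 0 up to floating-point rounding.
print(sum_resid, sum_x_resid, cross_term)
```

If the line is replaced by any other line (for example, by changing b1 by hand), the cross term is in general no longer 0, which is why the decomposition relies on the least-squares fit.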
~We then have,
∑(y - y_A)^2 = ∑(y_L - y_A)^2 + ∑(y - y_L)^2
which are the expressions that give us
~Total Variation = Explained Variation + Unexplained Variation,
or
~Explained Variation = Total Variation - Unexplained Variation
~Dividing both sides by the total variation, we get:
~[Explained Variation]/[Total Variation] = 1 - [Unexplained Variation]/[Total Variation], or r^2
(the equation for r^2 given initially)
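
The decomposition and the two equivalent forms of r^2 can be checked with a short Python sketch. The helper name variation_parts and the data are assumptions made for this illustration, not part of the original notes.

```python
def variation_parts(xs, ys):
    """Return (total, explained, unexplained) variation for the least-squares line."""
    n = len(xs)
    x_avg = sum(xs) / n
    y_avg = sum(ys) / n
    sxy = sum((x - x_avg) * (y - y_avg) for x, y in zip(xs, ys))
    sxx = sum((x - x_avg) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_avg - b1 * x_avg
    y_line = [b0 + b1 * x for x in xs]
    total = sum((y - y_avg) ** 2 for y in ys)                     # sum of (y - y_A)^2
    explained = sum((yl - y_avg) ** 2 for yl in y_line)           # sum of (y_L - y_A)^2
    unexplained = sum((y - yl) ** 2 for y, yl in zip(ys, y_line)) # sum of (y - y_L)^2
    return total, explained, unexplained


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

total, explained, unexplained = variation_parts(xs, ys)

# Total Variation = Explained Variation + Unexplained Variation
print(abs(total - (explained + unexplained)) < 1e-9)  # True

# Both forms of r^2 agree.
r2_from_explained = explained / total
r2_from_unexplained = 1 - unexplained / total
print(round(r2_from_explained, 4), round(r2_from_unexplained, 4))  # 0.6 0.6
```

Here the total variation is 6, the explained variation is 3.6 and the unexplained variation is 2.4, so both forms give r^2 = 0.6.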
~The Unexplained Variation, ∑(y - y_L)^2, can be due to other variables of our problem
that were not considered.