Chi-square Test for "Goodness of fit"

~There are many Chi-square curves, one for each number of degrees of freedom (very much like the t-curves).
~The equations for these curves are very advanced & will not be given. They are continuous & define a continuous distribution.
~Unlike normal curves, they are not symmetric. They rise rapidly and then approach the horizontal axis as the values increase (skewed right).
~The key feature is that the area underneath each one is equal to 1.
~This important property enables us to use them in inferential statistics (for testing certain types of hypotheses).
~For each curve, the values on the horizontal axis start at zero and increase from there. The curves are always positive (above the horizontal axis).
~The important areas in question will be to the RIGHT of each Chi-square value (these regions will be very important for testing our null hypothesis).
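~A minimal Python sketch of the "area to the RIGHT" idea, for readers who want to check table values by computer. The function name chi2_right_tail is made up for this illustration and is not from the textbook or the TI-83. It assumes a whole number of degrees of freedom and uses a standard recurrence for the chi-square tail area.

```python
import math

def chi2_right_tail(x, df):
    """Area under the chi-square curve (df degrees of freedom) to the right of x."""
    if x <= 0:
        return 1.0  # the whole curve lies to the right of zero, and its total area is 1
    z = x / 2.0
    # Starting values: df = 1 and df = 2 have simple closed forms.
    if df % 2 == 1:
        a, tail = 0.5, math.erfc(math.sqrt(z))
    else:
        a, tail = 1.0, math.exp(-z)
    # Step up two degrees of freedom at a time with the recurrence
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        tail += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return tail

# Right-tail areas at the usual .05 table values for df = 1, 2, 3
for df, table_value in [(1, 3.841), (2, 5.991), (3, 7.815)]:
    print(df, table_value, round(chi2_right_tail(table_value, df), 4))
```

Each printed area should come out very close to .05, which is what the table values are built to give.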
~Since we are testing "goodness of fit", the data
sets involved will come from OBSERVED and EXPECTED data points.
~CONTINGENCY TABLES (data displayed in rows & columns) are usually used for this purpose.
Each "cell" (a specific entry in any given row & column) of the table would usually
have a frequency or probability for that category.
~The data collected (observed) is usually displayed & the expected results are usually computed (based on probability rules).
~There could be many categories involved.
~Our null hypothesis (for "goodness of fit") will be that there is a "good fit" (i.e., the observed frequencies do not differ significantly from the expected).
~If it is rejected, the sample gave sufficient evidence against it & we will conclude that the observed frequencies differ significantly from the expected.
~The level of significance is usually .05 or .01.
~The key test statistic for chi-square is χ² = ∑ (O - E)² / E, where O = observed frequencies & E = expected frequencies. The sum runs over all categories.
~The above test statistic is designed so that a small chi-square value gives a large p-value. This will indicate that there is a "good fit" between observed & expected data & the null hypothesis will not be rejected. On the other hand, a large chi-square value will give a small p-value. This will indicate that there is "not a good fit" between observed & expected data & the null hypothesis (good fit) will be rejected.
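~To see the small-value/large-value behavior in numbers, here is a short Python sketch. The counts are made up purely for illustration, and the function name chi_square_statistic is hypothetical. With 3 categories there are 2 degrees of freedom, and for that one case the right-tail area has the simple form exp(-x/2), which is used here as the p-value.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts, 3 categories, 60 observations in each data set
expected = [20, 20, 20]
close_fit = [19, 21, 20]   # observed counts near the expected ones
poor_fit = [5, 40, 15]     # observed counts far from the expected ones

for label, observed in [("close fit", close_fit), ("poor fit", poor_fit)]:
    stat = chi_square_statistic(observed, expected)
    p_value = math.exp(-stat / 2)   # right-tail area, valid for df = 2 only
    print(label, "chi-square =", round(stat, 3), "p-value =", round(p_value, 6))
```

The close fit gives a chi-square of 0.1 and a p-value above .9. The poor fit gives a chi-square of 32.5 and a p-value far below .01.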
~The test statistic above is computed from counts, so its true distribution is discrete & the continuous Chi-square curve is only an approximation to it. For the Chi-square distribution to give a good result, all expected frequencies should be at least 5 (this condition is very similar to the condition required for the normal approximation to the binomial distribution).
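~A quick way to check the "at least 5" condition before running the test, sketched in Python. The helper name expected_counts_ok is hypothetical. The percentages are borrowed from the ice cream example later in these notes, and the sample size of 40 is made up to show a case that fails.

```python
def expected_counts_ok(expected, minimum=5):
    """True when every expected frequency is at least `minimum`."""
    return all(e >= minimum for e in expected)

proportions = [0.60, 0.30, 0.10]

# Sample of 500: expected counts are about 300, 150, 50
large_sample = [p * 500 for p in proportions]
# Sample of 40 (made up): expected counts are about 24, 12, 4
small_sample = [p * 40 for p in proportions]

print(expected_counts_ok(large_sample))
print(expected_counts_ok(small_sample))
```

The first check passes. The second fails because the smallest expected count is only about 4.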
~After spending many hours trying to do this on my TI-83, I had no luck. Then I ran across a note by the author & he stated that the TI-83 cannot do it unless we make some unusual inputs...it works!
~Here's what you have to do to get that chi-square value & the p-value associated with it (for rejecting or not rejecting the null hypothesis).
~First, collect your data in two groups, OBSERVED DATA & EXPECTED DATA (these will usually be frequencies or probabilities).
~Your null hypothesis is that there is a very good fit. To test this, do the following:
1) Insert a matrix with 2 rows. The number of columns will correspond to the number of categories (say k). Then the number of degrees of freedom will be k-1. To do this, click 2nd, matrix, edit, 1[A], enter. Enter the size (dimension) of your matrix [A] (i.e., for 2 rows & 3 columns, 2 enter 3 enter).
2) Fill in the 1st ROW (across) with the observed frequencies.
3) Fill in the 2nd ROW (across) with the expected frequencies times ten to the 30th power (i.e., if one of the expected frequencies is 25, insert 25E30). Do this for each of the expected frequencies.
4) When both rows are filled properly, click STAT, TESTS, move cursor down to C (chi-square Test), enter. The cursor should be blinking next to your matrix [A]. Click enter, but do not click enter again until you move to Calculate, then click enter.
5) You will see the chi-square value, the p-value, & the df value.
6) You can now make your decision. If the p-value is less than your significance level (.05 or .01), you will reject the null hypothesis (i.e., observed & expected data do not fit well). If the p-value is greater than your significance level, you will not reject the null hypothesis (i.e., observed & expected data do fit well).
7) The author does not explain why inserting the expected data times ten to the 30th power works, just that it does! (a means to an end)
8) You can also get a graph as part of the display, if you move the cursor to graph & click enter. It will show you the chi-square curve in question, the chi-square test statistic, & the area in question.
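~Note: one possible explanation for step 7, offered as an assumption and not as anything the author states. The calculator's chi-square Test treats matrix [A] as a two-way table and runs a test of independence on it. When the 2nd row is enormously larger than the 1st, the column proportions of the whole table are essentially the expected proportions, so the "expected" counts the calculator works out for the 1st row are essentially your expected frequencies, and the 2nd row adds almost nothing to the total. The Python sketch below checks this on the ice cream example from these notes. It uses exact fractions so that rounding with such huge numbers cannot cloud the comparison. The function names are hypothetical.

```python
from fractions import Fraction

def gof_statistic(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum(Fraction((o - e) ** 2, e) for o, e in zip(observed, expected))

def independence_statistic(table):
    """Chi-square statistic for a test of independence on a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = Fraction(0)
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected cell count = (row total)(column total) / (grand total)
            expected_cell = Fraction(row_totals[i] * col_totals[j], grand_total)
            stat += (obs - expected_cell) ** 2 / expected_cell
    return stat

observed = [285, 175, 40]
expected = [300, 150, 50]
scale = 10 ** 30   # the "E30" from step 3

# Matrix as entered on the calculator: row 1 observed, row 2 expected times 10^30
trick_table = [observed, [e * scale for e in expected]]

gof = float(gof_statistic(observed, expected))
trick = float(independence_statistic(trick_table))
print(round(gof, 3), round(trick, 3))
```

Both numbers print as 6.917. The degrees of freedom also agree, since a table with 2 rows & k columns has (2-1)(k-1) = k-1.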
~Chi-square is not in the course outline, but is an important type of test for those of you who will be involved in analyzing data from various disciplines.
~Example: Historically, 60% of all persons surveyed love chocolate ice cream, 30% do not, and 10% are indifferent. A random sample of 500 people was taken and given the same 3 choices. The results were: 285 liked, 175 didn't, & 40 were indifferent. Test the claim that these observed results are in line with the historical percentages. Use both .05 and .01 levels of significance.
~Solution: The expected results would be: like: (.60)(500)=300, don't like: (.30)(500)=150, & indifferent: (.10)(500)=50.
Insert your matrix,
2nd, matrix, edit, 1[A], enter.
~Note: Matrix [A] is the only one we use.
Now, enter the size (dimension) of your matrix,
2 enter 3 enter.
Place the observed data points in the first row & the expected in the 2nd row (times 10^30).
285      175      40
300E30   150E30   50E30
Now, click Stat, Tests, (C) chi-square test, enter. Then click enter again with cursor next to [A], move cursor to calculate, then enter.
You should be viewing the following:
chi-square value (test statistic) = 6.917
p-value = .0315
df = 2
Conclusion: since .0315 is less than .05, we reject the "good fit" claim at this level of significance & conclude that the sample results are not in line with historical percentages. However, since .0315 is greater than .01, we do not reject it at the .01 level of significance. For the latter, the observed results do not differ significantly from historical percentages.
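~The same example can be checked away from the calculator. This Python sketch is only a cross-check, not part of the TI-83 procedure. Because df = 2 here, the p-value is taken from the closed form exp(-x/2), which holds for 2 degrees of freedom only.

```python
import math

proportions = [0.60, 0.30, 0.10]   # historical percentages
observed = [285, 175, 40]          # like, don't like, indifferent

n = sum(observed)
expected = [p * n for p in proportions]   # about 300, 150, 50

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = math.exp(-chi_sq / 2)    # right-tail area, valid for df = 2 only

print("chi-square =", round(chi_sq, 3))
print("p-value =", round(p_value, 4))
print("df =", df)

# Compare the p-value with each significance level
decisions = {}
for alpha in (0.05, 0.01):
    decisions[alpha] = "reject" if p_value < alpha else "do not reject"
    print("at", alpha, ":", decisions[alpha], "the good fit claim")
```

This gives the same chi-square value, p-value, and df as the calculator display above, and the same two decisions.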
~Note: I created this problem for example purposes only & the data has no valid basis.
~Note: see our textbook for more detail.
Chi-square Test for "Independence"
~Example: A group of 306 people were interviewed to determine their opinion concerning a particular current American foreign-policy issue. At the same time their political affiliation was recorded. The data are as follows:
              Approve of Policy   Do Not Approve of Policy   No Opinion
Republicans   114                 53                         17
Democrats     87                  27                         8
Do the data present sufficient evidence to indicate a dependence between party affiliation and the opinion expressed for the sampled population?
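Solution: Follow my procedure for the TI-83, but enter the observed table itself as a 2 x 3 matrix [A]. No expected row and no E30 entries are needed for this test.
Chi-square value: 2.87, df = (2-1)(3-1) = 2
Table value for Chi-square (at the .05 level of significance): 5.99
Conclusion: since 2.87 is less than 5.99, we do not reject independence. The data do not present sufficient evidence of a dependence between party affiliation and the opinion expressed.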
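~For anyone who wants to verify the independence example by computer, here is a Python sketch of the usual calculation: each expected count is (row total)(column total)/(grand total). The variable names are chosen for this illustration. Since df = 2, the .05 table value can be found from the closed form -2 ln(.05), which holds for 2 degrees of freedom only.

```python
import math

# Rows: Republicans, Democrats
# Columns: approve, do not approve, no opinion
table = [
    [114, 53, 17],
    [87, 27, 8],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        # Expected count if party and opinion were independent
        expected_cell = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (obs - expected_cell) ** 2 / expected_cell

df = (len(table) - 1) * (len(table[0]) - 1)
critical_value = -2 * math.log(0.05)   # .05 table value, valid for df = 2 only

print("chi-square =", round(chi_sq, 2))
print("df =", df)
print("table value =", round(critical_value, 2))
print("reject independence?", chi_sq > critical_value)
```

The chi-square value comes out at about 2.87, below the table value of about 5.99, so independence is not rejected.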