Chi-square Test for "Goodness of fit"


~There are many Chi-square curves, one for each degree of freedom used
(very much like the t-curves).

~The equations for these curves are very advanced & will not be given.
They are continuous & define a continuous distribution.

~Unlike normal curves, they are not symmetric. They rise rapidly and then
approach the horizontal axis as the values increase (skewed right).

~The key feature is that the area underneath each one is equal to 1.  

~This important property enables us to use them in inferential statistics
(for testing certain types of hypotheses).

~Each curve starts at zero on the horizontal axis and extends to the right from there. The curves are never negative (they lie on or above the horizontal axis).

~The important areas in question will be to the RIGHT of each Chi-square value.
(these regions will be very important for testing our null hypothesis)

~Since we are testing "goodness of fit", the data sets involved will come from OBSERVED and  EXPECTED data points.

~CONTINGENCY TABLES (data displayed in rows & columns) are usually used for this purpose. Each "cell" (a specific entry in any given row & column) of the table would usually have a frequency or probability for that category.

~The data collected (observed) is usually displayed & the expected results are usually computed (based on probability rules).

~There could be many categories involved.

~Our null hypothesis (for "goodness of fit") will be that there is a "good fit".
(i.e., the observed frequencies do not differ significantly from the expected).

~If it is rejected, then there was sufficient evidence against it & we
will conclude that the observed frequencies differ significantly from the expected.

~The level of significance is usually .05 or .01.

~The key test statistic for chi-square is  ∑(O - E)²/E  (summed over all categories), where
  O = observed frequencies     &     E = expected frequencies
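
~As a rough illustration of the test statistic, here is a minimal Python sketch. The function name chi_square_statistic and the coin-flip counts are made up for this illustration; they do not come from the course or the textbook.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 coin flips, 58 heads and 42 tails,
# compared with the 50/50 split expected from a fair coin.
observed = [58, 42]
expected = [50, 50]
statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))
```

Each category contributes (8)(8)/50 = 1.28 here, so the statistic is 2.56. The larger the gaps between observed & expected, the larger the statistic.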

~The above test statistic is designed so that a small chi-square value gives a large p-value. This will indicate that there is a "good fit" between observed & expected data & the null hypothesis will not be rejected. On the other hand, a large chi-square value will give a small p-value. This will indicate that there is "not a good fit" between observed & expected data & the null hypothesis (good fit) will be rejected.

~The test statistic above is computed from counts, so its distribution is discrete; the (continuous) Chi-square distribution only approximates it. For the approximation to give a good result, all expected frequencies should be at least 5. (This condition is very similar to the condition required for the normal approximation to the binomial distribution.)

~After spending many hours trying to do this on my TI-83,  I had no luck.
Then I ran across a note by the author & he stated that the TI-83 cannot do it unless we make some unusual inputs...it works!

~Here's what you have to do to get that chi-square value & the p-value
associated with it (for rejecting or not rejecting the null hypothesis).

~First, collect your data in two groups, OBSERVED DATA &  EXPECTED DATA.  (These should be frequencies. If the expected data are given as probabilities or percentages, multiply each by the sample size first.)

~Your null hypothesis is that there is a very good fit. To test this, do the following:

  1)  Insert a matrix with 2 rows. The number of columns will correspond to
       the number of categories (say k). Then the number of degrees of freedom
       will be k-1. To do this, click 2nd, matrix, edit, 1[A], enter.
       Enter the size (dimension) of your matrix [A] (i.e., for 2 rows &
       3 columns, 2 enter 3 enter).

  2)  Fill in the 1st ROW  (across) with the observed frequencies.

  3)  Fill in the 2nd ROW (across) with the expected frequencies times
       ten to the 30th power.  (i.e., if one of the expected frequencies is 25,
       insert 25E30).  Do this for each of the expected frequencies.

  4)  When both rows are filled properly, click STAT, TESTS, move
        cursor down to C (chi-square Test), enter, cursor should be
        blinking next to your matrix [A], click enter, do not click enter again
        until you move to Calculate, then click enter.

  5)  You will see the chi-square value, the p-value, & the df value.

  6)  You can now make your decision. If the p-value is less than your
        significance level (.05 or .01), you will reject the null hypothesis.
        (i.e., observed & expected data do not fit well). If the p-value is
        greater than your significance level, you will not reject the null
        hypothesis. (i.e., observed & expected data do fit well).

  7)  The author does not explain why inserting the expected data times
        ten to the 30th power works,  just that it does! (a means to an end)

  8)  You can also get a graph as part of the display, if you move the
        cursor to Draw & click enter.  It will show you the chi-square curve
        in question, the chi-square test statistic, & the area in question.
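
~The steps above do not say why the ten to the 30th power factor works. One likely explanation, assuming the TI-83's chi-square Test runs a standard test of independence on matrix [A] (expected count for each cell = row total times column total divided by grand total), can be sketched in Python. The function name independence_statistic is made up for this illustration, the counts are the ones used in the ice cream example in these notes, & exact fractions stand in for the calculator's own arithmetic, which this sketch does not try to reproduce.

```python
from fractions import Fraction


def independence_statistic(table):
    """Chi-square statistic for a table of counts, with each expected
    count computed as (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = Fraction(0)
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = Fraction(row_totals[i] * col_totals[j], grand_total)
            statistic += (observed_count - expected_count) ** 2 / expected_count
    return statistic


observed = [285, 175, 40]
expected = [300, 150, 50]
scale = 10 ** 30  # the "E30" factor from step 3

# Matrix [A] as entered on the calculator: observed row, scaled expected row.
matrix_a = [observed, [e * scale for e in expected]]
trick = independence_statistic(matrix_a)

# Goodness-of-fit statistic computed directly from the formula.
direct = sum(Fraction((o - e) ** 2, e) for o, e in zip(observed, expected))

print(float(trick), float(direct))
```

Because the second row is so huge, the column totals are dominated by it. The expected counts computed for the first row then come out essentially equal to the expected frequencies that were entered, & the second row itself adds almost nothing to the sum. The degrees of freedom also agree: (2-1)(k-1) = k-1.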

~Chi-square is not in the course outline, but is an important type of test for those of you who will be involved in analyzing data from various disciplines.
   

~Example: Historically, 60% of all persons surveyed love chocolate ice cream, 30% do not, and 10% are indifferent. A random sample of 500 people was taken and asked the same question, with the same 3 possible answers. The results were:

285 liked, 175 didn't, & 40 were indifferent. Test the claim that these results observed are in line with the historical percentages. Use both .05 and .01 levels of significance.

~Solution:  The expected results would be:  like: (.60)(500)=300,
don't like: (.30)(500)=150, & indifferent: (.10)(500)=50.

Insert your matrix, 2nd, matrix, edit, 1[A], enter.

~Note: Matrix [A] is the only one we use.

Now, enter the size (dimension) of your matrix,
2 enter 3 enter.  Place the observed data points in the first row
& the expected in the 2nd row (times 10^30, entered as E30).

                         285                  175                40      
                      300E30            150E30          50E30

Now, click Stat, Tests, (C) chi-square test, enter. Then click enter again with cursor next to [A], move cursor to calculate, then enter.

You should be viewing the following:

chi-square value test statistic = 6.917
p-value = .0315
df = 2

Conclusion:  Since .0315 is less than .05, we reject the "good fit" claim at the .05 level of significance & conclude that the sample results are not in line with the historical percentages. However, since .0315 is greater than .01, we do not reject it at the .01 level of significance. At that level, the observed results do not differ significantly from the historical percentages.
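
~The ice cream example can also be checked away from the calculator. The Python sketch below is only an illustration, & the variable names are made up. It relies on one shortcut that applies only when df = 2: the area to the right of a chi-square value x is exp(-x/2). For any other df, a chi-square table or a statistics package would be needed.

```python
import math

observed = [285, 175, 40]   # liked, didn't, indifferent
percents = [60, 30, 10]     # historical percentages
n = sum(observed)           # sample size, 500

# Expected frequencies: percentage of the sample size.
expected = [pct * n / 100 for pct in percents]

# Rule of thumb from the notes: every expected frequency at least 5.
all_expected_ok = all(e >= 5 for e in expected)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Right-tail area; this closed form is valid ONLY for df = 2.
p_value = math.exp(-statistic / 2)

decisions = {}
for alpha in (0.05, 0.01):
    decisions[alpha] = "reject" if p_value < alpha else "do not reject"

print(round(statistic, 3), round(p_value, 4), df, decisions)
```

The statistic & p-value agree with the TI-83 display (6.917 & .0315), & the two decisions match the conclusion: reject at .05, do not reject at .01.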

~Note: I created this problem for example purposes only & the data has no valid basis.

~Note: see our textbook for more detail.

Chi-square Test for "Independence"




~In more advanced courses, Chi-square tests are also used to test independence of the data sets involved & for making inferences about other statistical measures.

~The null hypothesis in this case is that of independence (i.e., there is no relationship between the data). We will either reject or not reject the null hypothesis of independence based on the chi-square value (similar to the "goodness of fit" technique).

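~Just enter the observed values in a matrix in your TI-83 (one matrix row for each row of the table, using the same keystrokes described in the "goodness of fit" technique, but with no expected row & no multiplying by ten to the 30th power).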

~The TI-83 will calculate the expected values automatically

~You could also calculate the expected values without the TI-83 for EACH CELL by using

E = (ROW TOTAL)(COLUMN TOTAL) divided by (GRAND TOTAL)
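
~To see the expected-value formula at work, here is a small Python sketch. The 2 by 2 table of counts & the function name expected_counts are made up for this illustration only.

```python
def expected_counts(table):
    """Expected count for each cell:
    (row total)(column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]


# Hypothetical observed counts: row totals 30 & 90, column totals 40 & 80.
observed = [
    [12, 18],
    [28, 62],
]
expected = expected_counts(observed)
print(expected)
```

For the first cell, E = (30)(40)/120 = 10. The expected table keeps the same row & column totals as the observed table.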

~Then get the chi-square value (same procedure as the "goodness of fit" test)

~Then check the chi-square value in a table for a given number of degrees of freedom & significance level. The degrees of freedom will be the result of the following product: (r-1)(c-1), where r = the # of rows, c = the # of columns. If the chi-square value is greater than that in the table (for a given significance level & degree of freedom), we would reject the null hypothesis of independence.

~Just take the chi-square value to a table & make your decision (go to the table entry for a given degree of freedom & significance level).

~Example:  A group of 306 people were interviewed to determine their opinion concerning a particular current American foreign-policy issue. At the same time their political affiliation was recorded. The data are as follows:

                  Approve       Do Not Approve      No Opinion
                 of Policy        of Policy

Republicans         114              53                 17

Democrats            87              27                  8

Do  the data present sufficient evidence to indicate a dependence between party affiliation and the opinion expressed for the sampled population?

Solution:  Follow my procedure for the TI-83.

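                Chi-square value:  2.87

                Table value for Chi-square: 5.99  (df = (2-1)(3-1) = 2, at the .05 level of significance)

                Conclusion:  Since 2.87 is less than 5.99, we do not reject the null hypothesis of independence. The data do not present sufficient evidence of a dependence between party affiliation & opinion.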
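
~The party affiliation example can be checked with a short Python sketch. This is only an illustration with made-up variable names. The table value is computed with a shortcut that applies only when df = 2 (the critical value is -2 times the natural log of the significance level); for other df, use a chi-square table.

```python
import math

table = [
    [114, 53, 17],  # Republicans
    [87, 27, 8],    # Democrats
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

statistic = 0.0
for i, row in enumerate(table):
    for j, observed_count in enumerate(row):
        # E = (ROW TOTAL)(COLUMN TOTAL) / (GRAND TOTAL)
        expected_count = row_totals[i] * col_totals[j] / grand_total
        statistic += (observed_count - expected_count) ** 2 / expected_count

df = (len(table) - 1) * (len(table[0]) - 1)

alpha = 0.05
# Critical value; this closed form is valid ONLY for df = 2.
critical_value = -2 * math.log(alpha)

reject_independence = statistic > critical_value
print(round(statistic, 2), df, round(critical_value, 2), reject_independence)
```

The sketch gives a chi-square value of about 2.87 against a table value of about 5.99 with df = 2, so the null hypothesis of independence is not rejected.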