~Lesson #17-Population Proportions




~Note:  Most statistical studies report proportions. A proportion represents the percentage of a group (sample or population) having a particular attribute
(e.g., % of students who love pizza, % of unemployed workers, % of people who drink Pepsi, & so on). A proportion could also be given as a probability or fraction.

~Note: In pure mathematics, a proportion is the equality of two fractions.

~Note:  We know (I hope), from previous units, that the mean of the sample means equals the population mean, regardless of the sample size n. However, the standard deviation of the sample means depends on the sample size: standard deviation of means = the standard deviation of the population / square root of n. In some cases the finite population correction, square root of (N-n)/(N-1), enters the picture as a multiplier. It applies if we are sampling without replacement from a finite population & the sample size exceeds 5% of the population size. If we are sampling with replacement (population is considered infinite) or the sample size doesn't exceed 5% of the population size, this correction does not apply.
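~Note:  For anyone who wants to check this rule away from the calculator, here is a minimal Python sketch. The function name and the numbers (a population standard deviation of 10, samples of 25) are made up for illustration and are not part of the lesson.

```python
import math

def std_dev_of_sample_means(sigma, n, N=None):
    """Standard deviation of the sample means for samples of size n.

    sigma: population standard deviation
    N: population size; None means the population is treated as infinite
       (for example, sampling with replacement)
    """
    sd = sigma / math.sqrt(n)
    # Finite population correction: only when the sample exceeds 5% of a
    # finite population sampled without replacement.
    if N is not None and n > 0.05 * N:
        sd *= math.sqrt((N - n) / (N - 1))
    return sd

# Hypothetical numbers, for illustration only
print(std_dev_of_sample_means(10, 25))          # infinite population, no correction
print(std_dev_of_sample_means(10, 25, N=100))   # 25% of N, correction applies
print(std_dev_of_sample_means(10, 25, N=1000))  # 2.5% of N, no correction
```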

~Note:  We can then find probabilities (areas) under the standard normal curve falling to the right of, to the left of, or between two given scores. Also, we can find z scores corresponding to given probabilities and, consequently, percentile scores & others corresponding to them. These are done conveniently on your calculator using the 2nd VARS menu, items 2 & 3 (normalcdf and invNorm). (You can avoid z scores using the shortcuts.)
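~Note:  The same areas and z scores can be sketched in Python with the standard library's NormalDist. The helper names below are my own and only loosely mirror the calculator's normalcdf and invNorm; treat this as an illustration, not a replacement for the calculator steps.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def area_left(z):
    """Area under the curve to the left of z."""
    return Z.cdf(z)

def area_right(z):
    """Area under the curve to the right of z."""
    return 1 - Z.cdf(z)

def area_between(lower, upper):
    """Area between two z scores (what normalcdf reports)."""
    return Z.cdf(upper) - Z.cdf(lower)

def z_for_left_area(area):
    """z score with the given area to its left (what invNorm reports)."""
    return Z.inv_cdf(area)

print(area_between(-1.96, 1.96))  # close to .95
print(z_for_left_area(0.975))     # close to 1.96
```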

~Note:  We would like to do the same for proportions: instead of estimating the population mean using the sample means, we would like to estimate the population proportion using the sample proportion.

~Common sense tells us that the sample proportion will not be exactly
equal to the population proportion (just as the sample means behaved).
(e.g., If we have a sample of 80 Marist College students and find that
65% love pizza, it would be unlikely that exactly 65% of all Marist College students love pizza.)

~So, we need some sort of way to say, at a certain level of confidence, that the percentage of all Marist College students that love pizza falls within a margin of error of 65%.

~If p represents the population proportion, then, with a margin of error of roughly 10%, we would get a range for p of .55<p<.75 at a 95% level of confidence. The range of values for p is called the confidence interval. This means that if we were to select many different samples of size 80 and construct the corresponding confidence intervals, about 95% of them would actually contain the value of the population proportion p. I will show you how to get it for different levels of confidence & how to compute the margin of error for each case.
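~Note:  The "many different samples" idea can be tried out with a small simulation. This is only a sketch: it assumes, purely for the sake of the experiment, that the true proportion is p = .65 (in practice p is unknown), and the function name, number of trials, and seed are arbitrary choices of mine.

```python
import math
import random
from statistics import NormalDist

def coverage(p=0.65, n=80, conf=0.95, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that contain the true p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # critical value
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n; count how many have the attribute.
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        e = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
        if p_hat - e < p < p_hat + e:
            hits += 1
    return hits / trials

print(coverage())  # should land in the neighborhood of .95
```

The result will not be exactly .95, partly because of simulation noise and partly because the normal curve only approximates the binomial for n = 80.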

~A bit tricky, but not too bad  (thanks to our TI-83).

~Terminology

~Note: p = the population proportion, p cap = the sample proportion, where p cap = x / n, x is the number in the sample having the specific attribute, and n is the sample size.

~Ex:  If 684 households are sampled and 570 have a computer, then p cap = 570 / 684 = .833 = 83.3 %. Here, the specific attribute is "having a computer". This is also the best point estimate (an estimate using one value) for the population proportion p. So, to get a point estimate for p, use the sample proportion, p cap.

~Note:  We can approximate a binomial distribution quite well using the normal curve if np is greater than or equal to 5 and nq is greater than or equal to 5 (remember, q = 1 - p). (See Topics of Interest.)
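~Note:  As a quick illustration of this check, here is a small Python sketch. The function name is my own; the first call uses the household example (n = 684, p cap = 570/684 standing in for p), and the second uses made-up numbers where the condition fails.

```python
def normal_approx_ok(n, p):
    """True when both np >= 5 and nq >= 5, with q = 1 - p."""
    q = 1 - p
    return n * p >= 5 and n * q >= 5

print(normal_approx_ok(684, 570 / 684))  # np is about 570, nq about 114
print(normal_approx_ok(20, 0.1))         # np = 2, too small
```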
         
~We would like to use similar methods to estimate proportions; however, it will be a little more involved.

~Note:  A Confidence interval is a range or interval of values used to estimate the true value of the population proportion.

~Confidence level:  This is the probability 1 - alpha, where alpha/2 is
the area under the standard normal curve in each tail (alpha in both tails combined).

~The Critical value  Zalpha/2 is the positive z value separating an area of alpha/2 on the right side of the standard normal curve.
         (-Zalpha/2 would correspond to the left side or tail)

           for a 90% confidence level, alpha = .10,   Zalpha/2 = 1.645                  
           for a 95% confidence level, alpha = .05,   Zalpha/2 = 1.96
           for a 99% confidence level, alpha = .01,   Zalpha/2 = 2.576

~The above confidence levels are the most commonly used.

~Note:  To get these critical values, use 2nd VARS, menu 3 (invNorm), with areas .9500, .9750, and .9950 for the 90%, 95%, and 99% levels respectively.

~Ex:  Find the critical value Zalpha/2 corresponding to the 95% confidence level.  Use 2nd VARS, menu 3 (invNorm), and enter (1 - .05/2) = .975 to get 1.96.
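~Note:  The critical values can also be reproduced off the calculator. The sketch below uses Python's standard NormalDist in place of invNorm; the function name critical_value is my own.

```python
from statistics import NormalDist

def critical_value(conf):
    """Zalpha/2 for a confidence level such as 0.95."""
    alpha = 1 - conf
    # Area to the left of the critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.99):
    print(conf, round(critical_value(conf), 3))
```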

~Note:  Since the sample proportion p cap is typically different from the population proportion p, we need a bound on that difference. The margin of error E is the largest likely difference, at the given confidence level, between p cap and p.

~To find E, we subtract p cap from the upper limit value of the confidence interval or subtract the lower limit of the confidence interval from p cap (Assuming we can get the confidence interval).

~Here’s an example illustrating how these are found using the TI-83.

~Ex:  Let’s take our study of the sample of 80 Marist College students, of which 65% were found to love pizza. Would this be true of all the students?  Let’s find the 95% confidence interval for the population proportion p (all students).  Here n=80, p cap = .65.

~Use STAT, TESTS, down to A on the menu (1-PropZInt), enter
x = (80)(.65) = 52 (number of students who love pizza in our sample;
be careful here, since x must be an integer), n = 80, C-Level = .95, then Calculate, ENTER.

~You should be viewing (.54548, .75452), or, .54548<p<.75452 for the 95% confidence interval. This means we can be 95% confident that the true population proportion lies within this range.
Now, for the margin of error E.

~Subtract p cap = .65 from the upper limit value of .75452 to get E = .10452, or about .10 (10 % margin of error).

~If we repeated  this procedure for different Confidence levels, we would get the following:

        99%-----(.51264, .78736)-----E= .14
        90%-----(.56229, .73771)-----E= .09
        30%-----(.62945, .67055)-----E= .02
                    
~Notice that the sample proportion (p cap) is the average of the end points of the confidence interval.

~Notice that the larger the level of confidence, the larger the interval & E

~Note:  The formula for the margin of error for the population proportion p is  E = Zalpha/2 times square root of [(p cap)(q cap)/n], where Zalpha/2 is the critical value for a given confidence level, p cap is the sample proportion, q cap = 1 - p cap, and n = sample size.

~Note:  Confidence intervals for the population proportion are written as ( p cap - E, p cap + E) or  p cap - E < p < p cap + E
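~Note:  The margin of error formula and the interval above can be sketched in a few lines of Python, as a cross-check on the calculator's output for the pizza example (x = 52, n = 80). The function name is my own, and this is only an illustration of the formula, not a description of how the TI-83 computes it internally.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, conf):
    """Return (lower limit, upper limit, E) for the population proportion p."""
    p_hat = x / n                                  # sample proportion, p cap
    q_hat = 1 - p_hat                              # q cap
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value Zalpha/2
    e = z * math.sqrt(p_hat * q_hat / n)           # margin of error E
    return p_hat - e, p_hat + e, e

for conf in (0.99, 0.95, 0.90):
    lo, hi, e = proportion_interval(52, 80, conf)
    print(f"{conf:.0%}: ({lo:.5f}, {hi:.5f})  E = {e:.2f}")
```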


~Determining sample size for a given confidence level

~Substitute E, p cap, q cap, and Zalpha/2 into the formula for E, then solve for n. (We don’t know p cap before we sample, so if there is no prior estimate for the sample proportion p cap, we use .5 in the formula.)

~Note: By using .5 for p cap, this makes the product of p cap & q cap as large as possible thus assuring that our sample size is large enough for the given margin of error & confidence level.

~Note:  As the sample size n increases, the margin of error E, for a given confidence level, decreases.

~Ex:  Suppose we want the margin of error to be 0.03 at a 99% confidence level. What should be our sample size?

  The critical value at the 99% level is about 2.58 (2.576 rounded to two decimals), so
   E = 2.58 times  square root of (p cap)(1-p cap )/n

   We don’t know what p cap is until we do the sampling, so it is estimated or, if there is no estimate, .5 is used.

   So, we choose n so that  (2.58)(.5)/sqr(n) = 0.03.
                                                                                     
   Solving :  sqr(n)= 43,
                    so, n = 1,849.

~Note: an easy way to do this is to just switch E & sqr(n) in the formula, then square both sides to get n.

~Note: If n comes out fractional, we always round up (e.g., if n comes out to 2775.2, we use 2776).
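~Note:  The sample size calculation, including the round-up rule, can be sketched in Python as follows. The function name and its parameters are my own. Passing z = 2.58 reproduces the worked example's 1,849; letting the function compute the unrounded critical value (about 2.5758) gives a slightly smaller answer, so the rounded 2.58 errs on the safe side.

```python
import math
from statistics import NormalDist

def sample_size(E, conf=None, z=None, p_hat=0.5):
    """Smallest n giving margin of error E.

    Supply either a confidence level (conf) or a critical value (z).
    p_hat defaults to .5, the value used when no estimate is available.
    """
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_raw = (z / E) ** 2 * p_hat * (1 - p_hat)
    # round(..., 9) removes floating-point noise (e.g. 1849.0000000000005)
    # before we round up to the next whole number.
    return math.ceil(round(n_raw, 9))

print(sample_size(0.03, z=2.58))     # the worked example, with z rounded to 2.58
print(sample_size(0.03, conf=0.99))  # with the unrounded critical value
```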


~Note: The finite population correction is also used when we have a relatively small population N and our sample size is large in comparison, that is, more than 5% of N (see the beginning of this discussion).