~Class Meeting #19 - Estimating an Unknown Population Mean
(σ unknown)
~Note: This case is very realistic and is the one most often used in practice.
~Note: Assumption: the sample is from a normally distributed population, or n is 30 or greater (either one or both).
~Note: Since σ is not known, we will use the sample standard deviation to estimate it.
~Note: Using the sample standard deviation to estimate σ adds variability for small sample sizes, so the standard normal curve is not used and we no longer use z-scores.
~Note: Instead, we use curves that look very much like the standard normal curve but are less peaked (a difference in kurtosis), with more area in the tails, and so have larger standard deviations (a consequence of the small sample sizes). These are called t-curves, and each one yields its own t-values.
~There are many t-curves, one for each sample size n. Together they are referred to as t-distributions, or Student's t-distribution.
~Note: Each t curve is determined by its degrees of freedom, which depend on the sample size: df = n-1. The larger the sample size, the larger the degrees of freedom, and the closer the t curve is to the standard normal curve.
(df means degrees of freedom: sample size minus one.)
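~Note: The shape comparison above can be checked numerically. The following is a minimal Python sketch, not part of the textbook or calculator workflow; the function names t_pdf and normal_pdf are made up for illustration. It evaluates the height of several t curves at 0 (the peak) and at 3 (out in the tail) and compares them with the standard normal curve.

```python
import math

def t_pdf(x, df):
    """Height of the t curve with df degrees of freedom at the point x."""
    # Work with logarithms of the gamma function so large df do not overflow.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Height of the standard normal curve at the point x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Illustrative sample sizes (assumed here, not taken from the notes).
for n in (5, 15, 30, 120):
    df = n - 1
    print(f"n={n:>3}  df={df:>3}  peak={t_pdf(0, df):.4f}  height at 3={t_pdf(3, df):.4f}")

print(f"standard normal  peak={normal_pdf(0):.4f}  height at 3={normal_pdf(3):.4f}")
```

As df grows, the peak rises toward the normal peak and the tail height falls toward the normal tail, which is the sense in which the t curve approaches the standard normal curve.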
~Note: We work with t curves exactly as we did with the standard normal curve, since the total area under each t curve also equals one.
~However, you must select the right curve by indicating the degrees of freedom (n-1).
~Note: Many statisticians do not use t curves when σ is unknown and n is 30 or greater, since with relatively large sample sizes the sample standard deviation and σ are very close.
~However, our textbook always uses t curves with an unknown σ regardless of sample size, so all my testing will be based on that fact.
~Studies have shown that t distributions yield better results in cases when σ is unknown.
~Note: So when using t curves, z_{α/2} becomes t_{α/2}, and the margin of error is E = t_{α/2} · (sample standard deviation / √n). There is a table that gives the critical t values for given levels of confidence (1 - α).
~Note: If you go to that table, you will find that the critical values at the 95% C.L. are t = 2.04, 2.02, 2.00, and 1.98 for df = 30, 40, 60, and 120, respectively. Remember, C.L. = 1 - α.
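~Note: For anyone curious where the table values come from, here is a rough Python sketch that approximates a critical t value by finding the point with the right area under the t curve. It is an illustration under stated assumptions, not the method the textbook or calculator uses: the area is computed with Simpson's rule and the critical value is located by bisection, and the names t_pdf, t_cdf, and t_critical are made up for this example.

```python
import math

def t_pdf(x, df):
    """Height of the t curve with df degrees of freedom at the point x."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=1000):
    """Area under the t curve to the left of t (for t >= 0).

    The left half of the curve contributes 0.5 by symmetry; the area on
    [0, t] is approximated with Simpson's rule (steps must be even).
    """
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(conf_level, df):
    """Approximate t_{alpha/2}: the central area between -t and t equals conf_level."""
    target = 1 - (1 - conf_level) / 2   # area to the left of t_{alpha/2}
    lo, hi = 0.0, 100.0                 # search bracket (an assumption; wide enough for df >= 2)
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (30, 40, 60, 120):
    print(f"df={df:>3}  95% critical t = {t_critical(0.95, df):.2f}")
```

Rounded to two decimals, the results should agree with the table values quoted above.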
~Note: To get a t value, use the table with the indicated degrees of freedom (n-1).
~Note: To get a confidence interval using t-values on the calculator, use STAT, TESTS, TInterval (option 8), choose Data or Stats, enter the C-Level, then Calculate.
~Example: A random sample of size 10 from a normal population results in a mean of 124 and a sample variance of 21. Find an approximate 90% confidence interval for the population mean, and the margin of error. (Note that the population standard deviation is unknown, so use the t distribution, i.e., t curves.)
~Calculator: STAT, TESTS, TInterval, Stats, enter sample mean 124, enter sample standard deviation √21, enter n = 10, enter C-Level .90, Calculate, hit ENTER. This gives (121.34, 126.66). So, E = 124 - 121.34 = 2.66.
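~Note: The same interval can be checked by hand with the margin-of-error formula. Below is a minimal Python sketch of that arithmetic. The function name t_interval_from_stats is made up for illustration, and the critical value 1.833 is the standard t-table entry for df = 9 at the 90% confidence level (an input looked up from the table, not computed here).

```python
import math

def t_interval_from_stats(sample_mean, sample_sd, n, t_crit):
    """Return (lower, upper, margin) for a t confidence interval.

    margin = t_crit * sample_sd / sqrt(n), where t_crit is the critical
    t value for df = n - 1 at the desired confidence level.
    """
    margin = t_crit * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin

T_CRIT_90_DF9 = 1.833  # from a t table: df = 9, 90% confidence

# The sample variance is 21, so the sample standard deviation is sqrt(21).
lower, upper, margin = t_interval_from_stats(124, math.sqrt(21), 10, T_CRIT_90_DF9)
print(f"({lower:.2f}, {upper:.2f}), E = {margin:.2f}")  # (121.34, 126.66), E = 2.66
```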
~Example: The gestation period of domestic dogs is normally distributed. To estimate the mean gestation period, 15 randomly selected dogs are observed during pregnancy. Their gestation periods (days) were 62.0, 61.4, 59.8, 62.2, 60.3, 60.4, 59.4, 60.2, 60.4, 60.8, 61.8, 59.2, 61.1, 60.4, 60.9.
Find a 95% confidence interval for the true mean gestation period, μ, of the domestic dog.
~Clear L1, then STAT, Edit, ENTER, and enter the data in L1. Next, STAT, TESTS, TInterval (down to 8), ENTER, Data, ENTER, List L1, Freq 1, C-Level .95, Calculate, ENTER. This gives (60.19, 61.18), so E = sample mean - 60.19 = 60.69 - 60.19 = 0.50.
~Therefore, we can be 95% confident that the mean gestation period, μ, of the domestic dog is somewhere between 60.19 and 61.18 days.
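~Note: As a cross-check on the calculator, here is a short Python sketch that works the gestation example from the raw data. It is only an illustration: the variable names are made up, and the critical value 2.145 is the standard t-table entry for df = 14 at the 95% confidence level (looked up, not computed here).

```python
import math
import statistics

gestation_days = [62.0, 61.4, 59.8, 62.2, 60.3, 60.4, 59.4, 60.2,
                  60.4, 60.8, 61.8, 59.2, 61.1, 60.4, 60.9]

n = len(gestation_days)
sample_mean = statistics.mean(gestation_days)
sample_sd = statistics.stdev(gestation_days)  # sample standard deviation (divides by n - 1)

T_CRIT_95_DF14 = 2.145  # from a t table: df = n - 1 = 14, 95% confidence

margin = T_CRIT_95_DF14 * sample_sd / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"n = {n}, sample mean = {sample_mean:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}), E = {margin:.2f}")  # (60.19, 61.18), E = 0.50
```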
~Summary:
1) n ≥ 30: No assumptions made on population & σ is known, use the Z distribution
2) n < 30: Population is normally distributed & σ is known, use the Z distribution
3) n ≥ 30: No assumptions made on population & σ is not known, use the T distribution
4) n < 30: Population is normally distributed & σ is not known, use the T distribution
5) n < 30: Population is skewed, use methods not covered in this course.
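~Note: The five cases in the summary can be written as a small decision rule. The Python sketch below is only one way to express it; the function name choose_distribution and its parameters are made up for illustration, and it assumes that "population is skewed" in case 5 can be represented as population_normal=False.

```python
def choose_distribution(n, sigma_known, population_normal):
    """Pick the distribution for a confidence interval on the mean.

    n                 -- sample size
    sigma_known       -- True if the population standard deviation is known
    population_normal -- True if the population is normally distributed
                         (only matters when n < 30)
    """
    if n < 30 and not population_normal:
        return "not covered in this course"   # case 5
    # Cases 1-4: large sample, or small sample from a normal population.
    return "Z" if sigma_known else "T"

print(choose_distribution(50, True, False))    # case 1
print(choose_distribution(12, True, True))     # case 2
print(choose_distribution(50, False, False))   # case 3
print(choose_distribution(12, False, True))    # case 4
print(choose_distribution(12, False, False))   # case 5
```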