Class Meeting #1 - Terminology & definitions
~Note: Not all sources agree on every definition below. Personal views are often expressed.
~Descriptive Statistics: Methods dealing with
organizing and summarizing data (information) using tables, charts,
graphs, & so on. Also, measures are calculated that give a
description of the data
(e.g., averages, ranges, percentiles, & so on).
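~Sketch: The measures named above can be computed with Python's standard statistics module. The scores below are made up for illustration only; this is a minimal sketch, not a prescribed method.

```python
import statistics

# Hypothetical quiz scores, invented for illustration only.
scores = [62, 70, 75, 75, 81, 88, 90, 94]

mean_score = statistics.mean(scores)           # arithmetic average
median_score = statistics.median(scores)       # middle value (50th percentile)
score_range = max(scores) - min(scores)        # largest minus smallest
quartiles = statistics.quantiles(scores, n=4)  # three cut points: Q1, Q2, Q3

print("mean:", mean_score)
print("median:", median_score)
print("range:", score_range)
print("quartiles:", quartiles)
```

Note that different software may use slightly different rules for percentiles, so quartile values can differ a little from one tool to another.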
~Inferential Statistics: (becoming much more popular
lately). Drawing conclusions about a population (the complete set of
all subjects or elements to be studied) from information obtained from
a sample (a subcollection of the population). (e.g., hypothesis
testing)
~Note: Populations can be special groups of people, salaries of
groups, heights or weights of objects, chemical elements, & so on.
In most cases, they are very large & finite, so it is impractical to collect data from every member (e.g., the population of the US).
~Note: A population could be infinite (e.g., the integers).
~Example: We may wish to draw a conclusion about the
fairness of a particular coin by tossing it repeatedly. The population
consists of the results of an infinite number of tosses. A sample could
be obtained by examining the first 50 tosses and noting the
percentages of heads and tails.
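~Sketch: The coin example above can be imitated by simulation. This assumes a fair coin & uses a fixed seed (my choice, only so the run is repeatable); 1 stands for a head & 0 for a tail.

```python
import random

rng = random.Random(1)  # fixed seed is an assumption, used only for repeatability

# Simulate the first 50 tosses of a fair coin: 1 = head, 0 = tail.
tosses = [rng.randint(0, 1) for _ in range(50)]

heads = sum(tosses)
tails = len(tosses) - heads
pct_heads = 100 * heads / len(tosses)
pct_tails = 100 * tails / len(tosses)

print(f"heads: {heads} ({pct_heads:.0f}%), tails: {tails} ({pct_tails:.0f}%)")
```

The 50 tosses are the sample; the percentages are what we would use to draw a conclusion about the coin.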
~Data: Information collected (or data set). Each piece of
data is called a data point. (for the above, we would list the number
of heads & tails observed in 50 tosses).
~Discrete Data: Finite or countably infinite (e.g.,
{1,2,3, ...}). Items are sorted into separate, or discrete,
classes. (e.g., number of heads or tails, yes or no or sometimes,
order of finish in a given race, principal languages of the world,
& so on).
~For Discrete data, there are "gaps" or "spaces" between data
points that cannot be assumed. For example, if we toss a coin &
let 0 be a tail & 1 be a head, values between 0 & 1 make no sense,
e.g., 0.57 would make no sense. So, this value cannot be assumed.
~Note: Not all values between data points can be taken on (there is no way of finishing in place 2.457 in a race).
~Continuous Data: All possible values could be assumed between
any 2 data points. (e.g., numbers between 0 and 2, range of
temperatures during a day, height, weight, time, & so on). (a
measurement scale with no gaps) (values between 2 data points are
possible data points themselves) (involves measurement on a continuous
scale) (an infinite, uncountable scale).
~Example: Cans of soda purchased would be discrete while volume of soda ingested would be continuous (oz).
~Continuous means "no breaks", so for data of this type, any value between
2 data points we collect can be assumed. For example, if we sample heights,
& we have 5' 4" & 5' 9" for data pts, all heights between these can be
assumed (we might not have them in our sample, however). The value 5' 6.69"
or any other value between the ones given would make sense, even though we
do not have it in our sample.
~Qualitative Data: Gives non-numerical information such as
gender, eye color, and blood type. Also, the elements found in a sample
of cat food. (does not measure amount) (descriptive in nature).
(could involve numbers, e.g., phone numbers, addresses, etc.) (usually
displayed by bar graphs).
~Quantitative Data: Gives measured amounts of quantities such as weight, height, walking speed, time, & amounts of elements in cat food.
~Note: Usually displayed by histograms (will discuss). Numerical amounts are described.
~Note: The following are sub classes of data that could be
either qualitative or quantitative. They also indicate levels of
measurement.
~Nominal: (consists of names, gender, labels, or
categories only). Data that cannot be arranged in any ordering scheme.
(usually discrete data)
~Ordinal: Data concerned with order or rank. (race
results, letter grades in a course, college football rankings, & so
on). The difference in data values is not measured. (usually discrete
data).
~Interval: (Metric data). Obtained from the measurement of
quantities such as temperature & time. (usually continuous data).
Uses a constant scale on which differences have meaning but ratios do not
(since there is no natural zero, that is, no value that represents absence
of the quantity). For example, the difference of 80 degrees & 40 degrees
can be measured but you cannot conclude that the ratio of 80/40=2 means
it's twice as hot.
~Ratio: Also metric data & uses a continuous, constant
scale. Both differences & ratios can be measured. There is a
natural zero (absence of the quantity). Examples would be the
measurements of height & weight.
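~Sketch: One way to see the interval vs. ratio distinction is to change units. Assuming the 80 & 40 degrees above are Fahrenheit (the notes do not say), the ratio changes when converted to Celsius, while a ratio of heights does not change when inches are converted to centimeters. The heights used are hypothetical.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: temperature (assumed here to be in degrees Fahrenheit).
hot_f, cool_f = 80, 40
hot_c, cool_c = fahrenheit_to_celsius(hot_f), fahrenheit_to_celsius(cool_f)

ratio_f = hot_f / cool_f   # ratio computed in Fahrenheit
ratio_c = hot_c / cool_c   # ratio of the same two temperatures in Celsius
diff_f = hot_f - cool_f    # difference in Fahrenheit degrees
diff_c = hot_c - cool_c    # same difference, in Celsius degrees

# Ratio scale: height (hypothetical values), inches converted to centimeters.
tall_in, short_in = 72, 36
ratio_in = tall_in / short_in
ratio_cm = (tall_in * 2.54) / (short_in * 2.54)

print("temperature ratio, F vs C:", ratio_f, ratio_c)
print("temperature difference, F vs C:", diff_f, diff_c)
print("height ratio, in vs cm:", ratio_in, ratio_cm)
```

The temperature ratio depends on where the scale puts its zero, so "twice as hot" is not meaningful; the difference just rescales by 5/9. The height ratio is the same in either unit because both scales share a natural zero.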
~Frequency Data: (also called count data).
Counting the number of times an item falls into a certain category.
(e.g., number of males/females, number of A’s, B’s,
C’s in a course, & so on) (this is how I will email your results on your quizzes & exams) (could be either discrete or continuous).
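~Sketch: Counting how often each category occurs can be done with collections.Counter. The list of letter grades is hypothetical, made up for illustration.

```python
from collections import Counter

# Hypothetical letter grades for a 10-student quiz.
grades = ["A", "B", "B", "C", "A", "B", "C", "C", "B", "A"]

freq = Counter(grades)  # maps each grade to the number of times it occurs

for grade in sorted(freq):
    print(grade, freq[grade])
```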
~Parameter: A descriptive measure for the entire population (usually Greek symbols used)
[lowercase mu (μ) & sigma (σ)] (mean & standard deviation of a population)
~Note: In mathematics, a parameter has a different
meaning. (a third quantity that dictates the position of a particle,
such as an angle or a time value) (i.e., parametric equations)
~Statistic: A descriptive measure based entirely on a sample of a population (usually Latin letters are used).
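~Sketch: The difference between a parameter & a statistic can be shown with a simulated population. The population (10,000 heights in cm, generated at random) & the sample size of 50 are assumptions for illustration only.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed is an assumption, for repeatability

# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameters: computed from the ENTIRE population.
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Statistics: computed from a sample only.
sample = rng.sample(population, 50)
x_bar = statistics.fmean(sample)       # sample mean
s = statistics.stdev(sample)           # sample standard deviation

print(f"parameters: mu={mu:.2f}, sigma={sigma:.2f}")
print(f"statistics: x_bar={x_bar:.2f}, s={s:.2f}")
```

In practice the parameters are usually unknown, & the statistics are used to estimate them.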
~Random Sampling: (also called simple random sampling).
A procedure in which each possible sample of a given size (n) is
equally likely to be selected. (equal probabilities of being selected).
(probabilities play an extremely important part in statistics) (later)
~Random Sample: A sample obtained by random sampling.
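~Sketch: Python's random.sample draws without replacement so that every subset of the requested size is equally likely, which matches the definition above. The population of 60 numbered subjects is hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed is an assumption, for repeatability

population = list(range(1, 61))  # hypothetical subjects numbered 1 to 60
n = 10                           # sample size

# Each possible set of n subjects is equally likely to be chosen.
sample = rng.sample(population, n)

print(sorted(sample))
```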
~Note: When sampling a population, we would like to
eliminate all biases, so random sampling is our goal & is extremely
important in order to draw valid conclusions about the population
studied. (See link on BIASED SAMPLE under Topics of Interest) (random sampling is used throughout this course)
~Note: Also, you must make sure that your sample is
representative of the population considered. (to draw a conclusion
about the average price of a home in California, don't restrict your
sample to Beverly Hills).
~Example: There are 60 students in a given class, 10
students in each of 6 rows. You would like a random sample of 10
students from the class. You number the rows 1-6 then roll a die.
Whatever comes up (say 5), you choose the 10 students in that row. Is this a
random sample?
Is this a simple random sample?
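~Sketch: When thinking about the two questions above, it may help to count. The lines below compare the number of different samples the row method can produce with the number of possible samples of 10 from 60, & the chance that any one student is chosen under each approach. A fair die is assumed.

```python
import math
from fractions import Fraction

rows, per_row = 6, 10
students = rows * per_row  # 60 students in the class

# Row method: the only samples that can ever occur are the 6 complete rows.
row_method_samples = rows

# Choosing any 10 of the 60 students: number of possible samples.
all_possible_samples = math.comb(students, per_row)

# Chance that one particular student ends up in the sample.
p_student_row_method = Fraction(1, rows)         # that student's row is rolled
p_student_any_10 = Fraction(per_row, students)   # 10 chosen out of 60

print("samples possible with the row method:", row_method_samples)
print("samples of 10 possible from 60:", all_possible_samples)
print("chance per student:", p_student_row_method, "vs", p_student_any_10)
```

Compare these counts with the definition of random sampling given earlier (each possible sample of size n equally likely).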
~Other Sampling Methods: (4 others that you should know).
Some of which involve a selection of a random sample as a sub method.
~Systematic: Selecting every kth element of a population
(e.g., phone book, on campus, at the mall, & so on) (e.g., every
10th student entering the student center)
~Convenience: Using whatever data is very easy to get (lazy man's way). No set pattern. (e.g., simple phone call, asking a question to anyone, & so on).
~Stratified: Separating or subdividing a population into
subgroups that share a common trait (e.g., age, gender, smokers,
income levels, & so on), then drawing a sample from each subgroup.
~Cluster: Separating the population into sections, then
selecting some sections, then taking elements from the selected
sections. (e.g., towns in Dutchess County, counties in NY, voting
precincts, & so on).
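~Sketch: Three of the methods above (systematic, stratified, cluster) can be written as short functions. The population of 60 people, the town labels, the smoker flag, & the function names are all hypothetical, chosen only to illustrate. The cluster version keeps every element of each chosen section. Convenience sampling is left out since it follows no set pattern.

```python
import random

rng = random.Random(7)  # fixed seed is an assumption, for repeatability

# Hypothetical population: 60 people with an id, a town, & a smoker flag.
towns = ["Town A", "Town B", "Town C", "Town D"]
population = [
    {"id": i, "town": towns[i % 4], "smoker": i % 3 == 0}
    for i in range(60)
]

def systematic(pop, k):
    """Every kth element, starting at a random position among the first k."""
    start = rng.randrange(k)
    return pop[start::k]

def stratified(pop, key, per_stratum):
    """Split into subgroups by `key`, then draw a random sample from each."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster(pop, key, n_sections):
    """Randomly select whole sections, then keep every element in them."""
    sections = sorted({person[key] for person in pop})
    chosen = set(rng.sample(sections, n_sections))
    return [person for person in pop if person[key] in chosen]

sys_sample = systematic(population, 10)             # every 10th person
strat_sample = stratified(population, "smoker", 5)  # 5 smokers, 5 non-smokers
clus_sample = cluster(population, "town", 2)        # everyone in 2 towns

print(len(sys_sample), len(strat_sample), len(clus_sample))
```

Note how the stratified & cluster versions each use random sampling as a sub method: the first samples within every subgroup, the second samples the sections themselves.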
~Sampling Error: A sample from a population provides us
with only a small portion of the entire population data. So, we don't
expect the sample to yield perfectly accurate information about the
population. Thus, we should anticipate a certain amount of error. Many
things can go wrong (from the way we collect the sample to errors in
computing the statistics we seek & the inferences we make).
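~Sketch: Sampling error can be seen by drawing several random samples from the same population & comparing each sample mean to the population mean. The population (5,000 random values from 0 to 100) & the sample size of 30 are assumptions for illustration only.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed is an assumption, for repeatability

# Hypothetical population of 5,000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(5_000)]
mu = statistics.fmean(population)  # population mean (the parameter)

# Draw 5 separate random samples of size 30 & record each sample's error.
errors = []
for _ in range(5):
    sample = rng.sample(population, 30)
    errors.append(statistics.fmean(sample) - mu)

for i, err in enumerate(errors, start=1):
    print(f"sample {i}: sample mean minus population mean = {err:+.2f}")
```

Each sample gives a somewhat different error even though the sampling was done properly; this covers only the error due to chance, not mistakes in collection or computation.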
~Note: You should be aware that using different statistics
with different definitions can yield a conclusion quite different from
what people think. (e.g., "driving is safer than flying").