Class Meeting #1 - Terminology & definitions

~Note:   Not everyone agrees on every definition below. Personal views are often expressed.

~Descriptive Statistics:   Methods dealing with organizing and summarizing data (information) using tables, charts, graphs, & so on. Also, measures are calculated that describe the data (e.g., averages, ranges, percentiles, & so on).
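
~Illustration: A minimal Python sketch of the descriptive measures named above (average, range, percentiles). The quiz scores are made up for illustration, and the standard-library statistics module is just one convenient way to compute them.

```python
import statistics

# Hypothetical data set: quiz scores for ten students (made up for illustration).
scores = [62, 75, 75, 81, 84, 88, 90, 93, 97, 100]

mean = statistics.mean(scores)                 # average
median = statistics.median(scores)             # middle value (50th percentile)
data_range = max(scores) - min(scores)         # range = largest - smallest
quartiles = statistics.quantiles(scores, n=4)  # 25th, 50th & 75th percentiles

print("mean:", mean)
print("median:", median)
print("range:", data_range)
print("quartiles:", quartiles)
```

Note that textbooks & software use slightly different rules for percentiles, so quartile values can differ a little from one tool to another.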

~Inferential Statistics:   (becoming much more popular lately). Drawing conclusions about a population (the complete set of all subjects or elements to be studied) from information obtained from a sample (a subcollection of the population). (e.g., hypothesis testing)

~Note: Populations can be special groups of people, salaries of groups, heights or weights of objects, chemical elements, & so on.
In most cases, they are finite but very large, so it is impractical to collect data from every member (e.g., the population of the US).

~Note: A population could be infinite (e.g., the integers).

~Example: We may wish to draw a conclusion about the fairness of a particular coin by tossing it repeatedly. The population consists of the results of an infinite number of tosses. A sample could be obtained by examining the first 50 tosses and noting the percentages of heads and tails.
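
~Illustration: A minimal Python sketch of the coin example, assuming a simulated fair coin stands in for the real one. The seed value & the 0/1 coding are arbitrary choices made here so the run is repeatable.

```python
import random

rng = random.Random(1)  # fixed seed (an arbitrary choice) so the run is repeatable

# Sample: the first 50 tosses of a simulated fair coin (1 = head, 0 = tail).
tosses = [rng.randint(0, 1) for _ in range(50)]

heads = sum(tosses)
tails = len(tosses) - heads
pct_heads = 100 * heads / len(tosses)
pct_tails = 100 * tails / len(tosses)

print(f"heads: {heads} ({pct_heads:.0f}%), tails: {tails} ({pct_tails:.0f}%)")
```

Even for a fair coin, the percentages in a sample of 50 will rarely be exactly 50/50, which is why a conclusion about fairness needs the inferential methods discussed later.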

~Data: The information collected (also called a data set). Each piece of data is called a data point. (For the above, we would list the number of heads & tails observed in 50 tosses.)

~Discrete Data: Finite or countably infinite (e.g., {1,2,3, ...}). Items are sorted into separate, or discrete, classes (e.g., number of heads or tails; yes, no, or sometimes; order of finish in a given race; principal languages of the world; & so on).

~For discrete data, there are "gaps" or "spaces" between data points, containing values that cannot be assumed (taken on). For example, if we toss a coin & let 0 be a tail & 1 be a head, values between 0 & 1 make no sense; e.g., 0.57 would make no sense. So, this value cannot be assumed.

~Note: Not every value between data points can be taken on (there is no way of finishing in position 2.457 in a race).

~Continuous Data: Any value between any 2 data points could be assumed (e.g., numbers between 0 and 2, range of temperatures during a day, height, weight, time, & so on). It involves measurement on a continuous scale with no gaps: values between 2 data points are possible data points themselves, so the set of possible values is infinite & non-countable.

~Example: The number of cans of soda purchased would be discrete, while the volume of soda ingested (in oz) would be continuous.

~Continuous means "no breaks", so for data of this type, any value between any 2 data points we collect can be assumed. For example, if we sample heights & we have 5' 4" & 5' 9" as data points, all heights between these can be assumed (we might not have them in our sample, however). The value 5' 6.69", or any other value between the ones given, would make sense even though we do not have it in our sample.

~Qualitative Data: Gives non-numerical information such as gender, eye color, blood type, or the elements in a sample of cat food. It is descriptive in nature & does not measure amounts. It could involve numbers used only as labels (phone numbers, addresses, etc.) & is usually displayed by bar graphs.

~Quantitative Data: Gives measured amounts of quantities such as weight, height, walking speed, time, & amounts of elements in cat food.

~Note: Quantitative data describes numerical amounts & is usually displayed by histograms (to be discussed).

~Note: The following are subclasses of data that could be either qualitative or quantitative. They also indicate levels of measurement.

~Nominal: Consists of names, labels, or categories only (e.g., gender). Data that cannot be arranged in any ordering scheme. (usually discrete data)

~Ordinal: Data concerned with order or rank. (race results, letter grades in a course, college football rankings, & so on). The difference in data values is not measured. (usually discrete data).

~Interval: (Metric data). Obtained from the measurement of quantities such as temperature & time. (usually continuous data). Uses a constant scale on which differences have meaning but ratios do not, since there is no natural zero (a zero that means absence of the quantity). For example, the difference between 80 degrees & 40 degrees can be measured, but you cannot conclude from the ratio 80/40 = 2 that it is twice as hot.

~Ratio: Also metric data & uses a continuous, constant scale. Both differences & ratios can be measured. There is a natural zero (absence of the quantity). Examples would be the measurements of height & weight.
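
~Illustration: A minimal Python sketch contrasting the interval & ratio levels, assuming the 80 & 40 degrees in the temperature example are in Fahrenheit (the notes do not say which scale). The helper name fahrenheit_to_kelvin is introduced here for illustration. Kelvin has a natural zero, so ratios on that scale have meaning.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with a natural zero)."""
    return (f - 32) * 5 / 9 + 273.15

hot_f, cool_f = 80, 40  # assumed to be degrees Fahrenheit

# Differences are meaningful on an interval scale.
diff_f = hot_f - cool_f

# The ratio of the Fahrenheit readings is NOT meaningful (no natural zero).
naive_ratio = hot_f / cool_f

# On the Kelvin scale the ratio is meaningful, & it is nowhere near 2.
kelvin_ratio = fahrenheit_to_kelvin(hot_f) / fahrenheit_to_kelvin(cool_f)

print("difference (F):", diff_f)
print("naive ratio of readings:", naive_ratio)
print("ratio in kelvins:", round(kelvin_ratio, 3))
```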

~Frequency Data: (also called count data). Counting the number of times an item falls into a certain category (e.g., number of males/females, number of A’s, B’s, C’s in a course, & so on). (This is how I will email your results on your quizzes & exams.) (could be either discrete or continuous)
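
~Illustration: A minimal Python sketch of frequency (count) data, using a made-up list of letter grades. collections.Counter is one simple way to tally how many items fall into each category.

```python
from collections import Counter

# Hypothetical letter grades for ten students (made up for illustration).
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

freq = Counter(grades)  # category -> number of times it occurs

for grade in sorted(freq):
    print(grade, freq[grade])
```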

~Parameter: A descriptive measure for the entire population (usually Greek symbols are used), e.g., lowercase mu (μ) & sigma (σ) for the mean & standard deviation of a population.

~Note: In mathematics, a parameter has a different meaning: a third quantity that dictates the position of a particle, such as an angle or a time value (e.g., in parametric equations).

~Statistic: A descriptive measure based entirely on a sample of a population (usually Latin letters are used).
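
~Illustration: A minimal Python sketch of the parameter/statistic distinction, assuming a simulated population of 10,000 heights (in cm) that we can measure in full. The names mu, sigma, x_bar & s follow the Greek/Latin convention above; the numbers themselves are made up.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameters: computed from the ENTIRE population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics: computed from a sample only.
sample = rng.sample(population, 50)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # sample standard deviation

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

In practice the population is rarely available in full, so the parameters are unknown & the statistics are used to estimate them.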

~Random Sampling: (also called simple random sampling). A procedure in which each possible sample of a given size (n) is equally likely to be selected.
(Probabilities play an extremely important part in statistics; more on this later.)

~Random Sample: A sample obtained by random sampling.
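
~Illustration: A minimal Python sketch of drawing a simple random sample, assuming a class of 60 students numbered 1-60. random.sample draws without replacement, & every subset of the requested size is equally likely to be chosen.

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable

students = list(range(1, 61))  # hypothetical roster: students numbered 1..60
n = 10                         # sample size

# Simple random sample: n distinct students chosen without replacement.
sample = rng.sample(students, n)

print(sorted(sample))
```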

~Note: When sampling a population, we would like to eliminate all biases, so random sampling is our goal & is extremely important in order to draw valid conclusions about the population studied. (See link on BIASED SAMPLE under Topics of Interest.) (Random sampling is used throughout this course.)

~Note: Also, you must make sure that your sample is representative of the population considered. (to draw a conclusion about the average price of a home in California, don't restrict your sample to Beverly Hills).

~Example: There are 60 students in a given class, 10 students in each of 6 rows. You would like a random sample of 10 students from the class. You number the rows 1-6, then roll a die. Whatever comes up (say 5), you choose the 10 students in that row. Is this a random sample?
Is this a simple random sample?
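
~Illustration: One way to explore the two questions is to count which samples each method can produce. This is a minimal Python sketch; the variable names are introduced here for illustration. It compares the chance that one particular student is chosen with the number of different 10-student samples that are possible at all.

```python
import math

ROWS, PER_ROW, N = 6, 10, 10

# Students numbered 0..59; row r holds students 10*r .. 10*r + 9.
rows = [list(range(r * PER_ROW, (r + 1) * PER_ROW)) for r in range(ROWS)]

# Die-roll method: the only samples that can ever occur are the 6 rows.
possible_by_die = len(rows)

# Simple random sampling: every 10-student subset of the 60 must be possible.
possible_by_srs = math.comb(ROWS * PER_ROW, N)

# Chance that one particular student is chosen under each method.
p_student_die = 1 / ROWS
p_student_srs = N / (ROWS * PER_ROW)

print("samples possible with the die:", possible_by_die)
print("samples possible with simple random sampling:", possible_by_srs)
print("chance a given student is picked:", p_student_die, "vs", p_student_srs)
```

Each student has the same chance of being picked either way, but the die can never produce a sample that mixes students from different rows, so not every possible sample of size 10 is equally likely.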

~Other Sampling Methods: (4 others that you should know).
Some of them involve selecting a random sample as a sub-method.

~Systematic: Selecting every kth element of a population (e.g., from a phone book, on campus, at the mall, & so on; for instance, every 10th student entering the student center).

~Convenience: Using whatever data is very easy to get (lazy man's way). No set pattern. (e.g., a simple phone call, asking a question to anyone, & so on).

~Stratified: Separating or subdividing a population into subgroups (strata) that share a common trait (e.g., age, gender, smoking status, income level, & so on), then sampling from each subgroup.

~Cluster: Separating the population into sections (clusters), then selecting some sections, then taking elements from those sections (e.g., towns in Dutchess County, counties in NY, voting precincts, & so on).
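
~Illustration: A minimal Python sketch of the systematic, stratified & cluster methods, reusing a hypothetical class of 60 students in 6 rows. Treating each row as both a stratum & a cluster is an assumption made here only to keep the example small. Keeping every student in a chosen cluster is one common variant, not the only one.

```python
import random

rng = random.Random(7)  # fixed seed (arbitrary) so the run is repeatable

# Hypothetical population: 60 students, each tagged with a row number 1-6.
population = [{"id": i, "row": i // 10 + 1} for i in range(60)]

# Systematic: pick a random starting point, then take every k-th element.
k = 6
start = rng.randrange(k)
systematic = population[start::k]

# Stratified: each row is a stratum; draw 2 students at random from EVERY row.
stratified = []
for row in range(1, 7):
    stratum = [p for p in population if p["row"] == row]
    stratified.extend(rng.sample(stratum, 2))

# Cluster: each row is a cluster; pick 2 rows at random & keep everyone in them.
chosen_rows = rng.sample(range(1, 7), 2)
cluster = [p for p in population if p["row"] in chosen_rows]

print("systematic ids:", [p["id"] for p in systematic])
print("stratified ids:", [p["id"] for p in stratified])
print("cluster rows:", sorted(chosen_rows), "size:", len(cluster))
```

The stratified sample always contains students from every row, while the cluster sample contains students from only the chosen rows.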

~Sampling Error: A sample from a population provides us with only a small portion of the entire population data. So, we don't expect the sample to yield perfectly accurate information about the population. Thus, we should anticipate a certain amount of error. Many things can go wrong (from the way we collect the sample to errors in computing the statistics we seek & the inferences we make).
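
~Illustration: A minimal Python sketch of sampling error, assuming a small made-up population (the integers 1 to 1000) whose mean we can compute exactly. Each random sample of 25 gives a slightly different sample mean, & the gap between that mean & the population mean is the error from sampling alone.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed (arbitrary) so the run is repeatable

population = list(range(1, 1001))  # hypothetical population: integers 1..1000
mu = statistics.fmean(population)  # population mean (a parameter)

errors = []
for trial in range(5):
    sample = rng.sample(population, 25)  # simple random sample of size 25
    x_bar = statistics.fmean(sample)     # sample mean (a statistic)
    errors.append(x_bar - mu)            # sampling error for this sample
    print(f"trial {trial + 1}: x_bar = {x_bar:.1f}, error = {x_bar - mu:+.1f}")
```

This sketch shows only the error due to chance in the selection; it does not model the other problems mentioned above, such as a biased collection method or mistakes in computation.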

~Note: You should be aware that using different statistics with different definitions can yield a conclusion quite different from what people think (e.g., that driving is safer than flying).