~Class Meeting #2 - Describing data
~Note: The following are guidelines for grouping data into classes:
1) There should be between 5 & 20 classes (groups of data values, either intervals or single points).
2) Each data point belongs to one & only one class.
3) All classes (intervals) should have the same width.
~Note: Single-valued grouping has one numerical value for each class (discrete data).
~Example: The following are hypothetical results from one of my calculus exams:
52,62,65,65,68,68,70,73,75,75,78,82,82,87,88,88,88,89,89,92,96,96,98,98,99.
Scores   Frequency   Relative frequency   Cumulative frequency
50-59    1           1/25 = .04 = 4%      1
60-69    5           5/25 = .20 = 20%     6
70-79    5           5/25 = .20 = 20%     11
80-89    8           8/25 = .32 = 32%     19
90-99    6           6/25 = .24 = 24%     25
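The table above can be rebuilt mechanically from the raw scores. This is a minimal sketch, assuming whole-number data so that each class runs from a lower limit L to L + width - 1 (50-59, 60-69, ...); the function name frequency_table is hypothetical.

```python
# Hypothetical exam scores from the example above (25 values).
scores = [52, 62, 65, 65, 68, 68, 70, 73, 75, 75, 78, 82, 82, 87,
          88, 88, 88, 89, 89, 92, 96, 96, 98, 98, 99]

def frequency_table(data, first_lower, width, num_classes):
    """Return rows of (lower, upper, frequency, relative frequency, cumulative frequency)."""
    n = len(data)
    rows = []
    cumulative = 0
    for k in range(num_classes):
        lower = first_lower + k * width
        upper = lower + width - 1  # assumes whole-number data: 50-59, 60-69, ...
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq  # running total gives the cumulative frequency
        rows.append((lower, upper, freq, freq / n, cumulative))
    return rows

table = frequency_table(scores, first_lower=50, width=10, num_classes=5)
for lower, upper, freq, rel, cum in table:
    print(f"{lower}-{upper}  {freq:>2}  {freq}/{len(scores)} = {rel:.2f} = {rel:.0%}  {cum:>2}")
```

The last cumulative frequency always equals the number of data values (25 here), which is a quick check that every score landed in exactly one class.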
Lower class limits are 50, 60, 70, 80, 90.
Upper class limits are 59, 69, 79, 89, 99.
The class midpoint (or class mark) is the average of a class's lower & upper limits: 54.5, 64.5, 74.5, 84.5, 94.5.
The class width is the difference between consecutive lower class limits: 10 in each case.
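As a quick check of the arithmetic, the midpoints and widths follow directly from the class limits listed above. A minimal sketch:

```python
lower_limits = [50, 60, 70, 80, 90]
upper_limits = [59, 69, 79, 89, 99]

# Class mark: average of the lower and upper limit of each class.
midpoints = [(lo + up) / 2 for lo, up in zip(lower_limits, upper_limits)]

# Class width: difference between consecutive lower class limits.
widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]

print(midpoints)  # [54.5, 64.5, 74.5, 84.5, 94.5]
print(widths)     # [10, 10, 10, 10]
```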
~Frequency Histogram: A bar graph whose bars (rectangles)
are drawn adjacent to each other (no gaps). The horizontal axis
displays the classes & the heights of the rectangles display the
frequencies. The center of each rectangle is placed over the midpoint of
its class.
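To see why the bars touch, compute each rectangle from the exam table: centred on its class midpoint and one class width wide. This sketch prints a sideways text histogram; it is only an illustration, not a plotting library.

```python
midpoints = [54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [1, 5, 5, 8, 6]
width = 10  # class width from the example

# Each rectangle is centred on its class midpoint and is one class width wide,
# so the right edge of one bar is the left edge of the next (no gaps).
rectangles = [(m - width / 2, m + width / 2, f) for m, f in zip(midpoints, frequencies)]

for left, right, height in rectangles:
    print(f"{left:5.1f} to {right:5.1f} | {'#' * height}")
```

The edges land halfway between classes (49.5, 59.5, 69.5, ...), so adjacent classes share an edge.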
~Relative Frequency Histogram: Same as above, but the vertical
axis displays the relative frequencies. There is an added feature:
if the data set consists of single values (so each rectangle has
width 1), the areas of all the rectangles add up to 1.
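The "areas add to 1" feature depends on the rectangle widths being 1. This sketch compares single-valued classes (each distinct score is its own class, width 1) with the grouped classes of width 10 from the exam example, using exact fractions.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical exam scores from the example (25 values).
scores = [52, 62, 65, 65, 68, 68, 70, 73, 75, 75, 78, 82, 82, 87,
          88, 88, 88, 89, 89, 92, 96, 96, 98, 98, 99]
n = len(scores)

# Single-valued classes: area = relative frequency x width 1.
single = Counter(scores)
area_single = sum(Fraction(count, n) * 1 for count in single.values())

# Grouped classes 50-59, ..., 90-99: area = relative frequency x width 10.
grouped = Counter((x // 10) * 10 for x in scores)
area_grouped = sum(Fraction(count, n) * 10 for count in grouped.values())

print(area_single)   # 1
print(area_grouped)  # 10
```

With width-10 classes the total area is 10, not 1, which is why the notes restrict the feature to single-valued data.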
~Cumulative Frequency Histogram: Same as above with the vertical axis displaying the cumulative frequencies.
~Frequency Polygon: Points are placed at the top of each
rectangle (over its class midpoint), then connected by line
segments. The segments are extended at the extreme left & right so that
the polygon begins & ends on the horizontal axis.
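Here is a sketch of the polygon's vertices for the exam data. One assumption, not stated in the notes: the end points are placed at height 0 over the midpoints of the empty classes just outside the data (one class width beyond each end), which is a common way to bring the polygon down to the axis.

```python
midpoints = [54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [1, 5, 5, 8, 6]
width = 10

# Vertices over the class midpoints, at the height of each rectangle.
vertices = list(zip(midpoints, frequencies))

# Assumption: close the polygon with zero-height points one class width
# beyond the first and last midpoints (44.5 and 104.5 here).
polygon = [(midpoints[0] - width, 0)] + vertices + [(midpoints[-1] + width, 0)]
print(polygon)
```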
~Ogive: Points are placed over the upper class limit of each
class, with the vertical axis displaying the cumulative frequencies,
and are then connected by line segments. We start the ogive from a point over
the smallest value & end with a point over the largest
value. (Useful for determining the number of values below some
particular value; a good way to visualize percentiles if the
vertical axis shows relative cumulative frequencies.)
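Reading a value off the ogive amounts to linear interpolation between its points. This is a minimal sketch for the exam data, assuming the starting point sits at height 0 over the smallest value (52); the function name ogive_estimate is hypothetical.

```python
upper_limits = [59, 69, 79, 89, 99]
cumulative = [1, 6, 11, 19, 25]
n = cumulative[-1]
smallest = 52  # smallest score in the example

# Assumed start: height 0 over the smallest value, then one point per upper class limit.
ogive = [(smallest, 0)] + list(zip(upper_limits, cumulative))

def ogive_estimate(x):
    """Interpolate along the ogive to estimate how many values lie at or below x."""
    if x <= ogive[0][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(ogive, ogive[1:]):
        if x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    return float(n)

est = ogive_estimate(84)
print(est, f"{est / n:.0%}")  # 15.0 60%
```

Dividing by n turns the count into a relative cumulative frequency, so 84 sits at roughly the 60th percentile by this reading. It is only an estimate: the exact count of scores at or below 84 in the list is 13, because interpolation assumes values are spread evenly within each class.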
~Pareto Graphs: (used for qualitative or discrete data).
Adjacent bars arranged in decreasing order of their frequencies (the vertical
axis shows frequencies or relative frequencies).
~Note: For single-valued data, we use bar graphs & place the center of the bar over the value.
~Note: Bar graphs are very commonly displayed sideways.
~Pie Charts: (I think they should be called Pizza Charts).
Each class's relative frequency (%) is shown as a slice of the pie. The key is to
get a reasonable estimate of the central angle for each slice (class):
central angle = relative frequency × 360°.
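For the exam table, the central angles work out as follows. This sketch uses exact fractions so that the slices sum to exactly 360 degrees.

```python
from fractions import Fraction

classes = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [1, 5, 5, 8, 6]
n = sum(frequencies)

# Central angle of each slice = relative frequency x 360 degrees.
angles = [Fraction(f, n) * 360 for f in frequencies]

for label, angle in zip(classes, angles):
    print(f"{label}: {float(angle):.1f} degrees")
print("total:", sum(angles))  # total: 360
```

For example, the 80-89 class has relative frequency 8/25 = .32, so its slice is .32 × 360 = 115.2 degrees.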
~Dot Plots: (used for small amounts of data). A single horizontal
axis is drawn covering the range of the data, and a dot is placed above
the corresponding value for each data point (repeated values are stacked).
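A text version of a dot plot is easy to produce if it is turned sideways, with one row per value on the axis and one mark per occurrence. This sketch assumes whole-number data; the function name dot_plot_rows is hypothetical, and "*" stands in for the dots.

```python
from collections import Counter

# Hypothetical exam scores from the example (25 values).
scores = [52, 62, 65, 65, 68, 68, 70, 73, 75, 75, 78, 82, 82, 87,
          88, 88, 88, 89, 89, 92, 96, 96, 98, 98, 99]

def dot_plot_rows(data):
    """Sideways dot plot: one row per whole number from min to max, one '*' per occurrence."""
    counts = Counter(data)
    return [f"{v:>3} | {'*' * counts[v]}" for v in range(min(data), max(data) + 1)]

for row in dot_plot_rows(scores):
    print(row)
```

Values that never occur still get an empty row, so gaps in the data remain visible along the axis.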
~Stem and Leaf Plots: (more informative than a
histogram, since the actual data values are kept and visualized).
The data are separated into two columns: the right column holds the leaves
(the ones digits) and the left column holds the stems
(the tens or higher digits).
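Applied to the exam scores, a stem-and-leaf plot can be sketched by splitting each score into its tens digit (stem) and ones digit (leaf):

```python
from collections import defaultdict

# Hypothetical exam scores from the example (25 values).
scores = [52, 62, 65, 65, 68, 68, 70, 73, 75, 75, 78, 82, 82, 87,
          88, 88, 88, 89, 89, 92, 96, 96, 98, 98, 99]

# Stem = tens (and higher) digits, leaf = ones digit; sorting first keeps leaves in order.
stems = defaultdict(list)
for x in sorted(scores):
    stems[x // 10].append(x % 10)

for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
# 5 | 2
# 6 | 2 5 5 8 8
# 7 | 0 3 5 5 8
# 8 | 2 2 7 8 8 8 9 9
# 9 | 2 6 6 8 8 9
```

Turned sideways, the rows have the same shape as the frequency histogram (1, 5, 5, 8, 6), but every original score can still be read off.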
~Scatter diagrams: Paired data (x,y) are collected, with x
measuring one variable and y another, and the pairs are
plotted as points in the xy-plane. By looking at how the points are
scattered, one can determine whether a relationship is present. The
relationship could be linear or non-linear. The strength of a linear relationship
is measured by correlation (will study later in the course).
An equation (the regression line) is then found & used to make predictions.
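As a preview of the regression line, here is a minimal least-squares sketch. The paired data are hypothetical and chosen to lie exactly on y = 2x + 1, so the fitted line can be checked by eye. The function name least_squares_line is also hypothetical; Python 3.10+ offers a similar statistics.linear_regression.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept

# Hypothetical paired data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # prediction at x = 5: 11.0
```

With real data the points will not fall exactly on a line, and the fit is the line that minimizes the sum of squared vertical distances to the points.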
~Time-Series Graph: Data collected at successive
points in time & plotted as Quantity vs. Time on the xy-plane. Many
trends over time can be visualized this way.
~Examples of each (will do these during class time)
Pareto Chart: Train derailments: 23 caused by bad track, 9 by faulty
equipment, 12 by human error, & 6 by other causes. The following is the
Pareto Chart. (will do in class)
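Ordering the derailment causes for a Pareto chart is just a sort by decreasing frequency. This sketch prints sideways text bars with relative & cumulative percentages.

```python
causes = {"Bad track": 23, "Faulty equipment": 9, "Human error": 12, "Other": 6}
total = sum(causes.values())  # 50 derailments

# Pareto order: bars sorted by decreasing frequency.
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)

running = 0
for cause, count in ordered:
    running += count
    print(f"{cause:<17}{'#' * count} {count} ({count / total:.0%}, cumulative {running / total:.0%})")
```

The order comes out as bad track (46%), human error (24%), faulty equipment (18%), other (12%), with cumulative percentages 46%, 70%, 88%, 100%.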