Histogram
A histogram is a graph that organises and displays numerical data in picture form. Unlike in a bar graph, the bars in a histogram connect to each other. The height of each bar represents either the number of data points (frequency) or the percentage of data points (relative frequency) in each group. Each data point from a data set falls into one and only one bar of the histogram. It is possible to make a histogram from any numerical data set; however, it is not possible to determine the actual values of the data set from a histogram.
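As a minimal sketch of the ideas above, the following Python code groups a small data set into equal-width bars and reports both frequency and relative frequency. The function name histogram_counts and both lists of scores are hypothetical, invented for this illustration; the choice to send a boundary value into the upper group is also an assumption.

```python
def histogram_counts(data, start, width, n_bins):
    """Count how many values fall in each of n_bins equal-width groups.

    Group i covers [start + i*width, start + (i+1)*width), so every value
    lands in exactly one group and a boundary value goes to the upper group.
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if not 0 <= i < n_bins:
            raise ValueError(f"{x} lies outside the histogram range")
        counts[i] += 1
    return counts


# Hypothetical exam scores, invented for this example.
scores = [52, 55, 61, 64, 68, 70, 71, 75, 78, 79, 83, 85, 88, 91, 97]

frequency = histogram_counts(scores, start=50, width=10, n_bins=5)
relative_frequency = [c / len(scores) for c in frequency]

print(frequency)                                  # [2, 3, 5, 3, 2]
print([round(r, 2) for r in relative_frequency])

# A different data set that produces exactly the same bars: the histogram
# alone cannot tell the two apart.
other_scores = [50, 59, 60, 60, 69, 70, 70, 70, 70, 79, 80, 80, 89, 90, 99]
other_frequency = histogram_counts(other_scores, start=50, width=10, n_bins=5)
print(other_frequency == frequency)               # True
```

The last comparison illustrates why the actual values cannot be recovered from the graph: the bars record only how many points fall in each group, not where they fall within it.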
There are no fixed rules for how to create a histogram; the person making the graph can choose the grouping on the x-axis as well as the scale and the starting and ending points on the y-axis. However, not every choice is appropriate; in fact, a histogram can be made misleading in several ways:
- If the grouping interval of the numerical variable is very small, there will be too many bars in the histogram; the data may be hard to interpret because the heights of the bars look more variable than they should. On the other hand, if the ranges are very large, there are too few bars and something interesting in the data may be missed.
- The y-axis of a histogram shows how many individuals are in each group, using counts or percents. A histogram can be misleading if it has a deceptive scale and/or inappropriate starting and ending points on the y-axis. If the scale goes by large increments and has an ending point that is much higher than needed, there will be a lot of white space above the histogram. The bars will be squeezed down, making their heights look more uniform than they should. If the scale goes by small increments and ends at the smallest value possible, the bars become stretched vertically, suggesting bigger differences than really exist.
- If the vertical axis reports relative frequency, sample size must be supplied along with the graph.
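The effect of the grouping interval can be sketched in a few lines of Python. The helper bin_counts and the list of ages are hypothetical, invented for this illustration, and the three widths are arbitrary choices meant to show a too-narrow, a moderate and a too-wide grouping.

```python
from collections import Counter


def bin_counts(data, start, width):
    """Map the lower edge of each non-empty group to its number of values."""
    counts = Counter(start + ((x - start) // width) * width for x in data)
    return dict(sorted(counts.items()))


# Hypothetical ages, invented for this example.
ages = [21, 22, 22, 23, 25, 27, 28, 31, 33, 34, 34, 36, 41, 45, 58]

for width in (1, 10, 50):
    counts = bin_counts(ages, start=20, width=width)
    print(f"width {width:>2}: {len(counts)} non-empty bars -> {counts}")
```

With a width of 1 almost every bar has a height of 1 or 2 and no pattern is visible; with a width of 50 all the data sit in a single bar. A width of 10 shows the counts falling off as age increases.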
Tips for setting up a histogram well
- Each data set requires different ranges for grouping, but ranges that are too wide or too narrow should be avoided:
  - A histogram whose groups are too wide places all the data into a very small number of bars, which makes meaningful comparisons impossible.
  - A histogram whose groups are too narrow looks like a long series of tiny bars with no clear pattern.
- Groups should have equal width. If one bar is wider than the others, it may contain more data than it should.
- Borderline data points should all be consistently put either into their respective lower bar or their respective upper bar.
- Both the x-axis and the y-axis should have clear, descriptive labels to help with interpreting the histogram.
- Since it is not possible to calculate measures of center and variability from the histogram without knowing the exact values, basic statistics of center and variation should be calculated and presented along with the histogram.[1,p.115]
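Two of the tips above, treating borderline points consistently and reporting summary statistics alongside the graph, can be sketched in Python as follows. The function count_groups, its borderline parameter and the data values are hypothetical, invented for this illustration; the sketch assumes that all data lie within the outermost group edges.

```python
from bisect import bisect_left, bisect_right
import statistics

edges = [0, 10, 20, 30]              # three equal-width groups
data = [3, 10, 10, 14, 20, 25, 28]   # hypothetical values; 10 and 20 sit on edges


def count_groups(data, edges, borderline="upper"):
    """Count values per group, sending every borderline value the same way."""
    locate = bisect_right if borderline == "upper" else bisect_left
    result = [0] * (len(edges) - 1)
    for x in data:
        i = locate(edges, x) - 1
        # Keep values equal to the outermost edges inside the end groups
        # (assumes no value lies outside the range covered by edges).
        i = min(max(i, 0), len(result) - 1)
        result[i] += 1
    return result


print(count_groups(data, edges, "upper"))   # [1, 3, 3]
print(count_groups(data, edges, "lower"))   # [3, 2, 2]

# Summary statistics to present along with the histogram.
print("mean:  ", round(statistics.mean(data), 2))
print("median:", statistics.median(data))
print("stdev: ", round(statistics.stdev(data), 2))
```

The two conventions give noticeably different bars for the same data, which is why one of them should be chosen and applied to every borderline point.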
Skewness
Data sets can have many different possible shapes. Three shapes that are commonly discussed in introductory statistics courses are right skewed, left skewed and symmetric.
Symmetric data has about the same shape on either side of the middle. If the histogram is cut down the middle, the left-hand and the right-hand sides resemble mirror images of each other. When data is symmetric, the mean and the median are close together.
Data is said to be skewed to the right if most of the data is on the left side of the histogram, but a few larger values are on the right. When the data is skewed right, the mean is larger than the median. The "tail" on the right side of the graph is longer than on the left.
Data is said to be skewed to the left if most of the data is on the right side of the histogram, but a few smaller values are on the left. When the data is skewed left, the mean is smaller than the median. The "tail" on the left side of the graph is longer than on the right.