Boxplot
A boxplot (also called a box-and-whisker plot) is a graph built from the five-number summary of a data set.
The following are the steps for constructing a boxplot:
- Calculate the five-number summary of the data set.
- Create a vertical or horizontal number line with a scale that includes the numbers in the five-number summary and uses appropriate units spaced at equal distances from each other.
- Plot a symbol or draw a line at the median.
- Draw a box between Q1 and Q3.
- Determine whether outliers are present. To make this determination, calculate the interquartile range (Q3 minus Q1) and multiply it by 1.5. Add this amount to the value of Q3 and subtract it from the value of Q1. Any data points that fall outside these two boundaries are outliers.
- If there are no outliers, draw a line from the minimum value in the data set to the edge of the box at Q1, and draw another line from the maximum value in the data set to the edge of the box at Q3.
- If there are outliers, draw a line from the edge of the box at Q1 to the smallest value that is not an outlier, and draw another line from the edge of the box at Q3 to the greatest value that is not an outlier. Mark each outlier with a separate symbol.
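The numeric part of the steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `boxplot_stats` and the sample data are hypothetical, and the quartiles use the "inclusive" method of the standard library's `statistics.quantiles`, which is one of several textbook conventions and may not match the one intended here.

```python
import statistics


def boxplot_stats(data):
    """Return the numbers needed to draw a boxplot (hypothetical helper)."""
    values = sorted(data)
    # Assumption: "inclusive" quartiles; other conventions give slightly different values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Outlier boundaries: 1.5 * IQR below Q1 and above Q3.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "min": values[0], "q1": q1, "median": median, "q3": q3, "max": values[-1],
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        # Whiskers stop at the most extreme values that are not outliers.
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # made-up data for illustration
stats = boxplot_stats(sample)
for name, value in stats.items():
    print(name, value)
```

For this made-up sample the box runs from 3 to 7 with the median at 5, the boundaries are -3 and 13, the upper whisker stops at 8, and 30 is marked separately as an outlier.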
Similar to a histogram, a boxplot gives information regarding the shape, center, and variability of the data set. Boxplots differ from histograms in terms of their strengths and weaknesses, most notably how outliers are handled. A boxplot can show whether a data set is symmetric (roughly the same on each side when cut down the middle) or skewed (lopsided). Skewed data has a boxplot where the median cuts the box into two unequal pieces. If the longer part of the box is to the right of (or above) the median, the data is said to be skewed right. If the longer part is to the left of (or below) the median, the data is skewed left. A symmetric data set shows the median roughly in the middle of the box. Although the boxplot displays whether a data set is symmetric, it doesn't display the shape of the symmetry the way a histogram does.
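The rule for reading skewness off the box can be sketched as a comparison of the two parts of the box. The function name `box_skew` and its `tolerance` parameter are hypothetical; the tolerance is an arbitrary cutoff for "roughly in the middle", not a standard value, and the quartiles again assume the "inclusive" convention.

```python
import statistics


def box_skew(data, tolerance=0.1):
    """Classify skew from where the median cuts the box (hypothetical helper)."""
    q1, median, q3 = statistics.quantiles(sorted(data), n=4, method="inclusive")
    lower_part = median - q1   # part of the box left of (below) the median
    upper_part = q3 - median   # part of the box right of (above) the median
    box = q3 - q1
    # Assumption: parts within `tolerance` of the box length count as equal.
    if box == 0 or abs(upper_part - lower_part) <= tolerance * box:
        return "roughly symmetric"
    return "skewed right" if upper_part > lower_part else "skewed left"


print(box_skew([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(box_skew([1, 2, 3, 4, 5, 10, 20, 40, 80]))
print(box_skew([-80, -40, -20, -10, -5, -4, -3, -2, -1]))
```

This only looks at the box, as the paragraph describes; it says nothing about the finer shape of the distribution that a histogram would show.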
It is a common mistake to associate the size of the box in a boxplot with the amount of data in the data set. If one side of a boxplot is longer than the other (the data is skewed), it does not mean that side contains more data. It is not possible to tell the sample size by looking at a boxplot; it is based on percentages, not counts.
Each section of the boxplot contains about 25% of the data. If one section is wider than another, it indicates a wider range in the values of the data in that section, i.e. the data is more spread out. A smaller section of the boxplot indicates that the data is more condensed together. The boxplot just marks off the places in the data set that separate those sections.
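A small check can make this concrete: for a made-up data set, compute the width of each of the four sections and count the values that fall in each. The data and variable names are illustrative assumptions, and the quartiles use the "inclusive" convention of `statistics.quantiles`.

```python
import statistics

data = sorted([1, 2, 3, 4, 5, 9, 20, 60])  # made-up data for illustration
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
edges = [data[0], q1, median, q3, data[-1]]

# Width of each section: min to Q1, Q1 to median, median to Q3, Q3 to max.
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Number of values in each section; the maximum is counted in the last one.
counts = []
for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
    is_last = i == len(edges) - 2
    counts.append(sum(1 for v in data if lo <= v < hi or (is_last and v == hi)))

print("widths:", widths)
print("counts:", counts)
```

Here every section holds two of the eight values, yet the last section is many times wider than the first, which reflects spread rather than quantity.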
Another common error involves sample size. A boxplot is a one-dimensional graph with only one axis, representing the variable being measured. There is no second axis to indicate how many data points are in each group. Given two boxplots, one with a very long box and the other with a very short one, it does not follow that the longer one has more data in it. The length of the box represents the variability of the data, not the number of data values. The sample size should be indicated as part of the title of the boxplot.
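As a rough illustration of why the sample size belongs in the title, the sketch below compares two made-up samples: the smaller one has the longer box. The helper names `box_length` and `titled` and the title format are hypothetical, and the quartiles assume the "inclusive" convention.

```python
import statistics


def box_length(data):
    """Length of the box, Q3 minus Q1 (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(sorted(data), n=4, method="inclusive")
    return q3 - q1


def titled(name, data):
    """Build a plot title that states the sample size (illustrative format)."""
    return f"{name} (n = {len(data)})"


small_sample = [0, 10, 20, 30, 40]      # 5 widely spread values
large_sample = list(range(100, 121))    # 21 tightly packed values

for name, data in [("Small sample", small_sample), ("Large sample", large_sample)]:
    print(titled(name, data), "box length:", box_length(data))
```

Without the "n = ..." in the title, nothing in the two boxes would reveal that the shorter box summarizes about four times as many values.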