A boxplot (also called a box-and-whisker plot[6]) is a one-dimensional graph based on the five-number summary, which consists of the minimum value, the 25th percentile (known as Q1), the median, the 75th percentile (known as Q3), and the maximum value. These five descriptive statistics divide the data set into four parts, each containing 25% of the data. A boxplot can be vertical, with the values on the axis ordered from bottom (lowest) to top (highest), or horizontal, with the values on the axis going from left (lowest) to right (highest).[1,pp.120-121]

The steps for constructing a boxplot are as follows:[1,p.120][5]

  1. Calculate the five-number summary of the data set.
  2. Create a vertical or horizontal number line with a scale that includes the numbers in the five-number summary and uses appropriate units spaced at equal distances from each other.
  3. Plot a symbol or draw a line at the median.
  4. Draw a box from Q1 to Q3.
  5. Determine whether or not outliers are present. To make this determination, calculate the interquartile range (Q3 minus Q1) and multiply it by 1.5. Add this amount to Q3 to obtain the upper boundary, and subtract it from Q1 to obtain the lower boundary. Any data points that fall outside these boundaries are outliers.
    • If there are no outliers, draw a line from the minimum value in the data set to the edge of the box at Q1, and draw another line from the maximum value in the data set to the edge of the box at Q3.
    • If there are outliers, draw a line from the edge of the box at Q1 to the smallest value that is not an outlier, and draw another line from the edge of the box at Q3 to the greatest value that is not an outlier. Mark outliers with a separate symbol.
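
The calculations behind these steps can be sketched in Python as follows. This is a minimal illustration and not part of the cited sources: the function name boxplot_stats and the sample data are made up for the example, and the quartiles come from the standard library's statistics.quantiles with the "inclusive" method, which is only one of several quartile conventions in use. Other conventions can give slightly different values for Q1 and Q3.

```python
from statistics import quantiles


def boxplot_stats(data):
    """Return the values needed to draw a boxplot (hypothetical helper).

    Assumes at least two data points and the 'inclusive' quartile
    convention; other conventions may give slightly different quartiles.
    """
    values = sorted(data)

    # Step 1: five-number summary.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")

    # Step 5: boundaries at 1.5 times the interquartile range beyond the box.
    iqr = q3 - q1
    lower_boundary = q1 - 1.5 * iqr
    upper_boundary = q3 + 1.5 * iqr

    # Points outside the boundaries are outliers; the whiskers stop at the
    # most extreme values that are not outliers.
    inside = [v for v in values if lower_boundary <= v <= upper_boundary]
    outliers = [v for v in values if v < lower_boundary or v > upper_boundary]

    return {
        "min": values[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": values[-1],
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Illustrative data: nine evenly spaced values and one unusually large value.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

For this sample the box runs from 3.25 to 7.75, the upper boundary is 14.5, the value 30 is marked as an outlier, and the upper whisker ends at 9 rather than at the maximum.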

Similar to a histogram, a boxplot gives information regarding the shape, center and variability of the data set. Boxplots differ from histograms in their strengths and weaknesses, most notably in how outliers are handled. A boxplot can show whether a data set is symmetric (roughly the same on each side when cut down the middle) or skewed (lopsided). Skewed data has a boxplot where the median cuts the box into two unequal pieces. If the longer part of the box is to the right of (or above) the median, the data is said to be skewed right. If the longer part is to the left of (or below) the median, the data is skewed left. A symmetric data set shows the median roughly in the middle of the box. Although the boxplot displays whether a data set is symmetric, it doesn't display the shape of the symmetry the way a histogram can.[1,p.122]
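
The rule for reading skewness from the box can be expressed as a short Python sketch. The function name box_skew and the tolerance parameter are assumptions introduced for this example: the sources say only that the median should be "roughly" in the middle for symmetric data, so the 10% threshold used here is an arbitrary illustrative choice. Quartiles again use the "inclusive" convention of statistics.quantiles.

```python
from statistics import quantiles


def box_skew(data, tolerance=0.1):
    """Classify skew from where the median sits inside the box.

    `tolerance` is a hypothetical threshold: the two parts of the box count
    as roughly equal if they differ by at most this fraction of the box.
    """
    q1, median, q3 = quantiles(sorted(data), n=4, method="inclusive")
    lower_part = median - q1   # part of the box left of (or below) the median
    upper_part = q3 - median   # part of the box right of (or above) the median
    box = q3 - q1

    if box == 0 or abs(upper_part - lower_part) <= tolerance * box:
        return "roughly symmetric"
    return "skewed right" if upper_part > lower_part else "skewed left"


print(box_skew([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(box_skew([1, 1, 2, 2, 3, 5, 8, 13, 21]))
print(box_skew([-21, -13, -8, -5, -3, -2, -2, -1, -1]))
```

This check looks only at the box, as the description above does. It says nothing about the shape of the distribution within each section, which is the information a histogram would add.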

It is a common mistake to associate the size of the box in a boxplot with the amount of data in the data set. If one side of a boxplot is longer than the other (the data is skewed), it does not mean that side contains more data. It is not possible to tell the sample size by looking at a boxplot; it is based on percentages, not counts. Each section of the boxplot always contains 25% of the data. If one section is wider than another, it indicates a wider range in the values of data in that section, i.e. the data is more spread out. A narrower section of the boxplot indicates that the data is more condensed. The boxplot just marks off the places in the data set that separate those sections.[1,pp.122,126]
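
A small Python sketch can make this concrete by counting the values in each of the four sections and comparing the counts with the section widths. The helper section_summary and the data are invented for illustration, and the quartiles use the "inclusive" convention of statistics.quantiles. With tied values or sample sizes that do not divide evenly, the counts are only approximately equal.

```python
from statistics import quantiles


def section_summary(data):
    """Count the values in each boxplot section and measure its width.

    Hypothetical helper; assumes the 'inclusive' quartile convention.
    """
    values = sorted(data)
    q1, median, q3 = quantiles(values, n=4, method="inclusive")

    counts = [0, 0, 0, 0]
    for v in values:
        if v <= q1:
            counts[0] += 1
        elif v <= median:
            counts[1] += 1
        elif v <= q3:
            counts[2] += 1
        else:
            counts[3] += 1

    # Width of each section: min to Q1, Q1 to median, median to Q3, Q3 to max.
    edges = [values[0], q1, median, q3, values[-1]]
    widths = [high - low for low, high in zip(edges, edges[1:])]
    return counts, widths


counts, widths = section_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40])
print(counts)  # [3, 3, 3, 3]
print(widths)  # [2.75, 2.75, 2.75, 30.75]
```

The last section is more than ten times wider than the others, yet it holds the same three values: its data is more spread out, not more plentiful.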

Another common error involves sample size. A boxplot is a one-dimensional graph with only one axis, representing the variable being measured. There is no second axis to indicate how many data points are in each group. When two boxplots are compared, one with a very long box and the other with a very short one, it does not mean that the longer one has more data in it. The length of the box represents the variability of the data, not the number of data values. The sample size should be indicated as part of the title of the boxplot.[1,p.126]
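
The point can be checked numerically with a brief Python sketch. The two data sets below are fabricated purely for illustration, box_length is a hypothetical helper, and the quartiles follow the "inclusive" convention of statistics.quantiles.

```python
from statistics import quantiles


def box_length(data):
    """Length of the box (Q3 minus Q1), using the 'inclusive' convention."""
    q1, _, q3 = quantiles(sorted(data), n=4, method="inclusive")
    return q3 - q1


# A small but widely spread sample, and a large but tightly packed one.
small_spread_out = [10, 30, 50, 70, 90]                  # 5 values
large_condensed = [48 + (i % 5) for i in range(1000)]    # 1000 values, 48 to 52

print(len(small_spread_out), box_length(small_spread_out))  # 5 40.0
print(len(large_condensed), box_length(large_condensed))    # 1000 2.0
```

The sample with five values produces a box twenty times longer than the sample with a thousand values, which is why the sample size has to be stated separately, for example in the title.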