A percentile is a statistic that reports relative standing. The kth percentile is a value that splits the ordered data set into two parts: the lower part contains k percent of the data, and the upper part contains the rest.[1,p.88] For example, if an exam score is reported to be at the 90th percentile, 90% of the other students scored lower on the same exam and 10% scored higher.

There is no single definitive formula for calculating percentiles; different methods may give different results, but not by much.[1,p.89] The kth percentile (where k is any number between 1 and 100) can be calculated by hand with the following steps:[1,p.88]

  1. The numbers in the data set are sorted in ascending order (smallest to largest).
  2. k percent is multiplied by n, the total count of numbers; that is, k/100 times n is computed.
  3. A. If the result from step 2 is a whole number, the numbers in the ordered data set are counted from left to right until the one at that position is reached. The kth percentile is the average of that value and the value that directly follows it.
     B. If the result from step 2 is not a whole number, it is rounded to the nearest whole number. The numbers in the ordered data set are then counted from left to right until the one at the rounded position is reached. The kth percentile is that value.
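The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name `kth_percentile` and the sample scores are hypothetical. Three details the steps leave open are handled by assumption: a result exactly halfway between two whole numbers (such as 2.5) is rounded up, a result that would round down to 0 is treated as position 1, and for k = 100 there is no value after the last one, so the maximum is returned.

```python
import math


def kth_percentile(data, k):
    """Return the kth percentile of `data` using the hand-calculation steps.

    Hypothetical helper. Assumptions not fixed by the steps themselves:
    halves round up (2.5 -> 3), positions below 1 are treated as 1, and
    k = 100 returns the maximum because no value follows the last one.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < k <= 100:
        raise ValueError("k must satisfy 0 < k <= 100")

    values = sorted(data)        # step 1: ascending order
    n = len(values)
    position = k * n / 100       # step 2: k percent of n (multiply first to limit float error)

    if position.is_integer():    # step 3A: whole number
        i = int(position)        # 1-based position in the ordered data
        if i >= n:               # assumption: nothing follows the last value
            return values[-1]
        # average of the value at position i and the one directly after it
        return (values[i - 1] + values[i]) / 2

    # step 3B: round to the nearest whole number (halves round up), minimum 1
    i = max(1, math.floor(position + 0.5))
    return values[i - 1]


scores = [50, 15, 90, 35, 70, 20, 60, 40, 85, 55]   # hypothetical exam scores
print(kth_percentile(scores, 25))   # 2.5 -> position 3 -> 35
print(kth_percentile(scores, 50))   # 5 is whole -> (50 + 55) / 2 = 52.5
print(kth_percentile(scores, 90))   # 9 is whole -> (85 + 90) / 2 = 87.5
```

Multiplying k by n before dividing by 100 keeps whole-number results exact for integer inputs, so the step 3A branch is not missed because of floating-point rounding.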

Percentiles are used in a variety of ways for comparison purposes and to determine relative standing.[1,p.57] Percentiles have a universal interpretation: being at the 95th percentile means the same thing whether one is looking at exam scores or at the weights of packages sent through the postal service; the 95th percentile always means that 95% of the other values lie below it and 5% lie above it. This makes it possible to fairly compare values from two data sets that have different means and standard deviations.[1,p.89] A percentile is not a percent; it is a number that marks a certain percentage of the way through the data. Suppose that a student's score on an exam was reported to be at the 80th percentile. This does not mean that they answered 80% of the questions correctly. It means that 80% of the other students' scores were lower than theirs and 20% were higher.[1,p.89]
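To make the "percent of the other values below" reading concrete, here is a small Python sketch. The helper `percent_below` and both score lists are hypothetical. It assumes the score being ranked appears in the list and excludes one copy of it, and it counts only strictly lower scores, so ties are not counted as lower. In the example, a score of 81 on one exam and a score of 45 on a lower-scoring exam both sit at the 80th percentile of their own class, even though neither score is 80 points or 80%.

```python
def percent_below(score, all_scores):
    """Percent of the *other* scores that are strictly lower than `score`.

    Hypothetical helper: assumes `score` occurs in `all_scores` and removes
    one copy of it so that a student is not compared with themselves.
    """
    others = list(all_scores)
    others.remove(score)          # raises ValueError if score is absent
    if not others:
        raise ValueError("need at least one other score")
    lower = sum(1 for s in others if s < score)
    return 100 * lower / len(others)


# Hypothetical exams with very different means
exam_a = [52, 60, 64, 67, 70, 73, 75, 78, 81, 86, 95]
exam_b = [20, 25, 28, 30, 33, 36, 38, 41, 45, 50, 58]

print(percent_below(81, exam_a))  # 8 of the 10 other scores are lower -> 80.0
print(percent_below(45, exam_b))  # 8 of the 10 other scores are lower -> 80.0
```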

Five-number summary

The Empirical Rule uses the mean and standard deviation to describe a bell-shaped data set. When the data is not bell-shaped, a different set of statistics, based on percentiles, is used to describe the big picture of the data. The five-number summary is a set of five descriptive statistics that divide the data set into four sections, each containing about the same amount of data (roughly a quarter). Together, these five cutoff points describe how the data is laid out.[1,p.93]

The five numbers in a five-number summary are:[1,p.93]

  1. The minimum (smallest) number in the data set.
  2. The 25th percentile (also known as the first quartile or Q1).
  3. The 50th percentile (the median, also known as the second quartile or Q2).
  4. The 75th percentile (also known as the third quartile or Q3).
  5. The maximum (largest) number in the data set.
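A five-number summary can be sketched in Python with the standard library. This is an illustration under assumptions: the helper name `five_number_summary` and the scores are hypothetical, and the quartiles come from `statistics.quantiles(..., method="inclusive")` (Python 3.8+), which interpolates between neighbouring values instead of following the hand steps. For the sample below it gives Q1 = 36.25 and Q3 = 67.5, whereas the hand method described earlier gives 35 and 70, the kind of small difference between methods the text anticipates.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for `data`.

    Hypothetical helper. Quartiles use the standard library's interpolating
    "inclusive" method, so Q1 and Q3 may differ slightly from hand-calculated
    25th and 75th percentiles.
    """
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


scores = [50, 15, 90, 35, 70, 20, 60, 40, 85, 55]   # hypothetical exam scores
print(five_number_summary(scores))   # (15, 36.25, 52.5, 67.5, 90)
```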

The purpose of the five-number summary is to give descriptive statistics for center, variation and relative standing all at the same time.[1,p.94] The largest value in the data set minus the smallest value is called the range. It is easy to find, but it is almost meaningless, because it depends on only two numbers, both of which may be outliers. The distance between the 25th and 75th percentiles (Q3 minus Q1) is called the interquartile range (IQR). It spans the middle 50% of the data. The interquartile range is interpreted in the context of the full range of the data set. It is similar to the range, with one important difference: it avoids the problems that outliers and skewness create for the range.[1,p.84] When the data is skewed, the interquartile range is a more appropriate measure of variability than the standard deviation.[1,p.115] If the interquartile range is small, a lot of the data is close to the median; if it is large, the data is more spread out from the median.[1,p.94] The larger the interquartile range, the more variable the data set is.[1,p.123] The five-number summary is commonly displayed as a boxplot.
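The contrast between the range and the interquartile range can be checked with a short Python sketch. The helper `range_and_iqr` and the data are hypothetical, and the quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`, which is only one of several possible quartile methods. Replacing the largest value with an extreme outlier changes the range a great deal but leaves the interquartile range unchanged.

```python
import statistics


def range_and_iqr(data):
    """Return (range, interquartile range) for `data` (hypothetical helper)."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return max(data) - min(data), q3 - q1


scores = [15, 20, 35, 40, 50, 55, 60, 70, 85, 90]          # hypothetical data
with_outlier = [15, 20, 35, 40, 50, 55, 60, 70, 85, 400]   # 90 replaced by 400

print(range_and_iqr(scores))        # (75, 31.25)
print(range_and_iqr(with_outlier))  # (385, 31.25): only the range reacts
```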