Variation always exists in a data set, regardless of which characteristics are measured, because not every data point will have the same exact value for every variable. It is important to be able to measure variation in a way that best captures it.[1,p.80]

The standard deviation is a measure of the amount of variability (or spread) among the numbers in a data set. It is the most common measure of variation for numerical data. In very rough terms, the standard deviation is the average distance from the mean. The more concentrated the data are, the smaller the standard deviation.[1,p.56] The standard deviation of a sample is commonly denoted by the lowercase Latin letter "s".[1,p.81] The formula for the sample standard deviation is:[1,p.56]

$$s = \sqrt{\frac{\sum (x - \overline{x})^2}{n - 1}}$$

Where,
n = the number of values in the data set
x = the numbers in the data set
$\overline{x}$ = the mean of the data set
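
The formula translates directly into code. Below is a minimal Python sketch that computes the sample standard deviation from the definition above, using a made-up data set and a hypothetical helper name `sample_std_dev`; the result is checked against `statistics.stdev` from the standard library.

```python
import math
import statistics

def sample_std_dev(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [4, 8, 6, 5, 3, 7]      # hypothetical data set
print(sample_std_dev(data))    # computed from the formula above
print(statistics.stdev(data))  # the standard library gives the same result
```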

The standard deviation of the entire population is denoted by the lowercase Greek letter "σ", pronounced "sigma".[1,p.82]

The properties of the standard deviation are:[1,p.83]

  • The standard deviation is always a positive number, due to the way it is calculated and the fact that it measures a distance (distances are never negative numbers).
  • The smallest possible value for the standard deviation is zero, and that happens only in contrived situations where every single number in the data set is exactly the same.
  • The standard deviation is calculated from the mean, and thus it is affected by outliers. When outliers are removed from a data set, the remaining values are more concentrated around the mean, and the standard deviation decreases (illustrated in the sketch after this list).[1,p.83]
  • The standard deviation has the same units as the original data.
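
Two of these properties are easy to verify numerically. The short Python sketch below, with made-up numbers, shows that a data set of identical values has a standard deviation of zero, and that removing an outlier shrinks the standard deviation.

```python
import statistics

print(statistics.stdev([3, 3, 3, 3]))  # 0.0: every value is identical

data = [5, 6, 7, 5, 6, 40]              # 40 is a hypothetical outlier
trimmed = [x for x in data if x != 40]  # the same data with the outlier removed

print(statistics.stdev(data))     # large: the outlier inflates the spread
print(statistics.stdev(trimmed))  # smaller: remaining values cluster near the mean
```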

A small standard deviation means that the values in the data set are close to the mean of the data set, on average, and a large standard deviation means that the values are spread out further away from the mean. A small standard deviation can be a goal in certain situations, for example, in product manufacturing and quality control.[1,p.82] Units are important when representing the standard deviation. For example, a standard deviation of 2 years is equivalent to a standard deviation of 24 months, but the latter may appear greater than the former. The value of the mean relative to the standard deviation is important as well. If the mean is 5.2 and the standard deviation is 3.4, that is a lot of variation, relatively speaking. But if the mean is 25.6, the same standard deviation of 3.4 would be comparatively smaller,[1,p.83] if the theoretical minimum value for both distributions is zero.
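The unit effect can be checked directly. The following sketch, using hypothetical ages, shows that converting years to months rescales the standard deviation by the same factor of 12, even though the underlying spread is unchanged.

```python
import statistics

ages_years = [22, 25, 30, 27, 21]           # hypothetical ages, in years
ages_months = [x * 12 for x in ages_years]  # the same ages, in months

s_years = statistics.stdev(ages_years)
s_months = statistics.stdev(ages_months)

print(s_years)             # spread expressed in years
print(s_months)            # the same spread expressed in months
print(s_months / s_years)  # 12.0: changing units rescales the standard deviation
```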

The standard deviation is also used to describe where most of the data should fall relative to the mean. For example, if the data follow a normal distribution curve, about 95% of the data lies within two standard deviations of the mean. This is called the empirical rule, or the 68-95-99.7% rule.[1,p.57] The standard score represents the number of standard deviations a value lies above or below the mean.[1,pp.57-58]
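Both ideas can be demonstrated with a simulation. The Python sketch below draws a hypothetical sample from a normal distribution, counts the share of values within two standard deviations of the mean, and computes a standard score; the distribution parameters (mean 100, standard deviation 15) are made up for illustration.

```python
import random
import statistics

random.seed(0)
# Hypothetical sample from a normal distribution with mean 100, std dev 15.
sample = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(sample)
s = statistics.stdev(sample)

# Share of values within two standard deviations of the mean.
within_two = sum(1 for x in sample if abs(x - mean) <= 2 * s)
print(within_two / len(sample))  # close to 0.95, as the empirical rule predicts

# A standard score (z-score) counts standard deviations from the mean.
print((130 - mean) / s)  # roughly 2 for a value of 130
```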

Sample variance is another way to measure variation in a data set. The sample variance is equal to the standard deviation squared, written "s²", and it is an intermediate value in calculating the standard deviation. The downside of the sample variance is that it is expressed in square units. If the data are in dollars, for example, the variance would be in square dollars, which makes no sense.[1,p.81]
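
The relationship, and the units problem, can be seen in a few lines of Python. The data set below is hypothetical; `statistics.variance` and `statistics.stdev` are the standard library's sample variance and sample standard deviation.

```python
import math
import statistics

prices = [10.0, 12.5, 9.0, 11.5]   # hypothetical prices, in dollars

var = statistics.variance(prices)  # sample variance s^2, in square dollars
s = statistics.stdev(prices)       # sample standard deviation s, back in dollars

print(var)                              # hard to interpret: square dollars
print(s)                                # same units as the original data
print(math.isclose(math.sqrt(var), s))  # True: s is the square root of the variance
```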