Measure of Relative Standing in Statistics
These are my thoughts on measures of relative standing in statistics.
Z Scores
Measures of relative standing are numbers showing the location of data values relative to the other values within the same data set.
A z score is found by converting a value to a standardized scale. This definition shows that a z score is the number of standard deviations that a data value is away from the mean.
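The z score is calculated for a sample, using the sample mean \(\bar{x}\) and sample standard deviation \(s\), as:
\[z = \frac{x - \bar{x}}{s}\]
Or, for a population with mean \(\mu\) and standard deviation \(\sigma\):
\[z = \frac{x - \mu}{\sigma}\]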
A z score is the number of standard deviations that a given value is above or below the mean.
Z scores are expressed as numbers with no units of measurement.
A data value is significantly low if its z score is less than or equal to -2, and significantly high if its z score is greater than or equal to +2.
If an individual data value is less than the mean, its corresponding z score is a negative number.
A value is significantly low or significantly high if it is at least two standard deviations away from the mean. It follows that significantly low values have z scores less than or equal to -2 and significantly high values have z scores greater than or equal to +2. A value whose z score lies strictly between -2 and +2 is not significant.
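As a rough illustration of the formula and the two-standard-deviation rule above, here is a minimal Python sketch. The function names `z_score` and `classify` and the sample data are made up for this example, and it assumes the sample version of the formula (sample mean and sample standard deviation).

```python
from statistics import mean, stdev


def z_score(x, data):
    """Sample z score: (x - sample mean) / sample standard deviation."""
    return (x - mean(data)) / stdev(data)


def classify(z):
    """Apply the rule of thumb: |z| >= 2 counts as significant."""
    if z <= -2:
        return "significantly low"
    if z >= 2:
        return "significantly high"
    return "not significant"


# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

z = z_score(9, data)           # 9 is about 1.87 standard deviations above the mean
print(round(z, 2), classify(z))  # 1.87 not significant
```

Because the z score has no units, the same `classify` rule can be applied to values from data sets measured on different scales.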
A z score is a measure of position, in the sense that it describes the location of a value relative to the mean. Percentiles and quartiles are other measures of position useful for comparing values within the same data set or between different data sets.
Percentiles
Percentiles are one type of quantile (also called fractiles); quantiles partition data into groups with roughly the same number of values in each group.
The 50th percentile has about 50% of the data values below it and about 50% above it.
The process of finding the percentile that corresponds to a particular data value is given by the following formula:
\[\text{percentile} = \frac{\text{number of values less than } x}{\text{total number of values}} \times 100\]
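The formula above can be sketched in a few lines of Python. The function name `percentile_of_value` and the data are assumptions made for this example; the result is left unrounded.

```python
def percentile_of_value(x, data):
    """Percentile of x: share of values strictly less than x, times 100."""
    below = sum(1 for v in data if v < x)
    return 100 * below / len(data)


# Hypothetical data set, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# 7 of the 10 values are less than 8, so 8 sits at the 70th percentile.
print(percentile_of_value(8, data))  # 70.0
```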
Notation
- n = total number of values in the data set
- k = percentile being used, for example k = 25
- L = locator that gives the position of a value in the sorted data
- \(P_k\) = kth percentile
Algorithm
Sort the data from lowest to highest.
Compute \(L = \frac{k}{100} \cdot n\) where n = number of values and k = percentile in question.
Is L a whole number?
If yes, the value of the kth percentile is midway between the Lth value and the next value in the sorted set of data. Find \(P_k\) by adding the Lth value and the next value and dividing the total by 2.
If no, change L by rounding it up to the next larger whole number. The value of \(P_k\) is then the Lth value, counting from the lowest.
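The steps above can be sketched in Python as follows. The function name `kth_percentile` is made up for this example, and it assumes k is a whole number from 1 to 99. Integer arithmetic (`divmod`) is used to test whether L is a whole number, which avoids floating-point rounding surprises.

```python
def kth_percentile(data, k):
    """Find P_k using the locator L = (k / 100) * n described above."""
    if not data or not (isinstance(k, int) and 1 <= k <= 99):
        raise ValueError("need non-empty data and an integer k from 1 to 99")

    values = sorted(data)               # step 1: sort from lowest to highest
    n = len(values)
    whole, remainder = divmod(k * n, 100)  # L = whole + remainder / 100

    if remainder == 0:
        # L is a whole number: average the Lth value and the next one.
        # Positions are 1-based in the text, list indexes are 0-based.
        return (values[whole - 1] + values[whole]) / 2

    # L is not a whole number: round up and take that value.
    return values[whole]                # index `whole` is position whole + 1


# Hypothetical data set, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(kth_percentile(data, 25))  # L = 2.5, round up to 3, third value: 3
print(kth_percentile(data, 50))  # L = 5, average of 5th and 6th values: 5.5
```

Note that statistical software often uses other conventions (for example, linear interpolation), so library results may differ slightly from this procedure.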
Quartiles
Just as there are 99 percentiles that divide the data into 100 groups, there are three quartiles that divide the data into four groups.
Quartiles are measures of location, Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.
Interquartile range = \(Q_3 - Q_1\)
Semi-interquartile range = \(\frac{Q_3 - Q_1}{2}\)
Midquartile = \(\frac{Q_3 + Q_1}{2}\)
10-90 percentile range = \(P_{90} - P_{10}\)
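To make these four statistics concrete, here is a small Python sketch. It assumes that \(Q_1 = P_{25}\) and \(Q_3 = P_{75}\) and finds percentiles with the locator procedure from the Percentiles section; the names `percentile` and `spread_measures` are made up for this example.

```python
def percentile(data, k):
    """P_k via the locator L = (k / 100) * n; k is an integer from 1 to 99."""
    values = sorted(data)
    whole, remainder = divmod(k * len(values), 100)
    if remainder == 0:
        return (values[whole - 1] + values[whole]) / 2
    return values[whole]


def spread_measures(data):
    """Return the four quartile- and percentile-based statistics."""
    q1, q3 = percentile(data, 25), percentile(data, 75)
    return {
        "interquartile_range": q3 - q1,
        "semi_interquartile_range": (q3 - q1) / 2,
        "midquartile": (q3 + q1) / 2,
        "range_10_90": percentile(data, 90) - percentile(data, 10),
    }


# Hypothetical data set, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Here Q1 = 3, Q3 = 8, P10 = 1.5 and P90 = 9.5.
for name, value in spread_measures(data).items():
    print(name, value)
```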
Boxplots
The values of the minimum, maximum, and three quartiles are used for the 5-number summary and the construction of boxplot graphs.
For a set of data, the 5-number summary consists of these five values:
- Minimum
- First quartile, Q1
- Second quartile, Q2 (the median)
- Third quartile, Q3
- Maximum
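The five values listed above could be computed along these lines. This is only a sketch: the function names are made up, and the quartiles are taken as \(P_{25}\), \(P_{50}\), and \(P_{75}\) using the locator procedure from the Percentiles section, so other software may report slightly different quartiles.

```python
def percentile(data, k):
    """P_k via the locator L = (k / 100) * n; k is an integer from 1 to 99."""
    values = sorted(data)
    whole, remainder = divmod(k * len(values), 100)
    if remainder == 0:
        return (values[whole - 1] + values[whole]) / 2
    return values[whole]


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) for a non-empty data set."""
    return (
        min(data),
        percentile(data, 25),
        percentile(data, 50),
        percentile(data, 75),
        max(data),
    )


# Hypothetical data set, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(five_number_summary(data))  # (1, 3, 5.5, 8, 10)
```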
A boxplot is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, the median, and the third quartile.
A boxplot can often be used to identify skewness. A distribution is skewed if it is not symmetric, extending more to one side than to the other.