Measures of Center In Statistics
These are my thoughts about measures of center in statistics.
Measures of center are widely used to provide representative values that summarize data sets.
A measure of center is a value at the center or middle of a data set.
The mean is generally the most important of all numerical measurements used to describe data. It is what most people call an average.
The mean of a set of data is the measure of center found by adding all of the data values and dividing the total by the number of data values.
Sample means drawn from the same population tend to vary less than other measures of center. The mean of a data set uses every data value. A disadvantage of the mean is that just one extreme value can change the value of the mean substantially. Such an extreme value is called an outlier. Because of this sensitivity, we say the mean is not resistant.
A statistic is resistant if the presence of extreme values does not cause it to change very much.
The definition of the mean can be expressed by the formula:
\[\frac{\sum x}{n} \]
\(\sum\) (capital sigma) refers to the sum of values. \(x\) stands for the individual data values. \(n\) is the number of values.
If the data are from a sample of the population, the mean is denoted by x-bar, written \(\bar{x}\).
If the data are from the entire population, the mean is denoted by mu, written \(\mu\).
Sample statistics are usually represented by English letters and population parameters are usually represented by Greek letters.
\(\sum\) denotes the sum of a set of data values.
\(x\) is the variable usually used to represent the individual data values.
\(n\) represents the number of data values in a sample.
\(N\) represents the number of data values in a population.
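The formula above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `mean` and the sample values are made up for the example and do not come from any particular data set.

```python
def mean(values):
    """Return the sum of the data values divided by the number of values."""
    n = len(values)  # n: the number of data values
    if n == 0:
        raise ValueError("the mean needs at least one data value")
    return sum(values) / n  # (sum of x) / n


# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 5, 10]
print(mean(sample))  # 5.0
```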
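Avoid the term average when referring to a measure of center. The word average is often used for the mean, but it is sometimes used for other measures of center too, so it is ambiguous. It is clearer to name the specific measure, such as the mean or the median.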
The median can be thought of as a middle value. More precisely, the median of a data set is the measure of center that is the middle value when the original data values are arranged in order of increasing or decreasing magnitude.
The median does not change by large amounts when we include just a few extreme values, so the median is a resistant measure of center. The median does not directly use every data value.
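To see what resistance looks like in practice, here is a small sketch using Python's standard `statistics` module. The data values are invented for the example. One extreme value is appended, and the mean and median are compared before and after.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [10, 12, 13, 15, 16]
with_outlier = data + [200]  # the same data plus one extreme value

mean_before = statistics.mean(data)              # 13.2
median_before = statistics.median(data)          # 13
mean_after = statistics.mean(with_outlier)       # roughly 44.3
median_after = statistics.median(with_outlier)   # 14.0

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

In this example the mean more than triples, while the median moves by only 1.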
The median of a sample is sometimes denoted by x-tilde, m, or Med. To find the median, first sort the values.
If the number of data values is odd, the median is the number located in the exact middle of the sorted list.
If the number of data values is even, the median is found by computing the mean of the two middle numbers in the sorted list.
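The sort-then-pick-the-middle procedure can be sketched as follows. The function name `median` and the test values are assumptions made for the example. In real work, Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Return the middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)  # step 1: sort the values
    n = len(ordered)
    if n == 0:
        raise ValueError("the median needs at least one data value")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the value in the exact middle of the sorted list.
        return ordered[mid]
    # Even count: the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```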
Mode isn’t used much with quantitative data, but it is the only measure of center that can be used with qualitative data. The mode of a data set is the value that occurs with the greatest frequency.
A data set can have no mode, one mode, or multiple modes. When two data values occur with the same greatest frequency, each one is a mode and the data set is said to be bimodal. When more than two data values occur with the same greatest frequency, each is a mode and the data set is said to be multimodal. When no data value is repeated, we say there is no mode.
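Here is a rough sketch of finding the mode or modes by counting frequencies. The function name `modes` is made up for the example. It returns an empty list when no value is repeated, following the "no mode" convention described above. Note that the standard library's `statistics.multimode` behaves differently in that case: it returns every value.

```python
from collections import Counter


def modes(values):
    """Return a list of the values that occur with the greatest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No data value is repeated, so by the convention used here there is no mode.
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([1, 2, 2, 3, 3]))           # [2, 3]  (bimodal)
print(modes(["red", "blue", "red"]))    # ['red'] (works with qualitative data)
print(modes([1, 2, 3]))                 # []      (no mode)
```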
Midrange is another measure of center. The midrange of a data set is the measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and then dividing the sum by 2.
Because the midrange uses only the maximum and minimum values, it is very sensitive to those extremes, so the midrange is not resistant. In practice, the midrange is rarely used, but it has three redeeming features:
- It is very easy to compute.
- It helps reinforce the very important point that there are several different ways to define the center of a data set.
- The value of the midrange is sometimes used incorrectly for the median, so confusion can be reduced by clearly defining the midrange along with the median.
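The midrange calculation is short enough to sketch directly. The function name `midrange` and the data are invented for the example. The second call shows how a single extreme value drags the result.

```python
def midrange(values):
    """Return the value midway between the maximum and minimum data values."""
    if len(values) == 0:
        raise ValueError("the midrange needs at least one data value")
    return (max(values) + min(values)) / 2


print(midrange([2, 4, 4, 5, 10]))   # 6.0
print(midrange([2, 4, 4, 5, 100]))  # 51.0
```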
When calculating measures of center, we often need to round the result.
For the mean, median, and midrange, carry one more decimal place than is present in the original set of values. For example, if the data values are whole numbers, report the result to one decimal place.
For the mode, leave the value as is without rounding.
When applying any rounding rules, round only the final result, not any intermediate values.
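A small sketch of the rounding rule follows. The helper name `round_final` and its `decimals_in_data` parameter are assumptions for illustration: the caller states how many decimal places the original data have, and the helper carries one more. Python's built-in `round` resolves exact ties toward the even digit, which may differ from the convention used in a given textbook.

```python
def round_final(result, decimals_in_data):
    """Round a final result to one more decimal place than the original data carry."""
    return round(result, decimals_in_data + 1)


# Hypothetical whole-number data (0 decimal places), chosen only for illustration.
data = [1, 2, 2]
raw_mean = sum(data) / len(data)   # keep full precision until the very end
print(round_final(raw_mean, 0))    # 1.7
```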
We can always calculate measures of center from a sample of numbers, but we should always think about whether it makes sense to do that.
For example, it makes no sense to do numerical calculations with data at the nominal level of measurement. We should also think about the sampling method used to collect data. If the sampling method is not sound, the statistics we obtain may be very misleading.