Organization of Data in Statistics
Statistics exists because data vary. It is our job to find meaning in that variation. A frequency distribution is a good way to do this and is one of the keys to organizing data in statistics.
A frequency distribution is a summary technique that organizes data into classes and provides in tabular form a list of the classes along with the number of observations in each class.
The process begins by refining information. An analyst takes the raw data and organizes it by counting the number of observations in each classification.
A frequency distribution is a good way to handle large amounts of data. With it, we can see the overall structure of the data.
There are two steps in creating a frequency distribution:
- Choose the classifications
- Count the number of observations in each class
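The two steps above can be sketched in a few lines of Python. This is only an illustration: the exam scores are made up, and the 10-point class width and the helper name `classify` are assumptions, not something prescribed by the method.

```python
from collections import Counter

# Hypothetical raw data: exam scores for 12 students (made up for illustration).
scores = [55, 62, 67, 71, 74, 78, 81, 83, 85, 88, 92, 97]

def classify(score):
    """Step 1: choose the classifications (here, assumed 10-point intervals)."""
    lower = (score // 10) * 10
    return f"{lower}-{lower + 9}"

# Step 2: count the number of observations in each class.
frequency = Counter(classify(s) for s in scores)

# Print the distribution in tabular form, one class per line.
for label in sorted(frequency):
    print(label, frequency[label])
```

A different class width would give a different table from the same data, which is why the choice of classifications matters.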
Graphs are important because they put information in visual form. Individual data values can be lost in a graph, but the clearer overall picture usually more than makes up for this. Graphing software makes the job easy, and many programs are available for creating good-looking graphs.
Bar Charts
The bar chart is a simple graph in which the length of each bar corresponds to the number of observations in a category.
Bar charts are a good presentation tool and are helpful in showing differences in magnitude.
Creating a bar chart can get complicated. You should think about size, color, and labeling.
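As a rough sketch of the idea that bar length corresponds to the number of observations, the following Python draws a text-only bar chart. The category names and counts are invented for illustration, and `text_bar_chart` is a hypothetical helper, not part of any library.

```python
# Hypothetical category counts (made up for illustration).
counts = {"Bus": 6, "Car": 11, "Bicycle": 4, "Walk": 3}

def text_bar_chart(counts):
    """Return one line per category; the bar length equals the count."""
    width = max(len(name) for name in counts)  # pad labels so bars line up
    return [f"{name.ljust(width)} | {'#' * n} {n}" for name, n in counts.items()]

for line in text_bar_chart(counts):
    print(line)
```

Even in this plain form, labeling each bar with its category and count makes the differences in magnitude easy to read.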
Pie Charts
Pie charts can represent the same information as a bar chart. Each slice in a pie chart is proportional to its category's share of the overall total, so you can easily compare each category to the whole.
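To make the proportionality concrete, here is a small Python sketch that converts category counts into slice angles out of 360 degrees. The counts are made up for illustration.

```python
# Hypothetical category counts (made up for illustration).
counts = {"Bus": 6, "Car": 12, "Bicycle": 4, "Walk": 2}

total = sum(counts.values())

# Each slice's angle is the category's share of the total, times 360 degrees.
slices = {name: 360 * n / total for name, n in counts.items()}

for name, angle in slices.items():
    print(f"{name}: {angle:.0f} degrees")
```

Here "Car" holds half of the observations, so its slice covers half of the circle.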
When your data is qualitative, choosing categories is pretty easy. However, when your data is quantitative, choosing those categories is more complicated. The reason is that your choices often affect how others will interpret the data. So, you have to be careful when doing this.
The number of categories is up to you and should depend on the amount of data available. You want enough categories to make the comparisons meaningful but not so many that the result is hard to understand. Each situation will be different in this regard.
Relative Frequency Distribution
This represents the proportion of the total observations that fall in a category. It enables a person to view the number in each category in relation to the total number of observations. Because it changes the frequency in each category to a proportion, it also makes it easier to compare data sets of different sizes. It looks like this:
\[ \text{relative frequency} = \frac{\text{number in category}}{\text{total number}} \]
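The formula can be applied directly to a frequency table. The sketch below assumes a small, made-up table of class counts.

```python
# Hypothetical frequency distribution (made up for illustration).
frequency = {"50-59": 1, "60-69": 2, "70-79": 3, "80-89": 4, "90-99": 2}

total = sum(frequency.values())

# relative frequency = number in category / total number
relative = {label: count / total for label, count in frequency.items()}

for label, share in relative.items():
    print(f"{label}: {share:.3f}")
```

The relative frequencies always add up to 1, which is a quick check that nothing was miscounted.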
Cumulative Frequency Distribution
This gives a person the ability to quickly look at any category and see how many observations fall in that category or in any category before it. The cumulative frequency is the sum of the frequency of a particular category and all preceding categories.
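A running sum is all that is needed here. The following sketch uses Python's `itertools.accumulate` on a made-up list of class frequencies, listed in category order.

```python
from itertools import accumulate

# Hypothetical classes and their frequencies, in order (made up for illustration).
labels = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequency = [1, 2, 3, 4, 2]

# Each entry is the frequency of its category plus all preceding categories.
cumulative = list(accumulate(frequency))

for label, running_total in zip(labels, cumulative):
    print(label, running_total)
```

The last cumulative frequency equals the total number of observations.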
Cumulative Relative Frequency
The cumulative relative frequency is the proportion of observations in a particular category and all preceding categories.
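One way to compute this, sketched below with made-up frequencies, is to take the running sum of the counts and divide each running total by the overall total.

```python
from itertools import accumulate

# Hypothetical class frequencies, in category order (made up for illustration).
frequency = [1, 2, 3, 4, 2]
total = sum(frequency)

# Running total of the counts, expressed as a proportion of all observations.
cumulative_relative = [running / total for running in accumulate(frequency)]

print([round(p, 3) for p in cumulative_relative])
```

The final value is always 1, since the last category and all preceding ones together contain every observation.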
Histograms
A histogram is used frequently and reveals the distribution of data. It is a bar graph of the frequency distribution in which each category is represented by a vertical bar whose height is proportional to the frequency of that interval. The left and right edges of each bar correspond to the category endpoints. Once the frequency distribution has been calculated, all the information necessary for plotting a histogram is available.
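The sketch below shows how a histogram follows from the frequency distribution. It counts made-up values into intervals and then draws vertical bars as plain text. The data, the endpoints in `edges`, and the half-open interval convention are all assumptions chosen for illustration.

```python
# Hypothetical data and category endpoints (made up for illustration).
values = [55, 62, 67, 71, 74, 78, 81, 83, 85, 88, 92, 97]
edges = [50, 60, 70, 80, 90, 100]
intervals = list(zip(edges, edges[1:]))

# Frequency of each interval; a value v belongs to [left, right).
counts = [sum(left <= v < right for v in values) for left, right in intervals]

# Draw the bars row by row, starting from the height of the tallest bar.
rows = []
for level in range(max(counts), 0, -1):
    rows.append(" ".join("  #  " if c >= level else "     " for c in counts))

# Label each bar with its interval along the horizontal axis.
rows.append(" ".join(f"{left}-{right - 1}" for left, right in intervals))

print("\n".join(rows))
```

In practice a plotting program would draw the bars, but the counting step is the same.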
Stem and Leaf Display
The stem and leaf display is a mix of a table and a graph. It is similar to a histogram, but the raw data is not lost in the graph and remains visible and usable. It is useful for ordering the data and detecting patterns in it.
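A minimal sketch of a stem and leaf display in Python follows. It assumes non-negative integers, with the tens part as the stem and the ones digit as the leaf. The data and the helper name `stem_and_leaf` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical data (made up for illustration).
values = [55, 62, 67, 71, 74, 78, 81, 83, 85, 88, 92, 97]

def stem_and_leaf(values):
    """Group each value's ones digit (leaf) under its tens part (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):  # sorting keeps the leaves in order
        stems[v // 10].append(v % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

for line in stem_and_leaf(values):
    print(line)
```

The row lengths show the shape of the distribution, as a histogram would, while every original value can still be read back from its stem and leaf.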
Ordered Array
An ordered array is a listing of all the data in either increasing or decreasing magnitude. Data listed in increasing order is said to be listed in rank order. If listed in decreasing order, it is listed in reverse rank order. Ordering the data is very useful and is commonly done. It allows you to scan the data quickly for the largest and smallest values.
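In Python, an ordered array can be sketched with the built-in `sorted` function. The values are made up for illustration.

```python
# Hypothetical unordered data (made up for illustration).
values = [83, 55, 97, 71, 62]

rank_order = sorted(values)                        # increasing magnitude
reverse_rank_order = sorted(values, reverse=True)  # decreasing magnitude

# Once ordered, the extremes sit at the two ends of the list.
smallest, largest = rank_order[0], rank_order[-1]

print(rank_order)
print(reverse_rank_order)
print(smallest, largest)
```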
Dot Plots
A dot plot is a graph where each data value is plotted as a point. If a value occurs more than once, the points are stacked above each other.
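A text-only dot plot can be sketched as follows. This assumes small single-digit integer values so the columns line up. The data and the helper name `dot_plot` are made up for illustration.

```python
from collections import Counter

# Hypothetical single-digit data (made up for illustration).
values = [2, 3, 3, 4, 4, 4, 5, 7]

def dot_plot(values):
    """Return text rows: stacked dots above an axis of integer values."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    rows = []
    # Build from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        rows.append(" ".join("." if counts[x] >= level else " " for x in axis))
    rows.append(" ".join(str(x) for x in axis))
    return rows

print("\n".join(dot_plot(values)))
```

Repeated values, such as the three 4s here, appear as taller stacks of points.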
Time Series Data
A time series plot graphs data using time as the horizontal axis.