Graphing Univariate Data
These are my notes on graphing univariate data.
Graphical summary measures are a good way of conveying information, but they are also subject to misinterpretation and can be distorted very easily. Two researchers can take the same data and convey completely different messages just by manipulating the layout of a graph.
Boxplots
A boxplot is a graphical data summary based on measures of position. It is useful for identifying outliers and the general shape of the distribution.
- Any points below the lower whisker are identified as outliers on the lower end
- Any points above the upper whisker are identified as outliers on the upper end
- The length of the box is the interquartile range (IQR), the spread of the middle 50% of the data, when the data is arranged in increasing order of value
- The length of the lower whisker shows the spread of the smallest 25% of the data (excluding any outliers), when the data is arranged in increasing order of value
- The length of the first compartment of the box, from the first quartile to the median, shows the spread of the next smallest 25% of the data
- The length of the second compartment of the box, from the median to the third quartile, shows the spread of the third smallest 25% of the data
- The length of the upper whisker shows the spread of the largest 25% of the data (excluding any outliers)
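- Compare the lengths of the four parts to compare the spread of the data in each quarter. Use the information about the spread to determine the shape of the distribution. For example, four parts of roughly equal length suggest a symmetric distribution, while a longer upper whisker and second compartment suggest a distribution that is skewed to the right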
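The quantities described above can be computed directly. The following is a minimal sketch, not a definitive method: the function name boxplot_summary and the sample data are made up for illustration, and it assumes the common convention that whiskers extend to the most extreme points within 1.5 times the IQR of the box, which these notes do not specify.

```python
import statistics

def boxplot_summary(values, whisker_factor=1.5):
    """Return the quantities a boxplot is drawn from.

    Assumption: whiskers follow the common 1.5 * IQR convention,
    which is one of several rules in use.
    """
    data = sorted(values)
    # Quartiles; method="inclusive" interpolates between the min and max.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme points that are not outliers.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    lower_whisker, upper_whisker = min(inside), max(inside)
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
        # Lengths of the four parts: lower whisker, two box compartments, upper whisker.
        "part_lengths": (q1 - lower_whisker, median - q1,
                         q3 - median, upper_whisker - q3),
    }

# Made-up sample data with one unusually large value.
summary = boxplot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

With this sample, the value 30 falls above the upper fence and is reported as an outlier, while the four part lengths can be compared to judge the shape of the rest of the data.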
Comparing Distributions
When comparing distributions of two or more groups, use the following criteria.
- Compare the centers of the distributions
- Compare the spreads of the distributions. Consider the differences in the spread of data within each group as well as the differences between groups
- Compare clusters of measurements and any gaps between them
- Compare outliers and any other unusual features
- Compare the shapes of the distributions
- Compare in the context of the question
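As a rough illustration of the first two criteria, the sketch below summarizes the center and spread of two groups side by side. The helper name describe and both data sets are hypothetical, and the median and IQR are only one possible choice of center and spread.

```python
import statistics

def describe(values):
    """Center and spread for one group (median, IQR, and range)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": median, "iqr": q3 - q1, "range": max(values) - min(values)}

# Hypothetical scores for two groups, invented for illustration only.
group_a = [61, 64, 67, 70, 73, 76, 79]
group_b = [50, 58, 66, 70, 74, 82, 90]

a, b = describe(group_a), describe(group_b)
print("Group A:", a)
print("Group B:", b)
```

Here the two groups share the same center but group B is more spread out, which is the kind of difference a comparison of centers alone would miss. Clusters, gaps, outliers, and shape still need to be judged from a graph and in the context of the question.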
Exploring Bivariate Data
Bivariate data is data on two different variables collected from each item in a study. We often want to investigate the relationship between two quantitative variables. If two quantitative variables have a linear relation, then we can model that relationship with linear regression, a popular and relatively simple method, and measure its strength with the correlation coefficient.
There are two commonly used tools for summarizing the relation between two variables: the scatterplot and the correlation coefficient. A scatterplot is used to describe the nature, degree, and direction of the relation between two variables x and y, where each item in the study gives one pair of measurements (x, y). To draw a scatterplot:
- Draw an x-axis and a y-axis
- Scale the axes to accommodate the ranges of data for the first and second variable
- For each pair of measurements, mark the point on the graph where an imaginary vertical line through the x value and an imaginary horizontal line through the y value cross
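The steps above can be sketched in code. This is a rough, text-only stand-in for a real plotting tool: the function name text_scatterplot and the grid dimensions are hypothetical, and it assumes the x values are not all equal and the y values are not all equal, so that each axis has a nonzero range.

```python
def text_scatterplot(xs, ys, width=20, height=10):
    """Place each (x, y) pair on a character grid with simple axes."""
    # Scale the axes to the range of each variable.
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Convert each value to a column or row index within its axis range.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 is the top of the grid, so flip the vertical position.
        grid[height - 1 - row][col] = "*"
    # "|" draws the y-axis on the left, "+---" draws the x-axis underneath.
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]

# Three made-up points on a small grid.
lines = text_scatterplot([1, 2, 3], [1, 2, 3], width=3, height=3)
print("\n".join(lines))
```

Points that land in the same grid cell overlap in this sketch, so it is only suitable for getting a feel for the scaling step, not for reading off exact values.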
A scatterplot can tell us a few things about the two variables, including the shape, direction, and strength of their relationship. It shows whether the relationship between the two variables is linear or nonlinear. A linear relation is one that can be described well using a straight line. It also shows whether y increases or decreases as x increases, or whether the trend changes direction.
- If a scatterplot shows an increasing or upward trend, then it indicates a positive relationship between the two variables.
- If a scatterplot shows a decreasing or downward trend, then it indicates a negative relationship between the two variables.
If the trend of the data can be described with a line or a curve, then the spread of the data values around the line or curve describes the degree or strength of the relationship between the two variables.
- If the data points are close to the line, then it indicates a strong relationship between the two variables
- If the data points are more loosely scattered around the line, then it indicates a weaker relationship between the two variables. If a scatterplot shows points scattered without any apparent pattern, then it indicates no apparent relationship between the two variables.
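Direction and strength can also be summarized numerically with the correlation coefficient mentioned earlier. The sketch below computes the Pearson correlation coefficient r by hand; the function name pearson_r and the data are made up for illustration, and it assumes neither variable is constant. The sign of r gives the direction, and values of r closer to 1 or -1 indicate a stronger linear relationship. It only measures linear association, so a strong curved pattern can still give an r near 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
score_up = [52, 55, 61, 64, 70]    # upward trend
score_down = [70, 64, 61, 55, 52]  # downward trend

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(r_up, r_down)
```

The upward-trending data gives a positive r close to 1 and the downward-trending data gives a negative r close to -1, matching what the scatterplots of these points would show.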