June 21, 2024

Graphing Univariate Data

These are my notes on graphing univariate data.


Graphical summary measures are a good way of conveying information, but they are also subject to misinterpretation and can be distorted very easily. Two researchers can take the same data and convey completely different messages just by manipulating the layout of a graph.

Boxplots

A boxplot is a graphical data summary based on measures of position. It is useful for identifying outliers and the general shape of the distribution.

Any points below the lower whisker are identified as outliers on the lower end.
Any points above the upper whisker are identified as outliers on the upper end.
The length of the box indicates the interquartile range (IQR), which is the spread of the middle 50% of the data when the data is arranged in increasing order of value.
The length of the lower whisker shows the spread of the smallest 25% of the data (excluding any outliers) when the data is arranged in increasing order of value.
The length of the first compartment of the box, from the first quartile to the median, shows the spread of the next smallest 25% of the data.
The length of the second compartment of the box, from the median to the third quartile, shows the spread of the third smallest 25% of the data.
The length of the upper whisker shows the spread of the largest 25% of the data (excluding any outliers).
Compare the lengths of the four parts to compare the spread of the data in each quarter. Use this information to determine the shape of the distribution: parts of roughly equal length suggest a symmetric distribution, while longer parts on one side suggest skew in that direction.
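
To make the parts of a boxplot concrete, here is a minimal sketch that computes the numbers behind one. It assumes the common 1.5 x IQR rule for placing the whisker limits, which these notes do not specify, and the function name boxplot_summary and the sample data are made up for illustration.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Compute the quantities a boxplot displays (hypothetical helper)."""
    data = sorted(values)
    # Quartiles via the "inclusive" method; other methods can give
    # slightly different values on small data sets.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: whiskers stop at the most extreme points within 1.5 * IQR
    # of the box; anything beyond that is flagged as an outlier.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


summary = boxplot_summary([2, 4, 5, 7, 8, 9, 11, 12, 30])
for name, value in summary.items():
    print(f"{name}: {value}")
```

With this sample data the value 30 falls above the upper fence, so it would be drawn as a separate point rather than as part of the upper whisker.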

Comparing Distributions

When comparing distributions of two or more groups, use the following criteria.

Compare the centers of the distributions
Compare the spreads of the distributions. Consider the differences in the spread of data within each group as well as the differences between groups
Compare clusters of measures and gaps in measurements
Compare outliers and any other unusual features
Compare the shapes of the distributions
Compare in the context of the question
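
The first two criteria can be checked numerically before looking at a graph. The sketch below is one possible approach: it uses the median for center and the IQR and range for spread. The helper name describe and the two groups of scores are hypothetical.

```python
from statistics import median, quantiles


def describe(values):
    """Return simple center and spread measures for one group (hypothetical helper)."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "center": median(data),
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
    }


# Made-up scores for two groups, used only to illustrate the comparison.
group_a = [60, 65, 70, 75, 80]
group_b = [50, 60, 70, 80, 90]

a = describe(group_a)
b = describe(group_b)
print("Group A:", a)
print("Group B:", b)
print("Same center:", a["center"] == b["center"])
print("Group B more spread out:", b["iqr"] > a["iqr"])
```

Here both groups share the same center, but group B is about twice as spread out. Clusters, gaps, outliers, and shape still need a graph, and the comparison only means something in the context of the question being asked.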

Exploring Bivariate Data

Bivariate data is data on two different variables collected from each item in a study. We often want to investigate the relationship between two quantitative variables. If two quantitative variables have a linear relation, then we can model that relationship with linear regression, a popular and relatively simple method, and measure its strength with the correlation coefficient.

There are two commonly used tools for summarizing the relation between two variables: the scatterplot and the correlation coefficient. A scatterplot is used to describe the nature, degree, and direction of the relation between two variables x and y, where each item in the study gives a pair of measurements (x, y). To make a scatterplot:

Draw an x-axis and a y-axis.
Scale the axes to accommodate the ranges of data for the first and second variable.
For each pair of measurements, mark the point on the graph where an imaginary vertical line through the x value and an imaginary horizontal line through the y value cross.
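
The steps above can be sketched in code. This is a rough text-only version, not a substitute for a real plotting tool: it scales both axes to the range of the data and marks each (x, y) pair on a character grid. The function name text_scatter, the grid size, and the sample data are all assumptions made for illustration.

```python
def text_scatter(xs, ys, width=20, height=10):
    """Return the lines of a crude text scatterplot (hypothetical helper)."""
    # Scale the axes to the ranges of the two variables.
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range so the scaling never divides by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Find the column and row where the x and y values cross.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 of the grid is the top of the plot, so flip the y direction.
        grid[height - 1 - row][col] = "*"

    # Draw the y-axis on the left and the x-axis along the bottom.
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)
    return lines


# Made-up paired measurements.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print("\n".join(text_scatter(hours, scores, width=5, height=5)))
```

The points run from the lower left to the upper right of the grid, which is what an upward trend looks like.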

A scatterplot can tell us a few things about the two variables, including the shape, direction, and strength of their relationship. It shows whether the relationship is linear or nonlinear. A linear relation is one that can be described well using a straight line. The scatterplot also shows whether the y-value increases or decreases as x increases, or whether the trend changes direction.

If a scatterplot shows an increasing or upward trend, then it indicates a positive relationship between the two variables.
If a scatterplot shows a decreasing or downward trend, then it indicates a negative relationship between the two variables.

If the trend of the data can be described with a line or a curve, then the spread of the data values around the line or curve describes the degree or strength of the relationship between the two variables.

If the data points are close to the line, then it indicates a strong relationship between the two variables.
If a scatterplot has points that are more loosely scattered around the line, then it indicates a weaker relationship between the two variables. If a scatterplot shows points scattered without any apparent pattern, then it indicates no relationship between the two variables.
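
Direction and strength can also be summarized with the correlation coefficient mentioned earlier. The sketch below computes the Pearson correlation coefficient r by hand, assuming the relationship is roughly linear. The sign of r gives the direction, and values closer to 1 or -1 indicate points that sit closer to a straight line. The function name pearson_r and the sample data are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up paired measurements.
x = [1, 2, 3, 4, 5]
upward = [2, 4, 6, 8, 10]
downward = [10, 8, 6, 4, 2]

print("upward trend:  ", pearson_r(x, upward))
print("downward trend:", pearson_r(x, downward))
```

A value of r near 0 only says there is no linear relationship. A curved pattern can still be strong, so it is worth looking at the scatterplot before relying on r.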
