Relationships Between Categorical Variables in Statistics

These are my notes on relationships between categorical variables in statistics.

To see how two categorical variables are related, put the counts in a two-way table called a contingency table. Look at the marginal distribution of each variable, and at the conditional distribution of one variable within each category of the other. Comparing the conditional distributions of one variable across the categories of the other tells us about the association between the variables. If the conditional distributions of one variable are the same for every category of the other, the variables are independent; real data rarely match exactly, so conditional distributions that are only roughly the same suggest the variables are close to independent. Consider a third variable whenever it is appropriate, and be able to describe the relationships among all three variables.

 

Contingency Table

A contingency table displays counts and sometimes percentages of individuals falling into named categories on two or more variables. The table categorizes the individuals on all variables at once to reveal possible patterns in one variable that may be contingent on the category of another.
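As a minimal R sketch of building a contingency table from raw data, the ten respondents below are made up and classified on two hypothetical variables, region and vote; table() counts every combination of categories at once.

```r
# Hypothetical raw data: 10 made-up respondents, one row per individual
survey <- data.frame(
  region = c("North", "North", "North", "North",
             "South", "South", "South", "South", "South", "South"),
  vote   = c("Yes", "Yes", "Yes", "No",
             "Yes", "No", "No", "No", "No", "Yes")
)

# Cross-classify every individual on both variables at once;
# rows are regions, columns are votes, cells are counts
tab <- with(survey, table(region, vote))
print(tab)
```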

 

Marginal Distribution

In a contingency table, the distribution of either variable alone is called the marginal distribution. The counts or percentages are the totals found in the margins of the table.
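A minimal R sketch, assuming made-up region-by-vote counts (North/South by No/Yes) entered directly as a table: addmargins() appends the row and column totals, and margin.table() followed by prop.table() gives each marginal distribution as counts and as proportions.

```r
# Hypothetical counts: rows = region, columns = vote
tab <- as.table(matrix(c(1, 3,
                         4, 2),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(region = c("North", "South"),
                                       vote   = c("No", "Yes"))))

# The totals in the margins of the table
print(addmargins(tab))

# Marginal distribution of each variable: counts, then proportions
region_counts <- margin.table(tab, 1)   # North 4, South 6
vote_counts   <- margin.table(tab, 2)   # No 5, Yes 5
print(prop.table(region_counts))        # North 0.4, South 0.6
print(prop.table(vote_counts))          # No 0.5, Yes 0.5
```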

 

Table Percents

When a cell of a contingency table holds percents, these can be percents of the total of that cell's row, of the total of its column, or of the grand total of the whole table. These are called row percents, column percents, and table percents, respectively.
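A minimal R sketch of the three kinds of percents, assuming made-up region-by-vote counts: prop.table() divides by the row totals when margin = 1, by the column totals when margin = 2, and by the grand total when no margin is given.

```r
# Hypothetical counts: rows = region, columns = vote
tab <- as.table(matrix(c(1, 3,
                         4, 2),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(region = c("North", "South"),
                                       vote   = c("No", "Yes"))))

row_pct   <- 100 * prop.table(tab, margin = 1)  # each row sums to 100
col_pct   <- 100 * prop.table(tab, margin = 2)  # each column sums to 100
table_pct <- 100 * prop.table(tab)              # all cells together sum to 100

print(round(row_pct, 1))
print(round(col_pct, 1))
print(round(table_pct, 1))
```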

 

Conditional Distribution

The distribution of a variable when the Who is restricted to consider only a smaller group of individuals (for example, only the individuals in one category of another variable) is called a conditional distribution.
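A minimal R sketch with made-up respondents: restricting the Who to the South only and tabulating vote gives the conditional distribution of vote for that region, and it matches the South row of the row-proportion table.

```r
# Hypothetical raw data: 10 made-up respondents
survey <- data.frame(
  region = c(rep("North", 4), rep("South", 6)),
  vote   = c("Yes", "Yes", "Yes", "No",
             "Yes", "No", "No", "No", "No", "Yes")
)

# Restrict the Who to a smaller group: respondents from the South
south <- survey[survey$region == "South", ]
cond_south <- prop.table(table(south$vote))   # No 4/6, Yes 2/6
print(cond_south)

# The same distribution is the South row of the row proportions
row_prop <- prop.table(table(survey$region, survey$vote), margin = 1)
print(row_prop["South", ])
```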

 

Independence

Variables are said to be independent if the conditional distribution of one variable is the same for each category of the other.
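A minimal R sketch with hypothetical counts chosen so that two made-up variables, group and answer, are exactly independent (the second row is three times the first). The conditional distribution of answer is then the same in both groups, and it also equals the marginal distribution of answer.

```r
# Hypothetical counts: row B is 3 times row A, so the rows are proportional
indep <- as.table(matrix(c(10, 30,
                           30, 90),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(group  = c("A", "B"),
                                         answer = c("No", "Yes"))))

cond <- prop.table(indep, margin = 1)        # conditional distribution of answer in each group
marg <- prop.table(margin.table(indep, 2))   # marginal distribution of answer

print(cond)   # both rows: No 0.25, Yes 0.75
print(marg)   # No 0.25, Yes 0.75
```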

 

Segmented Bar Chart

A segmented bar chart displays the conditional distribution of a categorical variable within each category of another variable.
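A minimal R sketch, assuming made-up region-by-vote counts. Base R's barplot() stacks the rows of a matrix within each column, so the row proportions are transposed to get one bar per region, each divided into vote segments that add up to 1. The pdf(NULL) line only sends the plot to a null device so the code runs without a screen; drop it and the dev.off() line to see the chart.

```r
# Hypothetical counts: rows = region, columns = vote
tab <- as.table(matrix(c(1, 3,
                         4, 2),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(region = c("North", "South"),
                                       vote   = c("No", "Yes"))))

# Conditional distribution of vote within each region (each row sums to 1)
cond <- prop.table(tab, margin = 1)

pdf(NULL)   # null graphics device; remove to draw on screen
# Transpose so each column (bar) is a region and the stacked segments are votes
mids <- barplot(t(cond), legend.text = TRUE,
                xlab = "region", ylab = "proportion")
invisible(dev.off())
```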

 

Mosaic Plot

A mosaic plot is a graphical representation of a contingency table. The plot is divided into rectangles so that the area of each rectangle is proportional to the number of cases in the corresponding cell.
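A minimal R sketch using base R's mosaicplot() on made-up region-by-vote counts. With the default alternating splits, which begin with a vertical split, each column's width is that region's marginal proportion and each tile's height within its column is the conditional proportion of vote; width times height, the tile's area, equals the cell's share of the grand total, which the last lines check numerically. As above, pdf(NULL) is only there so the code runs without a screen.

```r
# Hypothetical counts: rows = region, columns = vote
tab <- as.table(matrix(c(1, 3,
                         4, 2),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(region = c("North", "South"),
                                       vote   = c("No", "Yes"))))

pdf(NULL)   # null graphics device; remove to draw on screen
mosaicplot(tab, main = "Vote by region", color = TRUE)
invisible(dev.off())

# Tile area = column width (marginal of region) x tile height (conditional of vote)
width  <- as.numeric(prop.table(margin.table(tab, 1)))  # North 0.4, South 0.6
height <- prop.table(tab, margin = 1)                   # each row sums to 1
area   <- width * height   # R recycles width down each column, so width[i] scales row i
print(area)
print(prop.table(tab))     # same numbers: each cell's share of the total
```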

 

Simpson’s Paradox

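When averages (or percentages) are computed within each of several groups, they can appear to contradict the overall average computed after the groups are combined. This can happen when the things being compared are spread across the groups in very different proportions, so combining the groups weights them differently.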
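A sketch in R using the UCBAdmissions table that ships with R's datasets package (graduate applicants to the six largest departments at UC Berkeley in 1973, classified by Admit, Gender, and Dept). Combined over departments, men have the higher admission rate, yet within four of the six departments women have the higher rate.

```r
# UCBAdmissions: 2 x 2 x 6 table, Admit x Gender x Dept
ucb <- datasets::UCBAdmissions

# Combine the departments: admission rate by gender overall
overall      <- margin.table(ucb, c(1, 2))                      # Admit x Gender
overall_rate <- prop.table(overall, margin = 2)["Admitted", ]
print(round(overall_rate, 3))

# Keep the departments separate: admission rate by gender within each department
dept_rate <- prop.table(ucb, margin = c(2, 3))["Admitted", , ]  # Gender x Dept
print(round(dept_rate, 3))

# Number of departments where women's rate exceeds men's
print(sum(dept_rate["Female", ] > dept_rate["Male", ]))
```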

 

Lurking Variables

A lurking variable is one that is not immediately evident in an analysis, but changes the apparent relationships among the variables being studied.
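As an illustration with R's built-in UCBAdmissions data, where men's overall admission rate is higher even though women's rate is higher in most departments, department acts as a lurking variable: departments differ greatly in the share of applicants they admit, and women applied more often to the departments that admit fewer. The sketch computes both quantities per department; the correlation across only six departments is a rough summary of that pattern, not a formal analysis.

```r
ucb <- datasets::UCBAdmissions   # Admit x Gender x Dept

# Share of applicants admitted in each department (genders combined)
admit_rate <- prop.table(margin.table(ucb, c(1, 3)), margin = 2)["Admitted", ]

# Share of each department's applicants who were women
female_share <- prop.table(margin.table(ucb, c(2, 3)), margin = 2)["Female", ]

print(round(rbind(admit_rate, female_share), 3))
print(cor(admit_rate, female_share))   # negative across the six departments
```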

 

Contingency Tables in Excel

Excel builds contingency tables as PivotTables. To make a pivot table, choose PivotTable from the Data menu (in newer versions of Excel, from the Insert tab). In the layout window, drag one variable to the row area and the other to the column area, then drag one of the variables again to the data area. This tells Excel to count the occurrences of each combination of categories.

 

Contingency Tables in R

Using the function xtabs, you can create a contingency table from two variables x and y in a data frame called mydata with the command:

con.table <- xtabs(~ x + y, data = mydata)
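A minimal runnable version of the command above, assuming a small made-up data frame mydata with character columns x and y. addmargins() adds the totals; and when the data already have one row per cell plus a count column (called Freq here, the name as.data.frame() gives it), the count goes on the left-hand side of the xtabs formula.

```r
# Hypothetical data frame: one row per individual
mydata <- data.frame(
  x = c("a", "a", "b", "b", "b", "c"),
  y = c("yes", "no", "yes", "yes", "no", "no")
)

con.table <- xtabs(~ x + y, data = mydata)
print(con.table)
print(addmargins(con.table))   # with row and column totals

# Data already summarized as counts: one row per cell, count in Freq
counts <- as.data.frame(con.table)             # columns x, y, Freq
con.table2 <- xtabs(Freq ~ x + y, data = counts)
print(con.table2)              # same table as con.table
```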